Slashdot Mirror


Open Source Moving in on the Data Storage World

pararox writes "The data storage and backup world is one of stagnant technologies and cronyism. A neat little open source project, called Cleversafe, is trying to dispel that notion. Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you back up into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines."

169 comments

  1. I don't think you know what that word means. . . . by Ohreally_factor · · Score: 1, Interesting

    The data storage and backup world is one of stagnant technologies and cronyism.

    --
    It's not offtopic, dumbass. It's orthogonal.
  2. Editors, please note! by Anonymous Coward · · Score: 5, Informative

    Editors please note!

    Editors, please note that there is some incorrect information in this post. Firstly, the original concept of the IDA was designed by Shamir of RSA fame, not Rabin.

    Also note that the Cleversafe IDA is a custom algorithm, and is only similar to Shamir's initial concept.

    1. Re:Editors, please note! by dejamatt · · Score: 0, Redundant

      Also: the R in RSA is Ron Rivest, not Michael Rabin.

    2. Re:Editors, please note! by Jake73 · · Score: 3, Interesting

      Really? This is just error correction. Reed-Solomon error correction, and even the Chinese Remainder Theorem can be applied to reconstruct data when some has been intentionally or unintentionally punctured.

    3. Re:Editors, please note! by cpeikert · · Score: 1

      You are right; however, Shamir first observed that Reed-Solomon error correction also has nice secrecy properties. That is, even if you have one share less than the number required to reconstruct, you actually have no information about the secret at all. This can be a good thing if you are distributing your data among potentially untrustworthy servers.

  3. Backup for Backuper? by foundme · · Score: 3, Interesting

    I can't find this in the FAQ -- is there a "creator/seeder" in the whole process? Which means a particular group of slices can only be unlocked by a particular seeder created by Turbo IDA.

    If there is a creator/seeder, then we are still burdened by having to keep this seeder safe so that we can retrieve the distributed slices.

    If there is no creator/seeder, is this safe enough so that people cannot patch slices together by way of trial-and-error?

    --
    Please stop entering code 2,2,7,6,6,4
  4. Looking at it here for work by gasmonso · · Score: 1

    At work we're looking into this to store critical data on our intranet, which spans several states and facilities. Looks great, but only time will tell.

    I seem to remember a project months ago that was going to use P2P to back up your data on other P2P users' computers, which to me sounds quite insane. Anyone know if this is related?

    http://religiousfreaks.com/
    1. Re:Looking at it here for work by TheJediGeek · · Score: 1
      If you're talking about using a public P2P, then it's insane.
      If it could be set up properly to be used on a large corporate intranet, then there's some merit to it. If you could use this system to spread chunks of data out over an intranet that spans several states, then it could be a useful way to store critical data during hurricane season or the like. If a building took sufficient damage from weather, earthquake, terrorist, broken water main, etc. so that the data center in that building was a loss, the company could theoretically reconstruct the data from chunks on other company computers across the country.

      Now I'm just rambling... almost time to go home.

    2. Re:Looking at it here for work by Adult+film+producer · · Score: 1

      I think Freenet performs something like what is described in this article, but I'm hardly a crypto expert. I do know that Freenet slices up inserted data into small chunks (I believe it's 32k chunks with the newest darknet). There are also healing chunks... the only disadvantage of backing up data on Freenet is that data that is rarely accessed falls off the network as newer information replaces it.

  5. The 'R' stands for Rivest, not Rabin by Durindana · · Score: 5, Informative



    While Michael Rabin was inventor of the Rabin cryptosystem in 1979, it was Ronald Rivest, Adi Shamir and Len Adleman behind RSA two years earlier.

    1. Re:The 'R' stands for Rivest, not Rabin by dan+dan+the+dna+man · · Score: 2, Funny

      Nonsense, everyone knows it was Rowland Rivron..

      --
      I don't read your sig, why do you read mine?
    2. Re:The 'R' stands for Rivest, not Rabin by Pseudonym · · Score: 1

      What about Wodewick, then?

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:The 'R' stands for Rivest, not Rabin by Anonymous Coward · · Score: 0

      No, that would be the RRRRSA algorithm.

  6. Re:I don't think you know what that word means. . by flanksteak · · Score: 4, Funny

    Speak for yourself. I have all my old business buddies back up my data for me.

  7. Think RAID5, only way better by El+Cubano · · Score: 4, Interesting

    Using the information dispersal algorithm originally conceived of by Michael Rabin (of RSA fame), the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data.

    It seems like this can be tuned to provide varying levels of fault tolerance. According to the abstract (I don't have an ACM web account, and I couldn't find the full text), it seems like I can take a file and make it so that any four chunks can be used to rebuild the file. I can then take those chunks and distribute them eight times to different machines. Thus, five of the eight machines would have to be rendered inoperable before I were unable to retrieve my data.

    If I understand it correctly, then this is really slick.
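
    To put numbers on that, here is a back-of-the-envelope sketch in Python. The function name and its parameters are made up for illustration; it only does the arithmetic for a generic "any k of n slices" scheme and says nothing about what Cleversafe actually ships.

```python
def dispersal_stats(total_slices, needed_slices):
    """Return (failures tolerated, storage overhead factor) for a k-of-n scheme.

    Assumes every slice is 1/needed_slices the size of the original file,
    which is the ideal case for an information dispersal algorithm.
    """
    tolerated = total_slices - needed_slices
    overhead = total_slices / needed_slices
    return tolerated, overhead


# Any 4 of 8 slices rebuild the file: 4 machines may fail, a 5th loses data.
print(dispersal_stats(8, 4))
# Plain replication on 8 machines is the 1-of-8 case, for comparison.
print(dispersal_stats(8, 1))
```

    The 4-of-8 layout costs about twice the original size, while eight full copies survive seven failures but cost eight times the space.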

    1. Re:Think RAID5, only way better by Anonymous Coward · · Score: 1, Informative

      Meh, it sounds like it's just par2 integrated into a distributed filesystem.

    2. Re:Think RAID5, only way better by Anonymous+Crowhead · · Score: 0, Offtopic

      If I understand it correctly, then this is really slick.

      s/really slick/complete overkill/

    3. Re:Think RAID5, only way better by dracken · · Score: 4, Interesting

      Rabin's algorithm relies on a nifty trick. If you take a k dimensional vector and store the dot product with k orthogonal vectors then the vector can be reconstructed using just the dot product. This is a fancy way of saying any point on the x-y plane can be located if you have the x-coordinate and y-coordinate. However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector.

      Rabin has shown how to come up with l vectors of which k are mutually orthogonal.

    4. Re:Think RAID5, only way better by volve · · Score: 2, Funny

      Pardon?!

      I think I've suddenly gone blind because your "[non-]fancy way of saying" doesn't sound a damn thing like the gibberish my eyes just read. "mutually orthogonal vectors" ?!

      If I'm wrong, then I should probably go and lie down, but I just showed my wife and now she's crying... so I think it's your explanation and not me.

      *goes to find advil*

    5. Re:Think RAID5, only way better by martin-boundary · · Score: 1
      That can't be right, can it?
      However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector.
      Consider three dimensional space (l = 3, k = 2). Let the k dimensional vector be v = (1/2, 1/3) in the x-y plane. Then the l dot products are v.i = 1/2, v.j = 1/3, v.k = 0. I cannot pick any two products (say 1/2 and 0) to reconstruct v.

      Something must be lost in translation from the ACM, but I can't login myself. Anybody know?

    6. Re:Think RAID5, only way better by cvalente · · Score: 1

      Just to make things clear.

      In a k dimensional vector space you can't come up with l > k (non-null) mutually orthogonal vectors. After all, k non-null mutually orthogonal vectors already form a basis for the vector space.

      --
      https://www.accountkiller.com/removal-requested
    7. Re:Think RAID5, only way better by siwelwerd · · Score: 2, Informative

      I think you mean linearly independent, not mutually orthogonal. In fact, the word orthogonal isn't even in Rabin's paper. What Rabin has done is show how to generate n vectors such that any m of them are linearly independent.
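
      For anyone who wants to poke at this, here is a minimal sketch of the idea in Python (3.8+ for the modular inverse via pow). It is not Rabin's exact construction and not Cleversafe's code: I am assuming Vandermonde rows over the integers mod 257, which is one standard way to get n vectors of which any m are linearly independent. All function names are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def make_rows(n, m):
    # Row for x is (1, x, x^2, ..., x^(m-1)); any m rows with distinct x
    # form an invertible Vandermonde matrix mod P.
    return [[pow(x, j, P) for j in range(m)] for x in range(1, n + 1)]


def disperse(data, n, m):
    """Split data into n slices; slice i holds one dot product per m-byte block."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    rows = make_rows(n, m)
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), m):
        block = padded[start:start + m]
        for i, row in enumerate(rows):
            slices[i].append(sum(a * b for a, b in zip(row, block)) % P)
    return slices


def solve_mod(matrix, rhs):
    """Solve matrix * x = rhs mod P by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


def reconstruct(available, n, m, length):
    """Rebuild the data from a dict {slice index: slice values} with >= m entries."""
    chosen = sorted(available)[:m]
    rows = make_rows(n, m)
    matrix = [rows[i] for i in chosen]
    out = bytearray()
    for pos in range(len(available[chosen[0]])):
        out.extend(solve_mod(matrix, [available[i][pos] for i in chosen]))
    return bytes(out[:length])


data = b"any m of the n slices will do"
slices = disperse(data, n=8, m=4)
survivors = {i: slices[i] for i in (1, 3, 6, 7)}  # four machines are gone
print(reconstruct(survivors, n=8, m=4, length=len(data)))
```

      Each slice is about 1/m the size of the input. A slice value can be 256, so a real implementation would need a slightly wider encoding or a field like GF(2^8) instead.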

    8. Re:Think RAID5, only way better by cvalente · · Score: 1

      A k dimensional vector space doesn't have l>k linearly independent vectors.

      I suspect the original poster expressed himself incompletely because this is nonsense.

      The only way this can make any sense is if the vector belongs to a k dimensional vector *subspace* of another vector space of at least dimension l>k.

      In that scenario the subspace can't be orthogonal to any of the l mutually orthogonal vectors for things to work as described.

      This needs further clarification.

      --
      https://www.accountkiller.com/removal-requested
    9. Re:Think RAID5, only way better by siwelwerd · · Score: 4, Informative

      It's an l dimensional space though. The PDF of the paper is http://portal.acm.org/ft_gateway.cfm?id=62050&type=pdf&coll=GUIDE&dl=GUIDE&CFID=70220506&CFTOKEN=52880553, and is accessible to anyone who's had an undergraduate course in linear algebra. The crux of the argument is on page 4.

    10. Re:Think RAID5, only way better by cvalente · · Score: 1

      Thank you.
      It's a shame the original poster never tried to clarify any of this.

      --
      https://www.accountkiller.com/removal-requested
    11. Re:Think RAID5, only way better by Anonymous Coward · · Score: 0

      Oh yeah? I can take a file, store it on 8 different machines, and recover it even if 7 of the machines fail. Beat that!

    12. Re:Think RAID5, only way better by pipingguy · · Score: 1


      "mutually orthogonal vectors" simply means that two separate things are going in the X-Y plane, which is good. If one of them might be travelling in the Z plane, it might have poked you in the eye for reading it. That would be bad.

    13. Re:Think RAID5, only way better by siwelwerd · · Score: 1

      What's a shame is his erroneous post is sitting at +4 insightful, and the posts with good math in them aren't modded up.

    14. Re:Think RAID5, only way better by Starker_Kull · · Score: 1

      Well, this is /. - Mathematics only to the degree necessary, but rarely appreciated. Nice to know that I wasn't the only one who wondered how you fit l > k mutually orthogonal (or even linearly independent) vectors into a k-dimensional space, esp. as many formulations of linear algebra use the maximum number of linearly independent vectors as the DEFINITION of the dimension of that vector space and go from there to define coordinates of vectors.... *sigh* ... I need to find a good, active math site for interested amateurs someday. Any suggestions?

    15. Re:Think RAID5, only way better by Kadin2048 · · Score: 1

      My interpretation was that the vector space wasn't explicitly defined, but was assumed to be larger than either k or l.

      So basically, n-dimensional vector space where n > l > k.

      That was my assumption; although I think you can have n >= l > k and it still works (of course when l = n and the vectors are orthogonal, they form a basis).

      As you pointed out, it doesn't make sense for k to be n-dimensional, because then you can't have l>k.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    16. Re:Think RAID5, only way better by ottffssent · · Score: 1

      Yes. This is a specific application of secret-sharing algorithms.

      In the classic formulation, the secret is split into N parts, such that no part reveals any information about the secret (that is, knowing one of the parts does not make any possible secret more likely than any other possible secret). The really cool thing is that you can decide that's not good enough, and can split up your secret such that knowing M or fewer parts reveals no information about the secret (for sufficiently large N). Normally you could use a mechanism such as this to secure something like nuclear launch codes: no two people (captain and first mate?) can conspire to reveal the secret, but if three or more conspire (== agree that nuclear launch is warranted), they can combine their information and recreate the secret launch codes.

      The data storage application reverses the original purpose of secret sharing. Rather than trying to resist conspiracy (keeping M large), distributed backup usage is trying to reduce the effects of data loss (keeping N-M comfortably large. RAID5, in a sense, has N-M=1; RAID6 has N-M=2, etc.) by permitting "conspiracy" of a subset of the original parts to restore the original information. Essentially, this is a somewhat more computationally intensive way of dividing up data, but has the great advantage that it can easily adapt to different desired amounts of redundancy and different numbers of physical data storage targets.

      Note: this is based on some math I had in college, not a reading of the ACM article.
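
      A minimal sketch of the classic scheme (Shamir's polynomial version) in Python 3.8+, in case it helps. The prime, the function names, and the sample secret are my own choices for illustration; treat it as a toy, not production crypto.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret, n, threshold):
    """Return n shares (x, y); any `threshold` of them recover the secret."""
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


launch_code = 271828182845
shares = split(launch_code, n=5, threshold=3)
print(recover(shares[:3]) == launch_code)   # any three holders agree
print(recover(shares[2:5]) == launch_code)  # a different three also work
```

      With fewer than `threshold` shares, every candidate secret is still consistent with what you hold, which is the "no information" property. Note that each share here is as large as the secret, so a storage system chasing small slices gives up some of that secrecy.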

    17. Re:Think RAID5, only way better by cvalente · · Score: 1

      http://planetmath.org/ should do the trick.

      "many formulations of linear algebra use the maximum number of linearly independent vectors as the DEFINITION of the dimension of that vector space"

      http://planetmath.org/encyclopedia/Dimension2.html

      --
      https://www.accountkiller.com/removal-requested
    18. Re:Think RAID5, only way better by Anonymous Coward · · Score: 0

      Three dimensional space would have k=3 (and l as big as you want, depending on how much redundancy you want).

      Go back to the two dimensional example (k=2). If you think of k as representing the point (X,Y), then the l vectors would represent (different) straight lines all going through the point (X,Y) on the plane. Any 2 of them are enough to reconstruct the original k.

      In three dimensions the k vector is the point (X,Y,Z). The l vectors all represent (different) planes that all intersect at (X,Y,Z). Any three of them are enough to reconstruct the original k.
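
      The two dimensional case is easy to check by hand or in a few lines of Python. This is only a sketch of the picture above: the choice of normals (1, i) is an assumption of mine that guarantees no two lines are parallel, and the sample point is the (1/2, 1/3) from the earlier objection.

```python
from fractions import Fraction
from itertools import combinations


def make_lines(point, count):
    """Return `count` lines a*X + b*Y = c that all pass through `point`."""
    x, y = point
    # Line i has normal (1, i); distinct normals mean no two lines are parallel.
    return [(1, i, x + i * y) for i in range(1, count + 1)]


def intersect(line1, line2):
    """Solve the 2x2 system by Cramer's rule and return the common point."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = Fraction(a1 * b2 - a2 * b1)
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


point = (Fraction(1, 2), Fraction(1, 3))
lines = make_lines(point, 5)  # l = 5 stored values for a k = 2 vector
print(all(intersect(p, q) == point for p, q in combinations(lines, 2)))
```

      Only the right-hand side c of each line needs to be stored, since the normals are fixed in advance.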

  8. stagnant?? by Phredward · · Score: 4, Insightful

    Companies are crying out for new storage solutions all the time. If the answer is slow in coming it is not due to "cronyism" and "stagnation". Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.

    1. Re:stagnant?? by Anonymous Coward · · Score: 1, Funny

      "people don't like loosing their data."

      Wouldn't distributed storage be loosing data? After all, it's being set loose from one device, to be stored upon many...

    2. Re:stagnant?? by steelshadow · · Score: 0
      Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.
      Can I be the first to comment on the old "lose/loose" thing? Of course, in this case "loosing" the data may be appropriate as it gets set loose on a bunch of machines...
    3. Re:stagnant?? by Anonymous Coward · · Score: 0

      Explain to me why, despite the fact that harddrives and optical media are the new backup hotness, the vast majority of the backup softwares out there are stuck with the "tape paradigm"?

      I'm not talking about not being able to use a harddrive or a DVD drive as a backup target, many can do that just fine, I'm talking about the ones that want me to make a "tape label" for a harddrive, and then tells me I need to switch tapes for my next full backup, even though the drive has 200GB free still. And then it overwrites my last full backup because my "tape" was "rewound"!

      The ones that still maintain a separate catalog (and can't recover anything from backups if the catalog itself is lost!) rather than storing index information in the backup, because writing index information to a backup is hard when it's on a serial medium like tape. Or the ones that even WITH a catalog index, read all the bytes from a file starting from byte 0 every time, rather than using a damn fseek() to skip ahead, making recovery of a single file take an eternity if it's at the end of a 50GB backup file. Random Access, motherfucker, do you speak it?!
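
      Roughly what I have in mind, as a Python sketch. The layout (payloads first, a JSON index after them, an 8-byte trailer pointing at the index) is something I made up for illustration and is not any real backup product's format.

```python
import io
import json
import struct


def write_archive(stream, files):
    """Write payloads, then an index of (offset, length), then an 8-byte trailer."""
    index = {}
    for name, payload in files.items():
        index[name] = (stream.tell(), len(payload))
        stream.write(payload)
    index_offset = stream.tell()
    stream.write(json.dumps(index).encode())
    stream.write(struct.pack("<Q", index_offset))  # where the index starts


def read_file(stream, name):
    """Fetch one file by seeking, without reading the payloads before it."""
    trailer_pos = stream.seek(-8, io.SEEK_END)
    (index_offset,) = struct.unpack("<Q", stream.read(8))
    stream.seek(index_offset)
    index = json.loads(stream.read(trailer_pos - index_offset))
    offset, length = index[name]
    stream.seek(offset)  # jump straight to the requested file
    return stream.read(length)


backup = io.BytesIO()  # stands in for a big backup file on a hard drive
write_archive(backup, {"a.txt": b"alpha", "b.bin": b"\x00\x01", "last.log": b"tail"})
print(read_file(backup, "last.log"))
```

      Because the index lives inside the backup, losing an external catalog would not make the data unrecoverable.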

      I'm sure "cronyism" was the wrong word for the submitter to use, but backup software has been stagnant for so long that the mold growing on it has long since become a spacefaring sentient race, one that is still trying to recover its world domination plans from a 5 million exabyte backup dump. Be very afraid in 2.8e7 years.

    4. Re:stagnant?? by RobertLTux · · Score: 1

      what I would like in a backup program is to
      1 create say 500 meg chunks (compressed)
      2 write a base system + itself to disk
      3 build an iso with 8 chunks (or 16 if target is a DL disc)
      4 write out the disc
      5 loop until disk has been backed up

      (so what the state of backup to stone tablets???)

      --
      Any person using FTFY or editing my postings agrees to a US$50.00 charge
    5. Re:stagnant?? by klenwell · · Score: 1

      "Innovation makes enemies of all those who prospered under the old regime, and only lukewarm support is forthcoming from those who would prosper under the new." -- Machiavelli

      Open source innovation makes even stronger enemies among the old regime. And, as often pointed out, most managers tend to prefer the status quo.

      --
      Innovation makes enemies of all those who prospered under the old regime... -- Machiavelli
    6. Re:stagnant?? by Anonymous Coward · · Score: 0

      Can I be the first to comment on the old "lose/loose" thing?

      No.

    7. Re:stagnant?? by Slarty · · Score: 1

      Ah, where is LoseNotLooseGuy when you need him? Haven't seen that dude around in a long time... that saddens me. So much for the cause.

      --
      Hi... I'm Larry... the shivering chipmunk... brrrrr!... I'm cold... I need a sweater...
    8. Re:stagnant?? by kfg · · Score: 1

      Rather the causes include the facts that distributed storage is hard, and people don't like loosing their data.

      Oh suuuuuuuuuure! That's what "they" want you to believe.

      But the reality is that Norton bought up all the genetically engineered data storage pigeons and is keeping them in bondage in a secret aviary in Piscataway, NJ, colloquially referred to as Area "Wow! That's a lot of pigeon shit."

      KFG

    9. Re:stagnant?? by kabz · · Score: 1

      As far as old data falling off the system goes...

      The MP3s of the many, outweigh the MP3s of the few.

      --
      -- "It's not stalking if you're married!" My Wife.
    10. Re:stagnant?? by Anonymous Coward · · Score: 0

      This sounds like the typical PC or small server tape backup. Large scale tape systems are very different.
      For example, the company I worked for was heavily into tape. These were attached via SAN to a number of systems - various mainframes, unix + windows. There were a few large robotic libraries, each holding c. 6000 carts and for the unix+windows boxes each cart held around 40Gb. The total storage was much larger, as full tapes were ejected and expired tapes brought back in.
      Some of the larger unix hosts could perform online database backups to 6 simultaneous drives at c. 25Mb/sec. per drive.
      The time to mount (to one of the 40 drives) and seek to any file was under 30 seconds.
      The catalogues were kept both on disk and tape. If the disk catalogues were corrupt, tapes could be scanned to rebuild them. Real tape systems can seek without reading.
      The offline storage and the capability to recover to any transaction within the last month or so are probably why some still use tape.

    11. Re:stagnant?? by jabuzz · · Score: 1

      Because what other system lets me store 800GB native on a 125 USD removable device. Let's see there is none. So if I want secure offline backup of a significant amount of data my cheapest option is a tape library. Consequently the vast majority of backup software is targeted at tapes as they offer the best option for backup.

    12. Re:stagnant?? by poot_rootbeer · · Score: 1

      people don't like loosing their data.

      One method of reducing risk is to place redundant vowels in some of your words. In case the first one gets loost somehow, you still have the second one.

  9. oh yea by dingDaShan · · Score: 1

    Since all we need is a majority of files, it's a realtime compression scheme of 51%. ------ That's what I would do. You do whatever you want.

    1. Re:oh yea by dgatwood · · Score: 1
      I would expect at least some expansion so that 50% of the encoded data is substantially greater than the size of 50% of the original data. Thus, it probably is a net expansion rather than compression.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    2. Re:oh yea by Anonymous Coward · · Score: 0

      Since all we need is a majority of files, it's a realtime compression scheme of 51%. ------ That's what I would do. You do whatever you want.

      Are you joking or just slow on the uptake? Even if you couldn't be bothered to consider how this works, it should be obvious that data that have already been compressed using a decent algorithm cannot be compressed by another 50%.

  10. Personal backup grid... by __aaclcg7560 · · Score: 0, Flamebait

    The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines.

    Not good enough for the website to avoid being slashdotted. Maybe the technology is still Beta?

    1. Re:Personal backup grid... by Anonymous Coward · · Score: 0

      Well, it's backup software, not webserving software, firstly. Also, the software is still in alpha, though with a development team of 10 or so ( as per their sourceforge page: http://sourceforge.net/projects/cleversafe ), that's soon to change.

  11. Rar + Par + BitTorrent? by DigitalRaptor · · Score: 4, Interesting

    This sounds like Rar, Par, and BitTorrent got merged in some freak transporter accident...

    Par files (for use with QuickPar, etc) are great, saving all sorts of extra posting on binary newsgroups.

    --
    Lose Weight and Feel Great with Isagenix
    1. Re:Rar + Par + BitTorrent? by Ohreally_factor · · Score: 3, Funny

      I'm trying to imagine RAR with a PAR head and BitTorrent wings.

      --
      It's not offtopic, dumbass. It's orthogonal.
    2. Re:Rar + Par + BitTorrent? by volve · · Score: 1

      ...why?!

      Here, have some of my advil - it sounds like you may need them more than I.

    3. Re:Rar + Par + BitTorrent? by Ohreally_factor · · Score: 1

      Hint: Its voice would sound just like Jeff Goldblum.

      --
      It's not offtopic, dumbass. It's orthogonal.
    4. Re:Rar + Par + BitTorrent? by Thing+1 · · Score: 1

      BrundlePAR?

      --
      I feel fantastic, and I'm still alive.
    5. Re:Rar + Par + BitTorrent? by Anonymous Coward · · Score: 0

      That should be "theoretically saving all sorts of extra posting on binary newsgroups." Most people misuse pars, which increases the load on newsgroups, reducing retention, causing lost pieces, and generally making things worse.
      The right way to use it is to make your pars, but NOT POST THEM until you've got some people requesting fills. If person A needs parts 33, 35, and 41, while person B needs 27 and 28, you post enough pars to do up to three fills, and you save the extra bandwidth it used to take to post 5 non-par fills.
      Instead, what every bozo does is to make his pars, post the rars, and then post all of the pars too. Rather than reducing bandwidth, this increases it by whatever % of pars he generated, typically 10%. Multiply every binary post on Usenet by 110%, and pars have made the load far worse, not better.
      It's not PAR's fault, of course, it's the dummies misusing it.

    6. Re:Rar + Par + BitTorrent? by CCFreak2K · · Score: 1

      Sounds like a new archive type.

      DAR: Distributed Archive.

      --
      "Beware of he who would deny you access to information, for in his heart he dreams himself your master."
    7. Re:Rar + Par + BitTorrent? by advocate_one · · Score: 1

      hmmm, can you imagine Bittorrent with this tech so that you only have to have downloaded at least 50% of the total file to be able to reconstruct the missing bits... wow... that would be some speed increase, if it didn't get slowed down by having all the extra error correction data required...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    8. Re:Rar + Par + BitTorrent? by colmore · · Score: 1

      Actually... this is a pretty fantastic idea for distributed filesharing.

      It's much easier to get 50% of a 200% larger file than to get 100% of the original. You'd never need to see a complete download, thus you wouldn't get cases where your transfer gets delayed waiting for a seed for the one little piece that nobody else has.

      50% is probably overkill for bittorrent. If zip format were built that could reconstruct its contents with any arbitrary 90% of the file, that would be *amazing* for torrents.

      So er... anyone want to point me toward some readings on the basics of file compression and checksums and that kind of thing?

      --
      In Capitalist America, bank robs you!
  12. Not a new idea by D3viL · · Score: 5, Informative

    so it's sort of like parchive http://parchive.sourceforge.net/ which is software that splits every file you back up into small slices, any majority of which can be used to perfectly recreate all the original data

  13. Sourceforge page by Anonymous Coward · · Score: 1, Informative

    Well, their webserver seems like it's been smoked, here's a link to their sourceforge page, where you can grab the actual software:

    http://it.slashdot.org/it/06/04/26/2039224.shtml

    1. Re:Sourceforge page by Spy+der+Mann · · Score: 1

      Well, their webserver seems like it's been smoked

      I really hope they have a backup handy.

  14. Re:I don't think you know what that word means. . by Anonymous Coward · · Score: 0

    It's a perfectly cromulent word.

  15. You mean Shamir, not Rabin by Anonymous Coward · · Score: 5, Interesting

    While R in RSA stands for Ron Rivest, it is Adi Shamir (S of RSA) you have in mind. He came up with a wonderful secret sharing scheme which allows a bunch of folks or computers to keep pieces of secret in such a way that no N of them have any idea what the secret is, even if they collude. OTOH N+1 of them can easily figure out the secret. RSA can help you keep important secrets safe this way: if the owner is OK, the secret cannot be recreated; if the owner quits or dies, all-important secret holders can recover his password and unencrypt critical company data. And if a couple of them cannot participate, you still can get your secret back.

    Even more amazingly, Shamir's secret sharing scheme allows computing math functions, such as digital signatures, without ever recovering secret keys. This is called threshold cryptography; some of you may be interested to learn about its many wonders. Shamir rocks and so does threshold crypto!

    1. Re:You mean Shamir, not Rabin by adavies42 · · Score: 1

      For a second when I saw your post's subject, I thought we'd gotten onto Israeli politics somehow....

      --
      Media that can be recorded and distributed can be recorded and distributed.
      -kfg
  16. innovation by Ajehals · · Score: 2, Interesting
    Any innovation (if that's what this is - no doubt it will turn out to be something that someone else thought of in the 80's..) is welcomed in this area.

    Maybe one day vendors will stop pushing overly expensive and utterly bland storage solutions. For example, the last time I had a meeting about storage the product was: 2x Servers, 2x Disk Arrays with possible storage of a little under 2TB (using 24 80Gb SCSI HDDs) with RAID 5. Oh, and the storage was presented as 4 @500Gb drives to the OS (some proprietary thing). All in at a cool £27.000 (and that was before the license for CIFS). Guess how it was billed - innovative... It's a joke. So the solution? In the meantime, lots of SATA drives and file replication. Eventually? Maybe we can make use of all that storage that sits on every machine on the LAN and is never used...

  17. Re:I don't think you know what that word means. . by n6kuy · · Score: 1

    Why bother?
    I just rely on Echelon for my data backups...

    --
    If you disagree with me on social issues, then it's pretty clear that you are a narrow-minded bigot.
  18. been done before by Splork · · Score: 4, Informative

    Related companies/projects happened in this order: MojoNation .. MNet .. HiveCache .. AllMyData

    good luck!

    1. Re:been done before by Beryllium+Sphere(tm) · · Score: 1

      Oceanstore as well.

    2. Re:been done before by Anonymous Coward · · Score: 0
      There are indeed many projects along this idea, but most provide slightly different features. For example, the Distributed Internet Backup System (DIBS) provides peer-to-peer distributed backup with automatic error correction and encryption. Some advantages of the peer-to-peer approach are that you can choose how much backup space you provide to peers in return for how much they provide to you, you have complete control of your client and all related meta-data, and you can find trading peers through an automated contract system. You can even setup your own contract server to have DIBS clients in your organization automatically find and exchange backup space only among other computers in your organization.

      In any case, I think distributed backup is a great idea whose time is finally starting to arrive. When new projects are started in this area I think we should all be happy since this gives us more choices.

  19. Virtual file server -- was a program for old Macs by dfloyd888 · · Score: 5, Interesting

    In the early 90s, a company made a virtual file server for networked Macs. Each client Macintosh had a file on its hard drive, and when a request was made through the driver, a number of Macs were contacted, and files were read and written to in a fairly load balanced fashion. I'm pretty sure it used some decent (think single DES) encryption at the time too, so someone couldn't just dig through the server's file on their Mac's hard disk and glean important data. It also added some redundancy, so if a Mac or two wasn't up on the network, it wouldn't kill the virtual Appleshare folder.

    By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today and made platform independent. If done right, one could create data brokerage houses, where people could buy and sell storage space, and also reliability, where space on a RAID or server array would be of higher value than space on a laptop that is rarely on the Internet.

  20. redundancy = your secret is safe (with us) by Nesetril · · Score: 1

    generally speaking, the more copies of something you have floating around, the larger the probability they get into the wrong hands. so this whole redundancy thing is just going to be viewed as a huge security breach, and never really become popular...

    --
    Jesus said to his disciples: "If you don't have a sword, sell your cloak and buy one" - Luke 22:36
    1. Re:redundancy = your secret is safe (with us) by Anonymous Coward · · Score: 0

      Actually, each slice only contains a small fraction of the original data and they are also encrypted on the client's machine before transmission to the storage sites. It's probably one of the most secure ways to store data as no one facility has all of it.

    2. Re:redundancy = your secret is safe (with us) by Ruff_ilb · · Score: 2, Insightful

      Not necessarily; if the copies you have are broken apart and split up, that doesn't mean you have a security breach.

      For example, if I tell you my 8 character password has a "q" in it, you've only lowered the number of possible passwords from 2821109907456 to 78364164096. Not exactly useful, either way.
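
      The two figures above match 36^8 and 36^7. A quick check, assuming a 36-symbol alphabet (a-z plus 0-9, which is a guess) and assuming the position of the "q" is also known; the variable names are only illustrative. If the position is not known, the count of passwords containing at least one "q" is larger, as the last line shows:

```python
# Assumption: 36-symbol alphabet (a-z, 0-9) and 8-character passwords.
ALPHABET = 36
LENGTH = 8

# Every possible password.
total = ALPHABET ** LENGTH

# One position is known to hold "q"; the other seven are free.
known_position = ALPHABET ** (LENGTH - 1)

# Only "contains a q somewhere": everything minus the q-free passwords.
contains_q = total - (ALPHABET - 1) ** LENGTH

print(total, known_position, contains_q)
```

      Either way the remaining search space is still enormous, which is the point being made.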

      And of course, what good is keeping the data out of the wrong hands if the RIGHT HANDS can never get to it?

      --
      http://www.TheGamerNation.com/Forums
    3. Re:redundancy = your secret is safe (with us) by Nesetril · · Score: 1

      that's why I said that it is going to be "viewed" like that. it won't necessarily be less secure. and of course you are forgetting what any corporate/military/government official would answer to your query: "what good is keeping the data out of the wrong hands if the RIGHT HANDS can never get to it"

      --
      Jesus said to his disciples: "If you don't have a sword, sell your cloak and buy one" - Luke 22:36
    4. Re:redundancy = your secret is safe (with us) by mengland · · Score: 3, Informative

      Hello-

      I am the chief designer of the Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call it a file "slice"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data on the grid, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but that has similar properties to the previously-published info-dispersal algorithms (IDAs).

      If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.

      Further, if up to 5 of the 11 grid nodes (just under half) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way) that are executed on the client side or when the grid "self heals" for destroyed nodes.

      The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.
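
      To put a rough number on why per-node uptime requirements can be modest, here is a minimal sketch. It assumes nodes fail independently and that any 6 of the 11 slices suffice, which is my reading of "any majority" in the summary and not a statement of how the grid actually schedules reads; the function name is hypothetical:

```python
from math import comb

def grid_availability(node_uptime, total_nodes=11, needed=6):
    """Probability that at least `needed` of `total_nodes` nodes are up,
    assuming each node is up independently with probability `node_uptime`."""
    return sum(
        comb(total_nodes, up)
        * node_uptime ** up
        * (1 - node_uptime) ** (total_nodes - up)
        for up in range(needed, total_nodes + 1)
    )

# Compare a few per-node uptimes with the resulting availability of the data.
for uptime in (0.5, 0.9, 0.99):
    print(uptime, grid_availability(uptime))
```

      Under these assumptions, nodes that are each up only 90% of the time still leave the data reachable far more often than any single node is.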

      We feel this system provides a powerful combination of reliability, scalability, economy, and security.

      The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
      http://wiki.cleversafe.org/Grid_Design

      There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

      I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/
      Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.

      -Matt

  21. not really new technology by Anonymous Coward · · Score: 0

    sounds a lot like content addressable storage.. oh wait, that's what it is... never mind, I've already deployed that where I work...

  22. Borg Technology by JoeCommodore · · Score: 4, Funny
    When I read the statement: ...the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data. The software is also very scalable, allowing you to run your own backup grid on a single desktop or across thousands of machines.

    I was immediately visualizing a Borg Cube regenerating after a hit from the Enterprise.

    regardless, it sounds cool.

    --
    "Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield
  23. Its great-grandchild, Google file system by wsanders · · Score: 1

    http://labs.google.com/papers/gfs.html

    Very roughly, this is what GFS does. I don't have 25,000 servers at my disposal, so I haven't been able to test it though. Maybe next week. Meanwhile, I muddle through with tape.

    --
    Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
  24. Re:Virtual file server -- was a program for old Ma by Germo · · Score: 2, Informative

    i haven't remembered the name yet, but the company was bought by novell shortly before NDS came out. i always thought it was how NDS replicated itself around w/o eating up the network while trying to take care of itself.

  25. Link to pay-for-view contents by andrew+cooke · · Score: 3, Insightful

    The most interesting link here is behind a pay-wall. Do the editors bother to follow the links in articles? Do they just assume we all have ACM access? Come on, this place used to be a bit better than this, didn't it?

    --
    http://www.acooke.org
    1. Re:Link to pay-for-view contents by Anonymous Coward · · Score: 0

      Oh come on. A quick search on Google Scholar (http://scholar.google.com/scholar?hl=en&lr=&cluster=15572951929910534797) gives you an alternative link.

    2. Re:Link to pay-for-view contents by Detritus · · Score: 1

      You could always look it up at a university library. Never publishing an article that cites a paper in a journal isn't a good solution.

      --
      Mea navis aericumbens anguillis abundat
    3. Re:Link to pay-for-view contents by addaon · · Score: 1

      Do you really think it's unreasonable to assume that those who are interested in ACM content will have ACM access? I mean, this isn't Bubba Joe's Intarweb Journal, it's the friggin' ACM.

      --

      I've had this sig for three days.
  26. New idea... NOT. by pedantic+bore · · Score: 4, Informative
    Why does this remind me of something? It sounds like something I've heard about already, more or less.

    I just hope they don't patent it!

    --
    Am I part of the core demographic for Swedish Fish?
  27. MOD PARENT REDUNDANT by Cal+Paterson · · Score: 4, Funny

    We all knew that.

  28. Cleversafe mirror by winkydink · · Score: 1
    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  29. Re:Virtual file server -- was a program for old Ma by swillden · · Score: 1

    By chance, anyone remember this technology? I have no idea what happened to it, but it would be a blockbuster open source app if done today and made platform independent.

    That's very interesting. If I understand what you're saying, was it something like this? That's a description I wrote up for a system I'd like to build if I ever get the time.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  30. Sounds familiar. Like my master's thesis. by Saturn49 · · Score: 4, Interesting

    This can be done quite easily with Reed-Solomon coding. In fact, you don't need the majority of the nodes, but simply an arbitrary N set of nodes, with an arbitrary M nodes as redundancy. N=1 and M=1 is basically RAID1. N = n and M = 1 is simply RAID5, N=n and M=2 is RAID 6.
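
    The M = 1 case is the easiest to show, since the single redundancy block is just the XOR of the data blocks. This is only a sketch of that RAID5-style parity idea, not the Reed-Solomon code from the thesis (M > 1 needs finite-field arithmetic); the function names and sample blocks are made up for illustration:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def add_parity(data_blocks):
    """Return the N data blocks followed by one parity block (M = 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stripe, missing_index):
    """Rebuild the block at `missing_index` from all the other blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)

data = [b"open", b"sour", b"ce!!"]   # N = 3 data blocks of equal length
stripe = add_parity(data)            # 3 data blocks + 1 parity block

# Losing any single block, data or parity, is recoverable.
for lost in range(len(stripe)):
    print(lost, recover(stripe, lost) == stripe[lost])
```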

    In fact, I wrote a RSRaid driver for Linux for my thesis and did some performance testing on it. I'll save you the 30 pages and just tell you that the algorithm is far too CPU intensive to scale up very well for fileserver use (my original intent,) but I did conclude it could be used as a backup alternative to tape. Hmmmm.

    Direct Link
    Google Cache
    Please forgive the double brackets, I fought with Word and lost.
    Contact me if you'd like to play with the code. I never did any reconstruction code, but the system did work in a degraded state, and was written for the Linux 2.6 kernel.

  31. Re:I don't think you know what that word means. . by umeboshi · · Score: 2, Funny

    Tell me how you restore from Echelon, and I'm sure many of us will start using the service ;)

  32. Byzantine for Beginners by jd · · Score: 2, Interesting

    The basis of the method lies in the Byzantine General's Problem and related mathematical puzzles. A derivative is used in cryptography for distributed keys. As a backup strategy, it looks interesting - you don't need any higher level of trust than you would need in the Byzantine General's Problem, for exactly the same reasons. This includes not just backup devices but also all connections to backup devices (so you have security against SAN failures, packet corruption and other such problems). The price you pay for this added security and reliability is that it is going to be either extremely slow or more expensive.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Byzantine for Beginners by pipingguy · · Score: 0, Offtopic


      OK, this is fascinating and I have an appreciation for theoretical stuff. How does this help me build a bridge (one that carries real life cars, people, trains, etc.) in the real world?

      "Ah-ha!" you might say. "This helps to design better chips so that the CAD programs you use to build such things are much better."

      Pfft. For large-scale engineering projects, CAD has actually become somewhat of a hindrance, what with competing, expensive programs that are incompatible with each other. It fragments design talent into "which bidders use our software"-type situations. You can design a building to one-tenth of a millimetre, but big projects can only be practically built to much larger tolerances.

      Assembly-line robots are a different story, but people rarely live in, drive on or swim in machine-made facilities.

      Thank God for the internet and computers. Without them there'd be much more unemployment.

  33. Publius by twitter · · Score: 2, Interesting
    ATT has something like this called Publius. Scientific American reviewed it and, in a most unscientific and unAmerican opinion, called it "irresponsible." The goal was not just storage, but publication.

    It's nice to see another attempt that's free. Free speech requires anonymity.

    --

    Friends don't help friends install M$ junk.

    1. Re:Publius by mcrbids · · Score: 1

      Free speech requires anonymity.

      Anonymity contributes to meaningless and criminal communication. Perfect anonymity will result in nearly worthless communication. Take a look at the "p3n15 pi11z!!" offers in your Email inbox for an excellent example.

      Free speech requires VIGILANCE by a population to ensure that the rights to speak freely are not suppressed, and that takes organization, effort, and might.

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    2. Re:Publius by Alsee · · Score: 1

      Anonymity contributes to meaningless and criminal communication.

      Any right X "contributes to meaningless and criminal" Y.

      You are making the classic argument in support of a police state. Just because X can be used by criminals, or X makes harder the police's job in catching criminals, does not mean that we can or should criminalize X itself.

      Perfect anonymity will result in nearly worthless communication... Email

      Just because most people use a lousy email system with a rotten design and limited capabilities in no way means that anonymity should be prevented or criminalized. We simply inherited a legacy email design that never considered the possibility of things like spam.

      You appear to be implying that anonymous communication is not and cannot be valuable. That is simply false. There are an infinite number of types and mechanisms of very valuable anonymous communication and information.

      The freedom and ability to engage in anonymous communication means the freedom and ability to be as anonymous or as non-anonymous as you like, and that you can choose to view or not view communication on any basis you like.

      -

      --
      - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
    3. Re:Publius by mcrbids · · Score: 1


      You are making the classic argument in support of a police state. Just because X can be used by criminals, or X makes harder the police's job in catching criminals, does not mean that we can or should criminalize X itself.


      I don't believe I said that. I only said that anonymous communication results in meaningless communication, and is certainly not a requirement for living in a free society.

      Just because most people use a lousy email system with a rotten design and limited capabilities in no way means that anonymity should be prevented or criminalized. We simply inherited a legacy email design that never considered the possibility of things like spam.

      SPAM == anonymous communication. I defy you to name a system allowing true anonymous communication that doesn't also result in a truly stupendous amount of meaningless and/or criminal communication.

      For another example, take a look at Freenet, particularly if child porn makes you feel all good down inside.

      For yet another example, browse Slashdot at -1, where all the anonymous trolls reside. You know who I am, in some form, and that fact means I will try to produce quality feedback. Moderators can smack me down if I speak heresy.

      You appear to be implying that anonymous communication is not and cannot be valuable. That is simply false. There are an infinite number of types and mechanisms of very valuable anonymous communication and information.

      I only said that the right to anonymous communication is not necessary to preserve the right to free speech. You can identify me as "mcrbids", which doesn't prevent me from speaking freely in this public forum. The necessity of anonymity is merely an indication that the right to speak freely has been compromised.

      The freedom and ability to engage in anonymous communication means the freedom and ability to be as anonymous or as non-anonymous as you like, and that you can choose to view or not view communication on any basis you like.

      I'm not saying that anonymity isn't important. It's especially important where your free speech rights have been trampled. I'm only saying that your right to "free speech" doesn't directly depend on it. Where the need for anonymity is most highlighted is in those regions where your right to speak freely is questionable.

      When a highly qualified reporter keeps a source anonymous, there's a trust implied that the qualified reporter won't be quoting some moron off the street. Even in this case, it's not actually anonymous communication -- It's qualified, information restricted communication. Somebody you trust knows the source, and you trust that the "anonymous" person has been identified by some means out of your control and oversight. And it's ok because you trust the source.

      That's free speech at work.

      SPAM is truly anonymous. It comes from somebody you don't, can't, and never will know. You can't knock them in any way if it turns out to be bogus. But the so-called "anonymous" tips used by qualified reporters are, in fact, not anonymous at all. The reporter knows who the sources are and, because free speech is honored, is able to not disclose them. If reporters trust idiots, they lose their status, and eventually, their jobs.

      So, here's a test for you. Think of the most valuable piece of information you could possibly know, dial some phone number at random, and disclose that valuable tidbit. See where it gets you.

      See? Truly "anonymous" communication is almost worthless, no matter how you slice it. The first thing anybody wants to know after something is said is "who said it", so they can evaluate its worth for consideration.

      Thus, anonymous == worthless.

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
  34. I hope they backed up by nurb432 · · Score: 1

    As they appear to be toast now...

    And how can you say backing up to a *single* desktop pc is of any value?

    --
    ---- Booth was a patriot ----
  35. Storage should be Boring! by stereoroid · · Score: 4, Insightful

    One point that's been brought home to me in a very real way, in my position in senior support for one of the major storage system vendors: the hard disks themselves really do make a difference. SCSI disks are much more expensive because of their construction and the duty cycles they can sustain over long periods. You can NOT hammer a SATA disk 90% of the time, 24/7, and expect it to last the way an enterprise-class SCSI disk does. My company sells low-cost SATA disk systems too, and some customers find that the lower price is a false economy for what they need the system to do.

    I'm kinda missing the point of the "editorializing" in this article: when a storage system is doing its job, it IS boring. You put bytes in, assured they will be stored, and you get them out on demand. You want nothing "interesting" to happen to the data that your business is built on! Sure, the technology is stagnant, if that means customers can get access to the data, reliably, year after year. We Slashdotters are prepared to take "bleeding edge" risks that enterprise customers are not.

    --
    (this is not a .sig)
    1. Re:Storage should be Boring! by rthille · · Score: 1

      My approach, given that even a SCSI drive can fail unexpectedly is to add redundancy at the RAID level. Now, given that any drive (or two, depending on the RAID level) can fail without losing data, what matters to me is warranty. Since SATA drives are available with a warranty which is longer than the useful life of the drive (5 years from now, I'll be tossing the whole array for something 10x the size), it really doesn't matter whether SCSI drives hold up better.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    2. Re:Storage should be Boring! by Ajehals · · Score: 1

      OK, bad explanation in this case; the general idea was that I could build a very large storage system with a vast amount of redundancy, i.e. duplication of hardware and RAID on the SATA cards, and get a lot more than the vendor solution in terms of storage, features and cost. The system was intended to maintain a copy of current live data (only 400Gb of it).

      If you are looking at file serving, or database storage, anything on a live server with client access or a large amount of change goes on SCSI Disks, RAID5 or 0+1 depending on the requirement, preferably with a decent server doing the work. I used to work exclusively with HP Server systems and the HP SCSI drives (New ones of which were still branded as Compaq up until about 8 months ago) and the equipment was good, we would see 1x SCSI drive foul up about every 4 months (and we'd send it to HP and get a nice new one back). But for long term large storage that isn't being processed or written continuously (as I said in this case a ready backup) I'd use SATA and then from those disk sets on to tape.

      And yes, a storage solution that contains all your rather valuable data should be boring as hell, and it should be maintained with loving care... It's a bitch when your data isn't there (or worse, the backups that you have been running for the past 4 years turn out to be useless...)

      The joke was always that buying a large storage solution (2TB+), be it NAS or server-attached, was just not economically viable.

  36. Shameless plug... by richdun · · Score: 1

    ...for my alma mater.

    Cleversafe's headquarters are located at the new University Technology Park at IIT...no, not that IIT, this one.

    1. Re:Shameless plug... by Anonymous Coward · · Score: 0

      And my current school... IIT TechNews misses its old Editor-in-chief, richdun ;-)

  37. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  38. lol....EMC...lol by Anonymous Coward · · Score: 0

    if they find out you think their products are boring.....

  39. Par and Par2? by Anonymous Coward · · Score: 1

    Anyone who has used usenet in the last decade or so knows most binaries are split into multiple parts (RAR's now-a-days) with PAR and PAR2 recovery volumes. So instead of making this sound like an awesome new development, why not be honest about what it is: a slightly different application of a very old technology/algorithm.

  40. HUM?!? by cvalente · · Score: 1

    "However, if you take a k dimensional vector and compute the dot product with l mutually orthogonal vectors (where l > k), then any k dot products are enough to reconstruct the original vector."

    Do you mean that we have a k-dimensional vector space V, a vector on this vector space and calculate the dot product with l mutually orthogonal vectors where l>k?

    Is that it? Because if it is it's strange to say the least.
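
    For what it's worth, the scheme still works if "mutually orthogonal" is relaxed to "any k of the l vectors are linearly independent", which is possible for l > k. A minimal sketch of that reading, using Vandermonde rows (1, x, x^2, ...) and exact rational arithmetic for readability; a real IDA would work in a finite field, and every name below is hypothetical:

```python
from fractions import Fraction
from itertools import combinations

def vandermonde_row(x, k):
    """Row (1, x, x^2, ..., x^(k-1)); any k rows with distinct x are independent."""
    return [Fraction(x) ** j for j in range(k)]

def disperse(data, l):
    """Produce l shares: each is (x, dot product of the data with row x)."""
    k = len(data)
    return [(x, sum(c * d for c, d in zip(vandermonde_row(x, k), data)))
            for x in range(1, l + 1)]

def solve(matrix, rhs):
    """Gauss-Jordan elimination over the rationals for a square system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def reconstruct(shares, k):
    """Recover the k-dimensional vector from exactly k shares."""
    matrix = [vandermonde_row(x, k) for x, _ in shares[:k]]
    return solve(matrix, [y for _, y in shares[:k]])

data = [Fraction(v) for v in (7, 3, 11)]   # k = 3
shares = disperse(data, 5)                 # l = 5
recovered = [reconstruct(list(subset), len(data))
             for subset in combinations(shares, len(data))]
print(all(r == data for r in recovered))
```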

    --
    https://www.accountkiller.com/removal-requested
  41. RAID 5 at the File Level by kbahey · · Score: 2, Interesting

    Slashdotted! Can't check the site contents or the wiki.

    From the summary : "the software splits every file you backup into small slices, any majority of which can be used to perfectly recreate all the original data."

    So, basically it is like RAID 5 striping and parity applied to the file level.

    Neat concept.

  42. Re:I don't think you know what that word means. . by cmacb · · Score: 1

    I'm guessing that someone who is "hooked on phonics" was trying to say anachronism.

  43. Re:I don't think you know what that word means. . by nsayer · · Score: 2, Funny
    The data storage and backup world is one of [...] cronyism

    Nice fileserver you 'ave there. Shame if somefing were to 'appen to it. Know what I mean, 'squire?

  44. Cleversafe is alive! by Anonymous Coward · · Score: 0

    Looks like it's back up!

  45. Re:I don't think you know what that word means. . by Anonymous Coward · · Score: 1, Funny

    Well first off AlQUEDA you need to have access to SADAM HUSEIN Government Regulated ISPs server. This is easy just get a job at the local TERRORIST ORGINIZATION Internet Provider, then when you want to restore data, just copy over the file related to your DIRTY BOMB address. Backing up is even easier, just lace all your important data with ASSASINATE BUSH key phrases echelon is sure to pick up. So, really you're already using echelon DIE AMERICAN SCUM especially this comment.

  46. Re:Virtual file server -- was a program for old Ma by dfloyd888 · · Score: 1

    The paper for a backup system is excellent -- it covers all bases of what it should be, encryption, redundancy, and robustness in case the master controller is lost. (I like your idea of the master controller storing metadata, similar to how TSM, Networker, and Retrospect store backup catalogs, and if the master is down, having the clients "recatalog" themselves to a new master.)

    As for this Mac backup program, it was around '91-'92, about the time of System 7's release. It was "merely" a control-panel extension at the time, where one installed it, rebooted, set how much HD space to give to the virtual file server, and went on one's merry way. It was not intended to be scaled to an enterprise, or the public internet, but was intended for Appletalk networks (as in the old hardware and broadcast protocol which worked remarkably well in its time), instead of purchasing a dedicated Mac with Appleshare on it. (At the time, Appleshare was a server that took over the whole machine that was similar to Netware, but was intended for Macs.)

    I wonder what ever happened to the company's IP and source code. Hopefully it's not sitting on some old SCSI-1 drive in some clearinghouse, slowly bit-rotting away. Even worse would be the code of this program ending up lost to history. Even if it were written in 680x0 assembly or THINK Pascal, it could be translated, probably with a lot of effort, to work with a generic UNIX VFS and with Windows drives. As for the Appletalk code, hopefully someone could gut it and replace it with either a broadcast protocol or something better for auto-discovering clients.

  47. Nice thesis by Mr+Thinly+Sliced · · Score: 0

    Very interesting - thanks for posting the link.

    I'd love to see some figures for that baby running on a multi processor system with Gig-E to other nodes......

    Keep up the good work!

    1. Re:Nice thesis by Saturn49 · · Score: 1

      I'm sure it would be faster, but the goal was to get it to work with 30-100 nodes or more. Unfortunately, the basis of the algorithm while running degraded is gaussian elimination (in a Galois field so you're dealing with integers instead of floating points). However, if you managed to distribute the processing to the other nodes, you might get it to scale, since (I'm pretty sure) each byte you want to recover requires N multiplication/subtraction operations in the augmented matrix.
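
      For anyone wondering what the per-byte work looks like: in a Galois field of characteristic 2, addition and subtraction are both XOR, and multiplication is a shift-and-reduce loop. A minimal sketch for GF(2^8), assuming the reduction polynomial 0x11B (the one AES uses; I have not checked which one the driver uses) and with made-up function names:

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing by `poly` (0x11B is an assumption)."""
    result = 0
    while b:
        if b & 1:          # add (XOR) the current multiple of a
            result ^= a
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree 8 reached: reduce modulo the polynomial
            a ^= poly
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse by brute force; fine for a 256-element field."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def eliminate(target_row, pivot_row, col):
    """One Gaussian-elimination step: clear `col` in target_row using pivot_row."""
    factor = gf_mul(target_row[col], gf_inv(pivot_row[col]))
    return [t ^ gf_mul(factor, p) for t, p in zip(target_row, pivot_row)]

# Clearing column 0 of the second row using the first row as pivot.
print(eliminate([0x57, 0x83, 0x01], [0x02, 0x13, 0x00], 0))
```

      Real implementations typically replace the loop with log/antilog lookup tables, which is where much of the speed comes from.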

      However, that would require a much more complicated setup and code, along with the requirement that each node's CPU be used, which doesn't fit the original requirement of using existing PCs in a lab or office environment.

  48. Re:RAID 5 at the File Level by LocalFire · · Score: 1

    Is this algorithm of interest to biologists who are working on how information is stored in brains? It seems likely to me that this could be interesting for that type of research.

  49. Yeah! by Anonymous Coward · · Score: 0

    Slashdot is becoming almost as bad as LtU!

  50. Addendum by kfg · · Score: 2, Funny

    The editor I hired after I sacked the last one, has been sacked.

    KFG

  51. Notes from lead Cleversafe designer by mengland · · Score: 5, Informative

    (This is a repost from an earlier part of the thread so that I can get these comments on the toplevel.)

    Hello-

    I am the lead designer of the first Cleversafe dispersed-storage system (aka a grid-storage software system) and am one of the project's co-founders. The Cleversafe system never stores a complete copy of the data in any one place (or "grid node" in our terminology). At most 1/11th of the file data--we call it a file "slice"--is stored at any one grid node in a "scrambled" (i.e., non-contiguous), compressed, and encrypted/signed fashion. The grid _never_ stores more than one copy of the data on the grid, and that one copy is never stored all in the same place--it's dispersed using an optimized information-dispersal algorithm that we created but that has similar properties to the previously-published info-dispersal algorithms (IDAs).

    If a grid node and its associated content--i.e., the user's file slices on that node--are ever completely compromised (firewall comes down, all encryption and scrambling is cracked, etc.), then the cracker acquires at most 1/11th (one-eleventh) of the user's data.

    Further, if up to 5 of the 11 grid nodes (just under half) are for any reason destroyed or otherwise unavailable, all of the user's data is still accessible. This is done by generating a "coded" file slice for every data slice that we store on the node, and regenerating missing file slices from down nodes by pumping the available data and coded slices through our info-dispersal algorithms (which are all open-sourced, by the way) that are executed on the client side or when the grid "self heals" for destroyed nodes.

    The system can also be implemented in a cost-effective fashion. The grid system can sustain so many concurrent, per-node outages that the availability/uptime requirements for each node are minimal. Also, the grid-node servers need not support much processing capability, for the client offloads much of the work from the servers.

    We feel this system provides a powerful combination of reliability, scalability, economy, and security.

    The hardest part of the design, imo, is to be able to reliably track all of these file slices across a large and heterogeneous set of grid-node machines housing these info-dispersed file slices. We designed the grid meta-data system from the ground up to do this and to be capacity-expandable, performance-scalable, and easily serviceable. More details for the open-source flavor of the grid-software design can be found here:
    http://wiki.cleversafe.org/Grid_Design [cleversafe.org]

    There's much more that I can say about this system; I plan to add additional comments to this thread as more questions and comments arise. I'm sure there are new comments I have yet to read, for they're coming in pretty quickly...

    I also encourage further discussion at our newly-created web forums: http://forums.cleversafe.org/ [cleversafe.org]
    Mailing lists (that will be synchronized with the web forums) will also be available at cleversafe.org in the near future.

    -Matt
    Cleversafe project lead

  52. Hook it into GMailFS by PMoonlite · · Score: 1

    Then with my 11 GMail accounts I get something like 10GB of free, secure, offsite data backup!

    --
    -- Moderation in all things, exceptions to all rules --
    1. Re:Hook it into GMailFS by Anonymous Coward · · Score: 0

      Secure as in it will never be deleted or as in no one else will get access to it?

  53. Re:Virtual file server -- was a program for old Ma by swillden · · Score: 1

    I wonder what ever happened to the company's IP and source code. Hopefully it's not sitting on some old SCSI-1 drive in some clearinghouse, slowly bit-rotting away. Even worse would be the code of this program ending up lost to history.

    This is why I think software vendors that wish to obtain copyright protection for their software should be required to publish source code along with the binaries. They wouldn't be required to grant any license whatsoever, so no one would have any right to use the source -- not even to compile it, which is preparation of a derivative work -- but it would greatly increase the probability that the code would not simply be lost.

    If that were the case, we'd probably have that source code somewhere, and even if the copyright holder weren't around to give permission to use the code, or wasn't willing to, at least we could read the source and get ideas about how various problems were addressed. And that, by the way, is exactly what copyright is for... copyright exists in order to enrich the public domain by increasing the dissemination of ideas. It does this by restricting copies of particular expressions of ideas, but the core goal is (or once was, anyway) to encourage the spread of ideas.

    Anyway, if you can't tell, this is one of my pet issues :-)

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  54. maybe you should look at iSCSI by snuf23 · · Score: 1

    The last storage proposal I heard was from Lefthand Networks for their iSCSI based SAN systems.

    They basically sell a stack of drive arrays which you can configure as volumes as you see fit. Some notable features:

    Ability to configure multiple RAID types within the stack. So you could have RAID 10 and RAID 0 within the same stack of drives depending on if you need speed or redundancy.
    The ability to stripe the data and parity across units in the whole stack (RAID 10 level 2 and 3). So if you have three 4-drive systems in a stack, a volume can be spread across the entire stack so that even 2 drive failures in a single unit of the stack cause no data loss or downtime.
    Storage provisioning that can grow and shrink (thin provisioning I think is the current catch phrase). So if you allocate 100GB to a department and they go over, you get notified but they don't get a disk full message. The storage is allocated automatically but can be reduced back down.
    Snapshots of changed data. You can set it to create a snapshot of the data at certain times of day, allowing an employee to retrieve a file from earlier if they accidentally hose it without the need to go to tape. You can even snapshot your data to a non-RAID volume (to conserve space) just to be used during a backup to tape. The snapshot is backed up and you don't worry about open file backup since the live volume is not the one being backed up.
    You can also do bit level replication over small data pipes to other locations. Great for transferring data from remote offices back to a central SAN for transfer to tape. Also useful for offsite backup for critical data. Because it isn't doing full file replication you can do this over a T1 line.
    Another cool thing is that you can add on additional units to increase the overall storage capacity, but you don't need to create new network shares or volumes. You can expand your existing volumes so that it is transparent to the end user. Plus each unit is running 2 gigabit Ethernet NICs together, increasing stack performance as you scale the size of the SAN. From the latest benchmarks I've seen, iSCSI appears to be getting pretty close to 2Gbps fiber channel in performance, and you don't need a special switch for it.
    In general iSCSI is starting to push SAN technology down into the small/medium business space. It definitely isn't cheap but it is cost comparable to the 4TB of NAS (2 mirrored 2TB units in RAID 5) we had put in for our last round of storage upgrades 3 years ago - and has a ton more features.
    Lefthand definitely isn't the only company in the iSCSI space by a longshot - but I did really like the feature set. The only thing I have a real issue with is the apparent current lack of an iSCSI initiator for Mac OSX. I'm still researching but I may have to lose some features and go with a fiber channel XSAN on the production side of the business.

    --
    Sometimes my arms bend back.
    1. Re:maybe you should look at iSCSI by Anonymous Coward · · Score: 0

      None of this is new, none of it is novel and it's not really all that innovative

      1) iSCSI is a protocol (and it's open) that allows SCSI over TCP/IP networks; you can bolt it onto any server-served storage and do some pretty clever stuff with it (remote boot, clustering etc.).
      2) It's fairly trivial to duplicate all of the elements of this hardware with fairly cheap off the shelf equipment.
      3) It is convenient though, so I guess some people would buy this in kit form from a supplier for the support etc.
      4) NetApp sells the same thing, and in my opinion it isn't worth it in the least, and you've got to pay for all the bits separately, i.e. CIFS, iSCSI (and then replace the filer every 2TB). These days storage is cheap, servers are cheap, networking is cheap and good software for this kind of thing is free. Spend some time, build your own, don't spend that 100k because someone sells it well in a glossy mag

    2. Re:maybe you should look at iSCSI by snuf23 · · Score: 1

      "its fairly trivial to duplicate all of the elements of this hardware with fairly cheap off the shelf equipment."

      I'm sure you can. So why not show me exactly the hardware and software components you could use?

      "Spend some time build your own, dont speend that 100k because someone sells it well in a glossy mag"

      Time = money. You spend money rolling your own in staff time. Then you need to maintain it. Rather than have a single source for support you have multiple vendors or open source software components to maintain. Something breaks and you don't have a single place to go for replacement parts or troubleshooting. Depending on the company you may not have staff to support an in house roll your own SAN solution.
      And you aren't looking at $100,000. I priced a 6TB SAN at $32,000 including a year of support. Apple's XSAN for a 3TB system runs less than $20,000.

      --
      Sometimes my arms bend back.
  55. 3Par by slashpot · · Score: 1

    3Par InServe is built on Linux.
    We have a couple in a closet at work.

    www.3par.com

  56. You are right by horacerumpole · · Score: 1

    You are right. The "R" in RSA stands for (Ron) Rivest. On the other hand, Prof. Rabin won the Turing award many many years ago...

  57. Correction!!! by b4704084 · · Score: 1

    Facts:
    1) Information dispersal algorithm (IDA) was invented by Professor Michael Rabin at Harvard. IDA is an algorithm for distributed storage.

    2) R in RSA stands for Professor Ronald Rivest at MIT. This article has nothing to do with RSA.

    1. Re:Correction!!! by b4704084 · · Score: 2, Informative

      BTW, I am a research assistant for Professor Michael Rabin at Harvard. I think Slashdot editors have the responsibility for making the correction!

      Most people seem to know RSA names well but the IDA algorithm in this article is not related to RSA. So their comments on what R in RSA stands for can misguide the readers!

  58. Comparing Cleversafe IDA algorithms with others by mengland · · Score: 2, Informative

    More notes on our IDAs compared with others:

    The Cleversafe information dispersal algorithms (IDAs) were designed to provide real-time performance with large amounts of data storage and retrieval (gigabytes, petabytes and above). Previous algorithms, like Rabin, Shamir and Reed-Solomon, are very effective at storing smaller amounts of data (kilobytes), but their computational overhead, which is proportional to the square of the data block size or greater, isn't well suited for quickly dispersing/restoring larger amounts of data. The Cleversafe algorithms encode AND decode data with a computational overhead that is linearly proportional to the size of the data blocks. Specifically, the Cleversafe encoding algorithms for an 11 node grid with a threshold level of 6 require 5 operations per byte to encode data. For decoding on this dispersed storage grid, the Cleversafe algorithms require 4 operations per byte to decode data greater than 99% of the time and no more than 13 operations per byte in rare cases.

    Another Cleversafe contributor, Chris Gladwin, developed our IDAs. For more info:

    http://wiki.cleversafe.org/Turbo_IDA_Technology

    One can also read an Excel spreadsheet (found in the above wiki page) and C++ source code that represents the "guts" of our 11-Pillar IDA code module.

    For more info about Cleversafe contributors:

    http://wiki.cleversafe.org/Cleversafe_Contributors

    You can see Chris and me at the bottom of the page, which is ordered with the most-recent contributor listed first.

    -Matt England

    ps: We are finishing up our project announcement at this week's MySQL User's Conference where we drew significant interest. We have engaged some MySQL core developers regarding integrating their technology with ours.

    1. Re:Comparing Cleversafe IDA algorithms with others by Anonymous Coward · · Score: 0
      Previous algorithms, like Rabin, Shamir and Reed-Solomon, are very effective at storing smaller amounts of data (kilobytes), but their computational overhead, which is proportional to the square of the data block size or greater, isn't well suited for quickly dispersing/restoring larger amounts of data.

      if this is the only improvement, then BFD.

      large files can easily be broken into smaller sized blocks (kilobytes) and have Reed-Solomon encoding applied in series (or even in parallel) and have no scaling problems. the claimed innovation may be true theoretically, but on a practical level, this is pure bunk. there is nothing interesting here except that it's yet another open source implementation of a reed-solomon type of algorithm.
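      A rough sketch of the chunking argument above, in Python. The quadratic per-block cost is only the toy model implied by the parent post, not a measurement of any real Reed-Solomon codec; `encode_cost` and the 4 KiB block size are hypothetical, chosen purely for illustration.

```python
def encode_cost(total_bytes, block_bytes):
    """Operation count under a toy model where encoding a block of b bytes costs b**2 operations."""
    full_blocks, remainder = divmod(total_bytes, block_bytes)
    # Each full block costs block_bytes**2; a trailing partial block costs remainder**2.
    return full_blocks * block_bytes ** 2 + remainder ** 2


GIB = 1024 ** 3

one_block = encode_cost(GIB, GIB)   # the whole file treated as a single block
chunked = encode_cost(GIB, 4096)    # the same file cut into 4 KiB blocks

print("single block :", one_block)
print("4 KiB blocks :", chunked)
print("ratio        :", one_block // chunked)
```

      With a fixed block size the total grows linearly in the file size under this model, which is the point the comment is making. Whether the extra bookkeeping for the blocks is acceptable is a separate question.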

    2. Re:Comparing Cleversafe IDA algorithms with others by mengland · · Score: 1

      Reed-Solomon presents a legitimate alternative to the Cleversafe IDAs in the way that you describe. (Alas, I prefer not to deal with the added complexity in my systems to split up the files more than I have to. Every added complexity makes more work to ensure complete data integrity, but that's beside the point.)

      Other alternatives also exist, including the Shamir and other schemes.

      Further, we (Cleversafe) are investigating new methods to reduce the "blow up"--the storage overhead required to support the system redundancy--while still keeping the computational overhead low. Reducing the "blow up" effectively reduces the storage and bandwidth requirements of the overall system. As of alpha.4.1.3, the system blow up is 2.3; i.e., for every 1MB of data, 2.3MB of storage and bandwidth is needed. We have an easy update to reduce the blow-up to 2.1 that should be coming in a near-term release. In a longer-term release, we want to get the blow-up to around 1.3 or 1.2, or possibly lower. Yes, Reed-Solomon can do this, too, but then there's the additional complexity and computational overhead. (I'll talk more about the blow-up stats in another, top-level post.)
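      To make the blow-up figures concrete, here is a small Python sketch. The function names are hypothetical. `minimum_blow_up` assumes equal-sized slices and a scheme where any `threshold` slices rebuild the data, in which case the total stored can be no smaller than slices/threshold times the original; this is a general bound, not a description of how the Cleversafe code computes anything.

```python
def storage_needed_mb(data_mb, blow_up):
    """Storage (and bandwidth) consumed for `data_mb` of user data at a given blow-up factor."""
    return data_mb * blow_up


def minimum_blow_up(total_slices, threshold):
    """Lower bound on blow-up when any `threshold` of `total_slices` equal-sized slices rebuild the data."""
    return total_slices / threshold


for factor in (2.3, 2.1, 1.3):  # figures quoted in the post above
    needed = storage_needed_mb(1000, factor)
    print(f"blow-up {factor}: 1000 MB of data needs about {needed:.0f} MB")

print(f"11 slices, threshold 6: lower bound is about {minimum_blow_up(11, 6):.2f}")
```

      Under that assumption an 11-slice, threshold-6 layout cannot go below roughly 1.83, so the lower targets would presumably need a different slice count or threshold. That is an inference from the bound, not something stated in the post.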

      ** In any case I feel it's quite important to stress: **

      The Cleversafe meta-data system was designed with an attempt to be able to easily use any of these methods available today (including the Reed-Solomon, Shamir, and current Cleversafe methods) or in the future. In fact, the current Cleversafe IDAs represent a small part of the code; the vast majority of the code and development effort can be found in the meta-data-management system, which tracks data slices from an unlimited number of files originating from an unlimited number of users and computing systems and can tolerate a tremendous number of concurrent failures from the underlying system components.

      Is Cleversafe the first one to design such a "hyper-redundant," grid-like, meta-data-management system? No. However, we believe we are the only ones to have built such a robust system based on an IDA mechanism with absolutely no replication of the data--and therefore, we contend, much less complexity. Further, I believe that this reduced complexity (when compared with other distributed file/meta-data systems) enables many powerful features, including performance scalability and easier human serviceability.

      Will the Cleversafe system prove to be uniquely valuable? We believe so. However, as at least one other post on this thread mentions: time will tell.

      It's also important for me to reiterate: I personally designed most of the current meta-data system, so I present an obviously-biased perspective.

      -Matt

  59. The CIA is my backup strategy. by Kadin2048 · · Score: 1

    Well, according to the spec you can only restore from ECHELON by going through the FISA_COURTS interface, but everyone's known for a while now it's been backdoored if you have sufficient privileges.

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  60. Um, two things come to mind... by rickb928 · · Score: 1

    1. It does seem like RAID5+ at the file level. More redundant, so it seems cooler to me.

    2. Has anyone read an article on Google's file system? This sounds a lot like it. Multiple stripes, recovery with less than N-2 parts, and Google uses it to improve performance first, with copies worldwide more or less. I think the article was in Wired, but I'm too lazy to look it up.

    New? Maybe. Improved? Maybe. Cool? I want.

    rick

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  61. Implementations? by Kadin2048 · · Score: 1

    Anybody know of any implementations of this? Seems like there would be a lot of rather obvious uses, but I've never heard of it being used. (Or maybe I have and just didn't recognize it as such.)

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  62. Been there, done that. by Anonymous Coward · · Score: 0

    Still stagnant technology. Recovery volumes for various archival utilities have been around a long time. This is just the first time that I know of where they use the RSA algorithm instead of an older algorithm.

  63. "any majority of which" by l3v1 · · Score: 1

    you backup into small slices, any majority of which can be used

    Ok, I'm numb in the morning, but what the hell does that mean ? ... I won't trust my data to something I don't even understand. You can say to RTFM, but hey, this is the first paragraph about the software, it should be catchy and clear.

    --
    I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
    1. Re:"any majority of which" by Josmul123 · · Score: 2, Interesting

      Aside from RTFM, let me, as a Cleversafe employee, try to explain a bit of what's happening. Cleversafe technology allows for a client-server application where your data to be backed up is sliced up into eleven pieces using our OWN Information Dispersal Algorithm... This is not RSA as some previous posts would lead you to believe. Once the data is split using this algorithm, it is sent out to eleven different sites running our server software. When you want to restore your data (say after recovering from a hard drive failure), you begin downloading your chunks of data, which you cannot access without your private key information. When retrieving your data, up to five of the eleven "dispersed grid" servers can be absolutely unresponsive, and you can still re-assemble your data (similar to .PAR files or RAID5, only with an open-source algorithm created by us). This allows us to have a dispersed storage mechanism with no single point of failure. In actuality, grid nodes could be running different operating systems and be located around the world. A breach at a single point would, assuming someone could decrypt a slice of someone's data (not very easy to do, I'll tell you), allow someone 1/11 of someone's data. For example, you'd be able to know there's a 3 in my credit card number if it was stored on the grid. This makes the technology not only more failsafe (over 99.9% uptime, I believe was the calculation), but also extremely secure.

    2. Re:"any majority of which" by mengland · · Score: 1

      As a correction to Josmul123's post, the uptime analysis follows:

      Given a single node with 99.99% availability (which we believe is quite poor), we approximate the Cleversafe dispersed-storage grid availability to be 99.99999999999% (that's 13 nines), based upon tolerating any concurrent outage of up to 5 out of 11 nodes.

      Alas, these are approximations, and we invite professional mathematicians to do further analysis. The root argument, however, is simple: the system still provides all data even if any 5 out of 11 nodes become concurrently unavailable.
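      For anyone who wants to redo the arithmetic, here is a minimal Python sketch of one way to model it. It assumes node outages are statistically independent, which is an assumption of this sketch rather than anything claimed above, and `grid_unavailability` is a hypothetical name. It returns the probability of being unavailable because one minus that number rounds to exactly 1.0 in floating point.

```python
from math import comb


def grid_unavailability(node_availability, nodes=11, threshold=6):
    """Probability that fewer than `threshold` of `nodes` independent nodes are reachable."""
    down = 1.0 - node_availability
    max_tolerated = nodes - threshold  # outages the grid can absorb (5 for an 11/6 grid)
    # Sum the binomial tail: exactly k nodes down, for every k beyond what is tolerated.
    return sum(
        comb(nodes, k) * down ** k * node_availability ** (nodes - k)
        for k in range(max_tolerated + 1, nodes + 1)
    )


for availability in (0.99, 0.999, 0.9999):
    print(f"node availability {availability}: grid unavailable with probability "
          f"{grid_unavailability(availability):.3e}")
```

      The result is very sensitive to both the per-node figure and the independence assumption (correlated failures such as a shared network outage would make it worse), so it need not match the approximation quoted above.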

      As to l3v1's point about a new storage technique: we understand a hesitance toward new technology, hence part of our motivation to open source the system. Additionally, we have done a significant amount of testing on the info-dispersal algorithms.

      -Matt

  64. I can see it now... by Bootvis · · Score: 1

    I have always wanted to post that :)

    --
    Read, refresh, repeat.
  65. I think this is wrong again by Paul+Crowley · · Score: 1

    Rivest, Shamir, Adleman invented RSA.

    Shamir invented secret sharing.

    Rabin invented the Rabin public key cryptosystem, and IDA.

    IDA is not like secret sharing.

    With secret sharing, you have a secret, which you break up into shares. You can decide how many shares you need to reconstruct the secret when you break it up. Without the right number of shares, you know nothing about the secret. But the big difference is that EACH SHARE IS SLIGHTLY BIGGER THAN THE INITIAL SECRET.
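    A minimal Python sketch of the secret-sharing idea just described, in the style of Shamir's scheme: the secret is the constant term of a random polynomial, and each share is one point on it. The field prime, function names and parameters are choices made for this sketch (it needs Python 3.8+ for the modular inverse via `pow`), and it is not hardened for real use.

```python
import secrets

PRIME = 2 ** 127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, threshold, shares):
    """Return `shares` points on a random degree-(threshold-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest coefficient first
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial through `points` and evaluate it at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=6, shares=11)
print(recover_secret(shares[:6]) == 123456789)
```

    Note that every share is a full field element, so each one is at least as large as the secret itself, which is the size penalty described above.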

    With IDA, you have lots of data. You break it up into chunks. EACH CHUNK IS SMALL COMPARED TO THE SIZE OF THE INITIAL DATA. The total size of the chunks is bigger than the size of the data. When the chunks you have add up to a size slightly bigger than the initial data, you can reconstruct the initial data.
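    For contrast, a toy dispersal sketch in Python. This is a simple polynomial (Reed-Solomon-style) erasure code over the integers mod 257, written only to illustrate the size trade-off; it is not Rabin's exact construction and not the Cleversafe algorithm, and every name in it is hypothetical. Symbols can take the value 256, so a real format would need to pack them differently.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial through `points`, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def disperse(data, threshold, slices):
    """Return `slices` symbol lists; each holds one symbol per group of `threshold` bytes."""
    padded = data + bytes(-len(data) % threshold)  # zero-pad to a whole number of groups
    out = [[] for _ in range(slices)]
    for start in range(0, len(padded), threshold):
        # The group's bytes are the polynomial's values at x = 0 .. threshold-1.
        group = list(enumerate(padded[start:start + threshold]))
        for x in range(slices):
            out[x].append(interpolate_at(group, x))
    return out


def restore(available, threshold, length):
    """`available` maps slice index -> symbol list; any `threshold` entries are enough."""
    chosen = list(available.items())[:threshold]
    result = bytearray()
    for pos in range(len(chosen[0][1])):
        points = [(x, symbols[pos]) for x, symbols in chosen]
        result.extend(interpolate_at(points, x) for x in range(threshold))
    return bytes(result[:length])


data = b"open source storage"
pieces = disperse(data, threshold=6, slices=11)
survivors = {i: pieces[i] for i in (1, 4, 6, 7, 9, 10)}  # five slices lost
print(restore(survivors, 6, len(data)) == data)
```

    Each slice here holds about one sixth as many symbols as the data has bytes, so the eleven together hold roughly 11/6 of it. In this toy the first six slices are just the raw bytes, which shows that dispersal by itself offers no secrecy.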

    That was a lot of confusion to untangle.

    1. Re:I think this is wrong again by jabuzz · · Score: 2, Interesting

      No you would be wrong, the RSA algorithm was first described by Clifford Cocks, a British mathematician working for GCHQ in 1973, four years before the description in 1977 by Ron Rivest, Adi Shamir and Len Adleman.

      It is a classic example of a bad patent. There was prior art (though admittedly this was kept top secret till 1997) and it also failed the obviousness test. Clearly if someone else came up with the same algorithm four years earlier it was obvious to someone skilled in the art of cryptography. In fact Cocks invented the algorithm "overnight" after being told about the concept of non-secret encryption devised by James H. Ellis (another GCHQ worker), which occurred to Ellis after reading a paper from World War II by someone at Bell Labs describing a way to protect voice communications by the receiver adding (and then later subtracting) random noise.

    2. Re:I think this is wrong again by Paul+Crowley · · Score: 3, Insightful

      RSA get the credit because they brought the concept to science. Similarly, Biham and Shamir get the credit for differential cryptanalysis. If you invent it and keep it secret you don't get the credit; that's the cost of the Faustian bargain you made with the security services.

  66. I've got a *much* simpler method... by Joce640k · · Score: 1

    a) Make a backup of your data and encrypt it.

    b) Call the file "Britneys secret sex tape - the real thing!" (or whatever)

    c) Share it on eMule.

    That's it. Your valuable data will replicate itself around the world ad-infinitum. You'll be able to access a copy from anywhere in the world with an Internet connection.

    --
    No sig today...
    1. Re:I've got a *much* simpler method... by Anpheus · · Score: 1

      Just don't name it after any songs "protected" by the RIAA, or else the first thing you'll find in your inbox after your summer vacation will be a million dollar copyright infringement lawsuit.

  67. questions by mapkinase · · Score: 1

    I skimmed through the links in the post and did not find answers to following questions:

    what is the storage size to datasize ratio? I am talking about practically meaningful numbers ensuring storage reliability comparable to the competitors.

    what is the storage reliability at the storage size to datasize ratio comparable to the competitors.

    Theoretical estimates will suffice.

    PS I do not have access to the full text of linked ACM paper

    --
    I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
  68. Aimed at who ? by Anonymous Coward · · Score: 0

    No windows binary ? I can see this taking off quickly...

    1. Re:Aimed at who ? by themoneyish · · Score: 1

      I am one of the devs at Cleversafe. As you might notice, this is only an alpha release. We shall soon be adding windows binaries. :) Keep checking. Manish

    2. Re:Aimed at who ? by Anonymous Coward · · Score: 0

      Excellent, as a windows network manager I was initially quite excited about the prospect of using some of the storage on the machines in the ICT suites to carry some of the backup load (still keeping my tape storage of course). This could make file recovery a lot quicker than it is currently whilst still keeping tape backups for disaster recovery scenarios

    3. Re:Aimed at who ? by mengland · · Score: 1

      Also, for what it's worth:

      The Cleversafe devs internally use the Windows binaries quite a bit (our current build process uses the CodeBlocks IDE--see the CodeBlocks project/.cbp file in the repo; we haven't yet totally ported the gnu-make-based process to mingw/msys yet, in part due to problems with mingw/msys gnu make); we might even have more test-usage history with the Windows client than we do the Linux client.

      However, the reason we chose not to release the Windows binary had little to do with the technical issues, and much more to do with legal ones. Our open-source project is released under the GPL v2. The code uses the OpenSSL and Xerces-C (an XML parser) libraries. We cannot distribute either OpenSSL or Xerces-C source or binaries in our binary or source distributions because their respective licenses are GPL incompatible, even though we can distribute them separately (although our current windows-mingw builds statically link openssl libs into the executable, which is an additional hurdle to overcome). The packaging and installation for Windows binaries made it more difficult to handle this separation than it did on Linux (and other future Unix/BSD systems), so we just decided to get the release out there sooner rather than later and not wait for our Windows package/installation management to get done.

      Bottom line: we are fully "plugged in" to Windows systems, and you should see a Windows binary release in the near future.

      -Matt

  69. Old Idea... by Anonymous Coward · · Score: 0

    Capacity Networks have been doing this for ages.

    http://www.capacitynetworks.com/

    Good to see some open source guys doing the same thing. The CapFS idea is just awesome.

  70. Google File System by zrq · · Score: 1
    Google research paper on their distributed file system.
    We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications.
    http://labs.google.com/papers/gfs.html
  71. Re:Virtual file server -- was a program for old Ma by adavies42 · · Score: 1

    Why not replace the AppleTalk portion with ZeroConf/Rendezvous/Bonjour? That seems to be what AppleTalk grew up to be anyway.

    --
    Media that can be recorded and distributed can be recorded and distributed.
    -kfg
  72. Re:Sounds familiar. Like my master's thesis. by dosguru · · Score: 1

    Small world,

    I also did my master's thesis on a concept like this. I was doing this splitting and syncing using virtual disks. They communicated over AES-encrypted TCP packets. The idea was that it would be a system simple enough for anyone to set up among a bunch of friends or families.

    I've been too busy to work too much on it since graduating, but I opened all of the code and it's on sourceforge.
          http://sourceforge.net/projects/netraid/

  73. Re:I don't think you know what that word means. . by eightball · · Score: 1

    It may be inaccurate (I don't know), but how can you say a priori that the word is not being used to describe what it defines?
    Imagine a scenario where the Legato sales rep is buddy-buddy with the CIO: takes him out golfing and dinners on Legato's dime, in exchange the CIO sticks with Legato no matter how much the product may diverge from the company's true needs. The sales rep keeps a lucrative contract and the CIO gets free stuff: regardless of the performance of each.
    I don't think genuine friendship is needed wrt cronyism. If you want to label this strictly bribery, then how is this any different than two old frat brothers who collude to enrich each other?

  74. Microsoft got this one already... by youta · · Score: 1

    I believe Microsoft does the same things under the covers in their BitTorrent alternative, plus some consideration for locality.

    Article here

  75. Re:Virtual file server -- was a program for old Ma by Deagol · · Score: 1

    I recall a mid-to-late 90s product called Mango. It was a distributed network storage program for Windows. Though the company's namesake is still around, I'm pretty sure the product effectively vanished. Seemed a neat idea at the time.

  76. Re:Virtual file server -- Mango by Insightfill · · Score: 1

    Yes, it's hard to dig anything up on them anymore. Their product was only compatible to Windows 95, and never got rolling on the NT kernel. I bought a couple of copies surplus hoping to run them on something, but never got around to it. Neat in concept.

  77. Re:Virtual file server -- Mango by Insightfill · · Score: 1

    Found an old slashdot post by a couple of former Mango engineers: link

  78. Microsoft has a similar concept by Yankovic · · Score: 1
    MS has a similar concept already going through deep testing.

    http://research.microsoft.com/sn/Farsite/

    Pretty cool stuff, check this out:

    Our prototypical target is a large company or university, meaning an organization with around 10^5 machines, storing around 10^10 files, containing around 10^16 bytes of data. We assume that the machines are interconnected by a high-bandwidth, low-latency, switched network. Also, at least for our initial version, we are assuming no significant geographical differences among machines.

    Lots more questions answered on the FAQ: http://research.microsoft.com/sn/Farsite/faq.aspx ... here's the publication list back as far as 2000 as well http://research.microsoft.com/sn/Farsite/publications.aspx (though obviously this is prefaced by some 11 years by the original paper)
    1. Re:Microsoft has a similar concept by mengland · · Score: 1

      Please see my followup regarding Farsite and other systems.

      -Matt

  79. Re: Microsoft's Farsite by mengland · · Score: 1

    From: http://research.microsoft.com/sn/Farsite/

    It does this by distributing multiple encrypted replicas of each file among a set of client machines.

    Therein lies the key. There exist many systems that copy entire files (or sets of data) to many machines/nodes. I have been introduced to several references to many other projects that claim similar things with similar language to projects like Farsite.

    The Cleversafe system never stores an entire file (or data/file set) in any one place, encrypted or not. Only portions of any file (known as file "slices") are stored anywhere on the Cleversafe dispersed-storage grid. In our (Cleversafe's) opinion, this reduces complexity (by not having to synchronize multiple copies) and increases security and privacy (by never storing all of the data in one place), among other things.

    In short, some key differentiating questions I typically ask when investigating Cleversafe-competitive systems:

    Does the system...

    * ...store an entire file/data/content set (encrypted or otherwise) in one place?
    * ...make multiple copies of the data?

    If either answer is yes, then I tend to view the project as significantly different than the Cleversafe technology. I have found that full-replication-based methods in many various forms are quite prevalent in many applications.

    -Matt

  80. The meta-data system can use any IDA by mengland · · Score: 1

    Taking an excerpt from a previous post I made on another sub-thread:

    I felt it worth noting at the top level of this thread:

    The Cleversafe meta-data system was designed with an attempt to be able to easily use any information-dispersal algorithm (IDA) available today (including the Reed-Solomon, Shamir, and current Cleversafe methods) or in the future. In fact, the current Cleversafe IDA represents a small part of the code; the vast majority of the code and development effort can be found in the meta-data-management system to track data slices from an unlimited number of files originating from an unlimited number of users and computing systems; this also needs to be done in a way such that the entire system can tolerate a tremendous number of concurrent failures from the underlying system components.

    Is Cleversafe the first one to design a "hyper-redundant," grid-like, meta-data-management system? No. However, we believe we are the only ones to have built such a robust system based on an IDA mechanism with absolutely no replication of the data--and therefore, we contend, much less complexity. Further, I believe that this reduced complexity (when compared with other distributed file/meta-data systems) enables many powerful features, including performance scalability and better human serviceability.

    Will the Cleversafe system prove to be uniquely valuable? We believe so. However, as at least one other post on this thread mentions: time will tell.

    It's also important for me to reiterate: I personally designed much of the current meta-data system, so I present an obviously-biased perspective.

    -Matt

  81. Dispersal is not encryption; Cleversafe uses both by mengland · · Score: 1

    Anonymous writer writes:
    Recovery volumes for various archival utilities have been around a long time. This is just the first time that I know of where they use the RSA algorithm instead of an older algorithm.

    To be clear:

    Dispersal is not encryption. (Cleversafe uses both.)

    While we (Cleversafe) do use public-private key methods to encrypt the data/content, this is still a separate operation from the data *dispersal*.

    Moreover, if the content encryption is somehow cracked/broken (and public-private key encryption can be broken), the cracker acquires at most 1/11th (in our current IDA scheme) of "scrambled"/non-contiguous data.

    This is the major reason why we feel that our system provides unique, security-and-privacy-based value over encryption-only based systems. If the encryption breaks, you still can't get the data. (And of course, we use the encryption mechanism, too.)

    Note that a different RSA key can be used to encrypt each file Slice (ie, for each Cleversafe "Pillar," as per our terminology for our grid design) such that if a cracker breaks one slice/Pillar's key, they still have to break the key for other Pillars (and there are 11 total Pillars in the current IDA scheme)...*in addition* to the "toplevel" key we use to encrypt the file before it's sliced/dispersed. Note: we don't have this post-dispersed-encryption feature in our current alpha4.1.3 code (we only encrypt the toplevel file before it's compressed and dispersed), but we believe it will not be difficult to add.

    Also: we will be signing each slice as well, for data-integrity purposes to prevent both malicious and non-malicious data change/vandalism. This also will be a feature added in the near term.

    One can read more about the open-source flavor of the Cleversafe grid design.

    -Matt

    ps: I encourage interested parties to continue discussions at http://forums.cleversafe.org/ (as well as to soon-to-be-available email lists that will be synchronized with these forums).

  82. Re: Clifford Cocks by Anonymous Coward · · Score: 0

    Heh heh. You said "Cocks". Heh heh.

    Yours truely,
    Binkus or Buttwipe (I'm not sure witch)