Slashdot Mirror


Ask Slashdot: Linux Mountable Storage Pool For All the Cloud Systems?

An anonymous reader writes "Many cloud storage services are available, such as Dropbox, Google, SugarSync, or your local internet provider, that offer a few free gigabytes of storage. Is there anything out there that can combine the storage into one usable folder (preferably Linux-mountable) and encrypt the data stored in the cloud? The basic idea would be to create one file per cloud service and use it as a block device, then combine all of them using software RAID (for redundancy, etc.) with cryptFS on top. Have you heard of anything that can do this, or that can be built upon?"

16 of 165 comments

  1. There are several options here by Omnifarious · · Score: 5, Informative

    The first, and most interesting, is Tahoe LAFS. It does come with a FUSE driver, so it can be mounted like a regular filesystem. It is cloud-based and redundant to a degree you choose yourself. All copies stored are encrypted, so the only person who can read them is you. I'm not sure though if fetching from more nodes than you strictly need to reconstruct your original file actually buys you anything with that system, but I think it does.

    You could also use something like a mountable version of Google Drive and then layer fuse-encfs on top of it. That's not quite as secure as encrypting at the block layer. The overall shape of your directory hierarchy is available, even if the individual file names and their contents are obscured. That should probably be good enough for most purposes.

    1. Re:There are several options here by Omnifarious · · Score: 4, Interesting

      BTW, doing this at a block device level is likely a very poor idea. Block devices are very difficult to get right in a distributed fashion from a synchronization standpoint. They are also likely to cause a lot of excess network traffic, since the units the system deals with are poorly matched to the logical units that are actually modified. A good distributed solution to this problem will at least have to know that you have individual files to be at all reasonable to use.

    2. Re:There are several options here by ultrasawblade · · Score: 4, Interesting

      If you can mount a cloud service as a folder in Linux somehow, then Tahoe-LAFS can work. I know Dropbox lets you do this but am unsure about the other systems. If the cloud service allows upload/download via HTTPS, this could be worked around nontrivially by writing something using FUSE to translate filesystem requests to HTTPS requests recognized by that service.

      You would have to have a "client" running for each cloud service. Each client has a storage directory which needs to be configured to be the same as the local sync directory for the cloud service. While Tahoe-LAFS is intended to have each client in a "grid" run on separate machines, there's no reason why multiple clients on the same grid could not be running locally. You'd just have to edit configs manually, setting the IP address to 127.0.0.1 and choosing a different port for each "client", and also making sure the introducer.furl is set accordingly.

      Tahoe-LAFS's capability system is pretty neat. Clients never see unencrypted data and you can configure the redundancy and "spread-outness" of the data however you like. Tahoe-LAFS's propensity to disallow quick "deleting" of shares also works well with possibly slowly updating cloud backends - Tahoe is designed to prefer to "age out" shares containing old files periodically rather than support direct deleting.
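
      To put rough numbers on "configure the redundancy however you like": with k-of-n erasure coding, any k of the n shares rebuild a file, so usable space is raw space times k/n and you can lose n - k shares. A minimal sketch, assuming one equally sized share per provider. The class and field names are made up for illustration and are not Tahoe's API, and the five 10 GB accounts are just an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """Hypothetical k-of-n layout: any `needed` of `total` shares rebuild a file."""
    needed: int  # k: shares required to reconstruct
    total: int   # n: shares written per file

    def expansion(self) -> float:
        # Bytes stored in the cloud per byte of original data.
        return self.total / self.needed

    def usable_gb(self, raw_gb: float) -> float:
        # Original data that fits into raw_gb of pooled storage.
        return raw_gb * self.needed / self.total

    def tolerated_losses(self) -> int:
        # Shares (providers, in this sketch) that may disappear.
        return self.total - self.needed


# Assumed example: five free accounts of 10 GB each, one share per account.
layout = Encoding(needed=3, total=5)
raw_gb = 5 * 10
print(layout.usable_gb(raw_gb), layout.tolerated_losses())
```

      Under those assumptions, 3-of-5 over 50 GB raw leaves 30 GB usable and survives two providers vanishing. Pushing k closer to n buys space and costs safety.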

      And Tahoe works as well on Windows as it does on Linux (it's a python script) so if your cloud service is Windows only that is no disadvantage.

    3. Re:There are several options here by fuzzyfuzzyfungus · · Score: 4, Insightful

      I get the impression that, while Tahoe LAFS is the good option, the submitter of TFS is looking for the super-cheap option. He wants some sort of terrifying 'RAID-0-over-a-handful-of-different-interfaces-to-a-half-dozen-free-services-so-I-can-scrape-together-a-couple-gigs-here-and-a-couple-there' amalgamation. Unless he's planning some redundancy, that sounds like a recipe for data loss even if it were simple to set up, and you'd still be looking at a relatively paltry amount of storage space.

      It sounds to me like the submitter needs to decide whether he wants to step up and pay for some actual hosts (for which Tahoe LAFS would probably be a good option) or one of the more paranoid Dropbox clones, or whether this is simply an exercise in cobbling scrap together because that can be amusing sometimes...

    4. Re:There are several options here by Omnifarious · · Score: 5, Interesting

      Tahoe sort of achieves this in an odd way. Directories contain hashes of the files they reference instead of inode numbers. This means that a Tahoe node often doesn't even know who a file really belongs to, even though it knows its length.

      The main issue with block storage is this...

      Suppose you modify a data section of a file in a btrfs filesystem mounted on some kind of weird encrypted block device. There will be a whole tree of blocks that get modified, all the way up to the root node. All of these blocks have to be written before the root block is, and for a small file there will be several more blocks that need updating than there are data blocks in the file.

      These two issues, the write ordering and the extra metadata blocks, create a big synchronization problem and a lot of extra traffic.

      In contrast, a good distributed filesystem protocol that's aware of individual files can send a single message that contains some kind of identifier for the file, and the new data it should contain. This message will often be smaller than a single filesystem block, and it will also usually be compressed before it gets on the wire. Much more efficient and while there are synchronization issues between updates to individual files, within a file there aren't any.
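
      A back-of-the-envelope sketch of the difference, assuming 4 KiB blocks, a copy-on-write tree where one node per level gets rewritten up to and including the root, and a made-up 64-byte header on the file-level message. The function names and numbers are illustrative, not measurements of btrfs or of any real protocol.

```python
BLOCK_SIZE = 4096  # bytes; assumed filesystem block size


def block_level_bytes(changed_bytes: int, tree_depth: int,
                      block_size: int = BLOCK_SIZE) -> int:
    """Bytes shipped when every touched block must be rewritten whole."""
    # Ceiling division; assumes the change is block-aligned in the best case.
    data_blocks = max(1, -(-changed_bytes // block_size))
    # One rewritten metadata node per tree level, root included (assumption).
    metadata_blocks = tree_depth
    return (data_blocks + metadata_blocks) * block_size


def file_level_bytes(changed_bytes: int, header_bytes: int = 64) -> int:
    """Bytes shipped by a hypothetical 'file id + offset + new data' message."""
    return header_bytes + changed_bytes


edit = 100   # bytes actually modified
depth = 3    # assumed tree depth
print(block_level_bytes(edit, depth), file_level_bytes(edit))
```

      Under those assumptions a 100-byte edit costs 16384 bytes at the block layer against 164 bytes as a file-level message, and that is before compression or the ordering constraint on the root block.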

  2. Why do you want to combine them? by egcagrac0 · · Score: 4, Interesting

    If you don't trust the provider to keep your data intact, don't use that provider.

    If you need more storage, pay for it. The cost is not prohibitive - 100GB or so for under US$10/mo is pretty easy to find.

    If $10/month prices you out of the market, there are better things to worry about than encrypting files and storing them in the cloud.

    1. Re:Why do you want to combine them? by Gaygirlie · · Score: 4, Informative

      OK, just fess up - it's your pr0n collection, right? 1TB of images at a gargantuan 20MB apiece is over 50000 images; at a more reasonable 5MB that increases to 200k+. "Hobby photographer" my foot.

      You've clearly never heard of RAW images. A 20MB RAW image is actually still on the smaller end of the scale.

    2. Re:Why do you want to combine them? by WalrusSlayer · · Score: 5, Informative

      Uh, methinks you haven't really used tool chains designed to maximize the value of RAW files. The camera's built-in processor does way the hell more stuff than just compress raw pixels into JPEG. White balance is a huge one, along with level curves, sharpening, and a bunch of other stuff. Much of it either one-way or very hard to unwind. And as others have pointed out, most RAW *is* compressed, just lossless.

      So yeah, you can fix white balance in a JPEG, but it's way simpler and more accurate to set the white balance if the pixels haven't already been misbalanced in the first place. Ditto for exposure. Most tools that deal with processed JPEGs don't even have an exposure adjustment---quite often the same tool that handles both file types will have an exposure slider if it's RAW but not if it's JPEG. Sure, you can futz with brightness, contrast, levels, gamma, etc. to correct an under-exposed shot. But sliding over to +2/3 for a slight underexposure is one click and you're done.

      As a guy who has deep-drilled many a software engineering discipline in his 25 year career, and shot tens of thousands of frames as an amateur enthusiast, you can pull me out of the "photographers who don't understand the tools" pool thank you very much.

      I have gone back and forth between JPEG and RAW over the years. There have been periods where, with two small children, I simply didn't have time to invest in RAW processing. And I was pleased with the neutrality of the DSLR's processing anyway. Other times I knew I was shooting in challenging conditions, and set the camera to RAW+JPEG as a safety net. I've rescued many a shot that way. Recently I've been putting mileage on Lightroom and can extract an immense improvement out of the RAWs that would take me 4x the time to do if they were JPEGs, and I probably would not end up with the same result. I now have more time to invest and the payoff is real and significant.

    3. Re:Why do you want to combine them? by BlackPignouf · · Score: 4, Informative

      1) For a reasonably well-exposed photo where the white balance is roughly correct in the camera, are you able to produce a significantly better end result from RAW than from JPEG? (I definitely agree on using RAW+JPEG when you know exposure could be a problem)

      Short answer : No
      Longer answer : It depends on the light, the sensor, the image processor in camera and your RAW workflow.
      From personal experience, I'd say that Canon JPGs are pretty good out of camera, Nikon JPGs lack a bit of sharpening, and Fuji X sensors have very good JPGs that are still impossible to match with RAW+Lightroom.
      I use RAW as a safety net during events or weddings, so that if I get a picture with good expression, focus and composition but wrong exposure or WB, I can still save it and print it instead of having to delete it.
      RAW is also interesting for scenes with high dynamic range, such as landscapes or concerts.

      Do you have any rough idea about the bit depth the RAW photos need to be at before you get a significant advantage over JPEG? My old camera produced 10 bit RAWs, and at that time I was almost never able to out-perform the JPEG. My new camera has 12 bit RAW, and I haven't really had much time recently (small children here as well) to play around with RAW. But maybe it would be worth it?

      I think it has more to do with dynamic range than with bit-depth. Just find a contrasty scene, take a RAW picture and try to retain details in both shadows and highlights with your RAW conversion software.
      http://www.dpreview.com/learn/?/Glossary/Digital_Imaging/dynamic_range_01.htm
      http://www.dpreview.com/learn/?/Glossary/Digital_Imaging/tonal_range_01.htm

  3. Don't trust the cloud by Anonymous Coward · · Score: 5, Interesting

    My residential internet connection via Comcast is fast enough today that I can pull files off of my server at home, "cloud" style.

    I have two 2TB drives in RAID1, encrypted with whatever magic `cryptsetup' performs, with port 22 of my firewall forwarded to the server. SSH only accepts logins from me. I consider my data to be more secure and easier to access (it's literally seconds away from availability on any real operating system anywhere with internet access. Windows need not apply) than anything I could get from ZOMG TEH CLOUD. Only disadvantage is speed. I'm not gonna be shunting gigabyte plus files around like this.

    Added bonus: easy to add users, easy to throw up a web interface, can do whatever you want with it, since you own the hardware (!!)

    Pfft, cloud. I remember when it was called 'the internet'.

    Now get the fuck off my lawn.

    1. Re:Don't trust the cloud by gripped · · Score: 4, Funny

      SSH only accepts logins from me.

      You hope

  4. Can be done with an FTPfs, RAID and encfs by devitto · · Score: 4, Interesting

    Someone's already done & blogged about this, using multiple free FTP accounts, with an FTPfs bringing them local, then mounting a RAID (mirrored & parity) partition over it, and encfs over the top of that.

    It was VERY SLOW, but did work, even when he blocked access to some of the FTP accounts - it was just seen as a failed drive read, and the parity reconstruction still permitted access.
    I think the key problem was that the FTP servers he used (or the FTPfs driver) didn't allow for partial writes to files, so every time you changed something, large amounts of data were re-uploaded. So there were possibilities for optimization.....
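
    For anyone curious why a blocked account just looks like a failed drive: with single parity, the missing chunk is the XOR of everything that is still readable. A minimal sketch of that idea in plain Python, assuming one chunk per account. It is not what md RAID or the blogged setup actually runs, and the function names are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data zero-padded chunks plus one XOR parity chunk."""
    chunk_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(chunk_len * n_data, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_data)]
    parity = bytes(chunk_len)  # all zeros
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def recover(stripe: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one chunk (None) is unreachable."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can only survive one lost chunk")
    stripe = list(stripe)
    if missing:
        present = [chunk for chunk in stripe if chunk is not None]
        rebuilt = bytes(len(present[0]))
        for chunk in present:
            rebuilt = xor_bytes(rebuilt, chunk)
        stripe[missing[0]] = rebuilt
    # Drop the parity chunk and the zero padding.
    return b"".join(stripe[:-1])[:original_len]


data = b"slashdot cloud raid"
stripe = make_stripe(data, 3)   # three data "accounts" plus one parity "account"
stripe[1] = None                # pretend one FTP account is blocked
assert recover(stripe, len(data)) == data
```

    Note that changing a few bytes of one chunk also changes the parity chunk, which fits the observation above that a backend without partial writes ends up re-uploading a lot.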

    Enjoy & share if you get anywhere !

    Dom

  5. Cloud Striping by lucm · · Score: 4, Funny

    Forget redundancy, just go with "RAIC-0": unleashing the true power of the Cloud by striping providers!

    --
    lucm, indeed.

  6. Why do this ? by Alain Williams · · Score: 5, Insightful

    He has not said why he wants to do this, ie what problem he is trying to solve. Depending on the question the answer may be different. Does he want a cloud because:

    * data must be available from many places - ie over the Internet ?

    * data is to be safe from one place (ie home/office machine) blowing up and losing everything ?

    * fast access is needed from many places at once ?

    Please first answer these questions so that we may provide you with what you need rather than random solutions that may not be what you need.

    1. Re:Why do this ? by bazorg · · Score: 4, Insightful

      I'd wager that OP is more interested in using 5 free accounts supplying 10GB each than in paying a monthly rent for 50GB.

  7. OwnCloud? by RanceJustice · · Score: 4, Informative

    I too have been looking for a solution for "deniable-they-don't-have-the-encryption-key" secure remote storage, backups and the like. Platform independence and standards compliance are important; I don't want to get locked into a proprietary ecosystem. It's even better if there's a nice GUI and usability that doesn't require guru-level knowledge to access, and pricing isn't insane. Thus far I've found a handful of tools that seem to be the best of their breeds - CrashPlan, for instance, allows encrypted, secure multi-site backups (your own PCs, friends' PCs, their servers), unlimited bandwidth/storage space etc... but it is only meant for backups, not sharing or accessing the data frequently. SpiderOak is a fantastic Dropbox alternative, Linux-friendly (both GUI and CLI for those interested), and seems to be amongst the best of the "Cloud (tm)/Dropbox" type file-hosting/sharing services. However, as the OP specifically notes that they are looking for a unified solution to bring most or all of that remote hosted/"Cloud" stuff under a single mantle, there seems to be one project that has that goal in mind - OwnCloud.

    I've been watching OwnCloud (www.owncloud.org) since I heard of it, happy to see an open-source, standards-compliant, "installable on your own hardware as well as rented hosting etc.." universal, modular data storage/sync operation that can be totally under your own control. It has a ton of features, but most notable in this case is exactly what the OP wants: the ability to mount your Google Drive or Dropbox share and have your OwnCloud install interact with them. It looks to be a really promising project and I really hope that a lot of coding gurus take notice and join; if my skills were sufficient, I'd be looking to contribute. It is a relatively new platform and I am sure it will have some growing pains (i.e. I do not know if it supports ALL "cloud drive" shares, for instance SpiderOak...), but it supports everything from a built-in media player to CardDAV/CalDAV, backups, and LDAP, and seems to have amazing potential. I am told that Version 5.0 will be the next big leap forward in terms of polish. Check it out, and those that can contribute, please do so. It seems the best option to have user-friendly, open source, secure "cloud" services without bolstering hegemony aspirations by Google, Microsoft, and many others.