Ask Slashdot: Linux Mountable Storage Pool For All the Cloud Systems?

← Back to Stories (view on slashdot.org)

Ask Slashdot: Linux Mountable Storage Pool For All the Cloud Systems?

Posted by samzenpus on Sunday January 13, 2013 @10:13AM from the one-raid-to-bind-them-all dept.

An anonymous reader writes "Many cloud systems are available on the market like: dropbox, google, sugar sync, or your local internet provider, that offer some free gigabytes of storage. Is there anything out there which can combine the storage into one usable folder (preferably linux mountable) and encrypt the data stored in the cloud? The basic idea would be to create one file per cloud used as a block device. Then combine all of them using a software raid (redundancy etc) with cryptFS on top. Have you heard of anything which can do that or what can be used to build upon?"

10 of 165 comments (clear)

Min score:

Reason:

Sort:

Re:There are several options here by Omnifarious · 2013-01-13 10:16 · Score: 4, Interesting

BTW, doing this at a block device level is likely a very poor idea. Block devices are very difficult to get right in a distributed fashion from a synchronization standpoint. They also are likely to cause a lot of excess network traffic since the units the system deals with are poorly matched to the logical units that are actually modified. A good distributed solution to this problem will at have to know something about the fact that you have individual files to be at all reasonable to use.

--
Need a Python, C++, Unix, Linux develop
Why do you want to combine them? by egcagrac0 · 2013-01-13 10:26 · Score: 4, Interesting

If you don't trust the provider to keep your data intact, don't use that provider.
If you need more storage, pay for it. The cost is not prohibitive - 100GB or so for under US$10/mo is pretty easy to find.
If $10/month prices you out of the market, there are better things to worry about than encrypting files and storing them in the cloud.
1. Re:Why do you want to combine them? by silentcoder · 2013-01-14 00:47 · Score: 3, Interesting
  
  I'll try to answer as well. My previous DSLR a Canon 400D couldn't do Raw+Jpeg so I used ONLY raw, for things like "holiday snaps" style shooting I'd just mass-export to jpeg, but for real work I'd always use the RAW.
  With my new 40D I use Raw+Jpeg for shooting but I'm tempted to go to pure RAW as I've yet to use the jpeg, I figured it may be useful for a reference (what the camera thought was there) but otherwise, no thanks.
  1) 1) For a reasonably well-exposed photo where the white balance is roughly correct in the camera, are you able to produce a significantly better end result from RAW than from JPEG?
  For me the first part of post-processing is playing with the RAW - for example sometimes I will deliberately switch it to a different white balance or even do manual white balance to achieve some or other artistic effect. Raw is also very powerful for adjusting things like the global saturation and contrast levels very finely (while you'll want a tool like photoshop or gimp to adjust individual elements).
  >2) Do you have any rough idea about the bit depth the RAW photos need to be at before you get a significant advantage over JPEG? My old camera produced 10 bit RAWs, and at that time I was almost never able to out-perform the JPEG. My new camera has 12 bit RAW, and I haven't really had much time recently (small children here as well) to play around with RAW. But maybe it would be worth it?
  It doesn't much matter. If you are taking snapshots then just use jpeg. RAW comes into it's own if you're doing real photography - product shoots, studio work, landscape work, art photography etc. - where the post-production is as important a part of the process as the taking of the shot. RAW is stage one of producing the perfect image, gimp/photoshop is stage 2. Even those photographers who eschew editing of pictures will usually do RAW adjustment - which doesn't change what's there, only how it's 'presented' in terms of light.
  Personally I point out to those types that there is nothing I can do in gimp/photoshop that the old boys didn't use to do in the dark-room, it's just faster, easer and a LOT cheaper.
  For the most part a human cannot on a computer screen tell the difference between a 6MP camera (the smallest DSLR I know off) and an 18MP one since no common desktop/laptop screen could show such a picture full-size anyway you're seeing a distorted/shrunken version to begin with, but where it DOES matter is prints. I do prints of my best work and some have also been printed in magazines like Marie Claire and when you're doing prints you need to provide the images in the right level. Generally you will want to ensure they are scaled to page size (e.g. A3 for example) yourself - and that means including white-space bordering to prevent stretching - and you'll need to ensure they are high print-resolution (professional printing should be 300 DPI). Format wise uncompressed jpeg is usually used.
  Simple reality is that to get an uncompressed jpeg at 300DPI that is A3 in size you need a high MP shot to begin with or your picture simply won't look good at that resolution.
  RAW is invaluable here as it lets you handle such things as exposure levels much better. You cannot just yank up the exposure of a picture - if you do that you create lots of digital noise (which shows up as red-speckle) which no amount of editing can ever REALLY cover up properly - but in RAW you can subtly adjust lighting to make a useful picture from a slightly underexposed shot sometimes anyway. On 800x600 web-quality jpegs you'll never even NOTICE the noise being created in typical "push up the exposure" steps - but if you print that as an A3 poster for framing every one of those red dots is a glaring monstrosity.
  The first and finest art of ALL photography is lighting, don't think you can fix bad lighting in post, at best you can maybe make a useful website picture. If you are trying to do anything that's printable - you need to get your light right. The purpose of editing (both RAW and gimp) is to modify a
  
  --
  Unicode killed the ASCII-art *
Don't trust the cloud by Anonymous Coward · 2013-01-13 10:33 · Score: 5, Interesting

My residential internet connection via Comcast is fast enough today that I can pull files off of my server at home, "cloud" style.
I have two 2TB drives in RAID1, encrypted with whatever magic `cryptsetup' performs, with port 22 of my firewall forwarded to the server. SSH only accepts logins from me. I consider my data to be more secure and easier to access (it's literally seconds away from availability on any real operating system anywhere with internet access. Windows need not apply) than anything I could get from ZOMG TEH CLOUD. Only disadvantage is speed. I'm not gonna be shunting gigabyte plus files around like this.
Added bonus: easy to add users, easy to throw up a web interface, can do whatever you want with it, since you own the hardware (!!)
Pfft, cloud. I remember when it was called 'the internet'.
Now get the fuck off my lawn.
1. Re:Don't trust the cloud by Blaskowicz · 2013-01-13 11:31 · Score: 3, Interesting
  
  btw there's sshfs on Windows, I thought it would be pedantic to mention it but it exists albeit a bit slow.
2. Re:Don't trust the cloud by Hatta · 2013-01-14 02:32 · Score: 3, Interesting
  
  The overwhelming majority of Windows applications can be configured using a series of dialog boxes, typically either in the "tools->options" or "edit->preferences" menu. These applications may incidentally store the results of those dialog boxes in a registry hive (or in an ini file in the %appdata% folder or similar), but it's infrequently the only way to make such changes. With Apache, they don't give you a tabbed, categorized dialog box in which to manipulate the options
  No, they give you a nice organized text file to edit, with descriptive comments. You can search it and you can back it up easily. That's even BETTER than a tree full of checkboxes.
  
  --
  Give me Classic Slashdot or give me death!
Can be done with a FTPfs, raid and encfs by devitto · 2013-01-13 10:52 · Score: 4, Interesting

Someone's already done & blogged about this, using multiple free FTP accounts, with a FTPfs bringing them local, then mounting a RAID (mirrored & parity) partition over it, and encfs over the top of that.
It was VERY SLOW, but did work, even when he blocked access to some of the FTP accounts - it was just seen as a failed drive read, and the parity reconstruction still permitted access.
I think the key problem was that FTP servers he used (or the FTPfs driver) didn't allow for partial writes to files, so every time you changed something, large amounts of data was re-uploaded. So there were possibilities for optimization.....
Enjoy & share if you get anywhere !
Dom
Bitcasa by Anonymous Coward · 2013-01-13 11:45 · Score: 2, Interesting

Bitcasa is an encrypted block based filesystem which mounts via FUSE and streams to the cloud behind the scenes. Has really intelligent caching built in and works with all major platforms (Lin, Win, Mac).
Linux client hasn't been updated as much as the other platforms but should catch up soon.
Full disclosure- I'm the CEO of Bitcasa.
Re:There are several options here by ultrasawblade · 2013-01-13 12:01 · Score: 4, Interesting

If you can mount a cloud service as a folder in Linux somehow, then Tahoe-LAFS can work. I know Dropbox lets you do this but am unsure about the other systems. If the cloud service allows upload/download via HTTPS, this could be worked around nontrivially by writing something using FUSE to translate filesystem requests to HTTPS requests recognized by that service.
You would have to have a "client" running for each cloud service. Each client has a storage directory which needs to be configured to be the same as the local sync directory for the cloud service. While Tahoe-LAFS is intended to have each client in a "grid" run on separate machines, there's no reason why multiple clients on the same grid could not be running locally. You'd just have to edit configs manually, setting the IP address to 127.0.0.1 and choosing a different port for each "client", and also making sure the introducer.furl is set accordingly.
Tahoe-LAFS's capability system is pretty neat. Clients never see unencrypted data and you can configure the redundancy and "spread-outness" of the data however you like. Tahoe-LAFS's propensity to disallow quick "deleting" of shares also works well with possibly slowly updating cloud backends - Tahoe is designed to prefer to "age out" shares containing old files periodically rather than support direct deleting.
And Tahoe works as well on Windows as it does on Linux (it's a python script) so if your cloud service is Windows only that is no disadvantage.
Re:There are several options here by Omnifarious · 2013-01-13 13:44 · Score: 5, Interesting

Tahoe sort of achieves this in an odd way. Directories contain hashes of the file they reference instead of an inode number. This means that a Tahoe node often doesn't even know who a file really belongs to, even though it knows its length.
The main issue with block storage is this...
Suppose you modify a data section of a file in a btrfs filesystem mounted on some kind of weird encrypted block device. There will be a whole tree of blocks that get modified, all the way up to the root node. All of these blocks have to be written before the root block is, and for a small file there will be several more blocks that need updating than there are data blocks on the file.
These two issues create a big synchronization problem and a lot of extra traffic.
In contrast, a good distributed filesystem protocol that's aware of individual files can send a single message that contains some kind of identifier for the file, and the new data it should contain. This message will often be smaller than a single filesystem block, and it will also usually be compressed before it gets on the wire. Much more efficient and while there are synchronization issues between updates to individual files, within a file there aren't any.

--
Need a Python, C++, Unix, Linux develop