Ask Slashdot: User-Friendly, Version-Preserving File Sharing For Linux?
petherfile writes: I've been a professional with Microsoft stuff for more than 10 years and I'm a bit sick of it, to be honest. The one thing that's got me stuck is really not where I expected it to be. You can use a combination of DFS and VSS to create a file share where users can put whatever files they are working on, and that share is both redundant and has "previous versions" of files they can recover. That is, users have a highly available network location where they can "go back" to how their file was an hour ago. How do you do that with Linux?
This is a highly desirable situation for users. I know there are nice document management things out there that make SharePoint look silly, but I just want a simple file share, not a document management utility. I've found versioning file systems for Linux that do what Microsoft does with VSS so much better (for keeping previous versions of files available). I've found distributed file systems for Linux that make DFS look like a bad joke. Unfortunately, they seem to be mutually exclusive. Is there something simple I have missed?
Wasn't Windows NT based upon VMS...
FreeNAS will do all of that shiny stuff. And snapshots, too.
Like the submitter, can someone do my job as well?
Flash drives.
Look at ZFS. Supports snapshots, SMB and NFS. And you can show the snapshots as a read-only directory to users.
If you're just looking for a flexible CM system. It's multisite capable, and I think licensing is free.
use VMS, it's built in
TOPS-20 had it too
yearning already for the lost technology of the 1970s
This is a stackoverflow question. http://stackoverflow.com/
https://owncloud.org/
Take a look at the features list at https://owncloud.org/features/. It seems to have what you want. I played with it a couple of years ago and it was easy to set up then. Unfortunately I never had the opportunity to try it in production.
What is this of which you speak? Can someone expand on these things?
Support my political activism on Patreon.
Search Google for WebDAV auto-versioning.
I have set up (and used for many years) a WebDAV file share served by Apache (with an SVN backend). It can be used as an SVN repository (with checkin comments, etc.) or used as a simple remote file share that automatically creates revisions for the changes. I have used various WebDAV clients (built in to Linux, Mac, Windows) to access and modify the contents of the files.
Hope that gives you another area to explore.
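For reference, the Apache side of a setup like the one described above might look roughly like this. It is a sketch assuming mod_dav and mod_dav_svn are installed; `svn_dav_location` is a hypothetical helper, and the repository path and URL are made-up examples. `SVNAutoversioning on` is the mod_dav_svn directive that turns plain WebDAV writes into commits, and you still need to add authentication before exposing the share.

```shell
#!/usr/bin/env bash
# svn_dav_location REPO_PATH URL_PATH
# Hypothetical helper: print an Apache <Location> block that serves a
# Subversion repository over WebDAV with autoversioning, so every PUT from a
# plain WebDAV client becomes a new revision. Needs mod_dav and mod_dav_svn.
svn_dav_location() {
  local repo_path=$1 url_path=$2
  cat <<EOF
<Location ${url_path}>
  DAV svn
  SVNPath ${repo_path}
  SVNAutoversioning on
  # Let mod_mime pick a Content-Type from the file name for autoversioned files
  ModMimeUsePathInfo on
  # Add authentication (AuthType/Require) before exposing this share
</Location>
EOF
}

# Example (hypothetical paths):
#   svnadmin create /srv/svn/share
#   svn_dav_location /srv/svn/share /share > /etc/apache2/conf-available/share.conf
```

Clients then mount `http://server/share` with their built-in WebDAV support; each save shows up as a revision in `svn log`.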
Samba 4 will integrate nicely with btrfs and do previous versions for you. To get redundancy, just put the btrfs volume on RAID, perhaps?
Score:-1, Wrong
You can try samba with a btrfs plugin
put the versioning file system on top of the distributed file system.
Or, in a way even a dice employee will understand:
You: I like sucking dicks. And I love having my dick sucked. But nobody will suck my dick and nobody will let me suck their dick.
Me: Ok, why don't you suck your own dick?
You: (you run off to your bedroom, close the door, and are not seen again for 3 days).
Copyright (c) 1990 - 2014 Dice. All rights reserved. Use of this comment is subject to certain Terms and Conditions.
make sharepoint look silly
Sharepoint needs absolutely zero help to look silly.
Of all the products in the MS world, SharePoint is perhaps the worst festering thing they've got.
XML is like violence. If it doesn't solve the problem, use more.
You can try SparkleShare. It uses a Git repository to keep versions.
If someone updates a file in place, do you really want to create a new version for every write call? On the other hand, apps that update files atomically do so by renaming original and backup, which breaks tracking these as the same file.
What you can do is make hourly snapshots and make them available as read only shared directories. Easy enough with simple hard links, and many filesystems support snapshots natively.
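The hard-link idea can be sketched in plain shell. This is a sketch assuming GNU coreutils, diffutils (`cmp`), and a snapshot directory outside the shared tree; `take_snapshot` and the `latest` symlink are hypothetical names. It links each unchanged file to the previous snapshot, not to the live file: a hard link to the live file shares its inode, so an in-place edit would silently change the "snapshot" as well. This is essentially what `rsync --link-dest` does.

```shell
#!/usr/bin/env bash
# take_snapshot SRC SNAPDIR NAME
# Hypothetical helper: copy SRC into SNAPDIR/NAME. Files whose contents match
# the previous snapshot (SNAPDIR/latest) are hard-linked to it; new or changed
# files are copied. Snapshots never share inodes with the live tree.
# SNAPDIR must not be inside SRC.
take_snapshot() {
  local src=$1 snapdir=$2 name=$3
  local dest="$snapdir/$name" prev="$snapdir/latest"
  local file rel
  mkdir -p "$dest"
  while IFS= read -r -d '' file; do
    rel=${file#"$src"/}
    mkdir -p "$dest/$(dirname "$rel")"
    if [ -f "$prev/$rel" ] && cmp -s "$file" "$prev/$rel"; then
      ln "$prev/$rel" "$dest/$rel"    # unchanged: reuse the old snapshot's inode
    else
      cp -p "$file" "$dest/$rel"      # new or changed: store a fresh copy
    fi
  done < <(find "$src" -type f -print0)
  # Point "latest" at the snapshot just taken, for the next run
  ln -sfn "$name" "$prev"
}
```

Run it hourly with a timestamped NAME and export SNAPDIR as a read-only share; each snapshot then costs directory entries plus the files that actually changed.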
Protocols like WebDAV do support versioning, but it would work best with WebDAV clients, not naive apps that think they are writing to a local disk.
The best version control is actual version control such as git.
I haven't tried the versioning yet, but Syncthing (open-source file sync with nice web interface, cross-platform) supports it AFAIK.
syncthing.net
I use SparkleShare. Git is on the back end to keep versions.
A fundamental problem with your question is what you expect for versions. Filesystem interfaces are not transactional and applications and users do not provide clear indications of what constitutes a new "version" of a file. Is it every byte change, like a document in Google docs or change-tracking within an MS Office document? Is it whatever is found every night when a backup system makes a pass over the file tree? Is it something in between?
People are mentioning things like Btrfs and ZFS snapshots, but they do not solve this problem any more than a recurring backup tool does. Some process needs to determine WHEN to take a snapshot, and because of concurrent writes a snapshot can capture partially modified files, except where applications have followed a protocol such as creating a modified copy and then renaming it. Filesystem snapshots only change the cost model a bit for taking frequent snapshots, both in processing and IO time and in the storage consumed by holding multiple snapshots.
Tools like Dropbox apply a bunch of heuristics to attempt to identify points in time when a new file version exists, and to capture those versions. This is done with an agent running on the same machine as the applications which modify the files, so it can monitor things like file locks, open file descriptors, idiomatic file rename events, etc. A lot of this is impossible from a centralized file server because the SMB or NFS protocol hides a lot of the access details that would distinguish a "complete" file change from a partial change.
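The "create a modified copy and then rename it" protocol mentioned above looks like this in shell. `atomic_write` is a hypothetical helper, not part of any tool named in this thread. Because `mv` within one filesystem is a `rename(2)`, a reader or a snapshot sees either the old file or the new one, never a half-written mix. It does not fsync, so it is atomic but not guaranteed durable across a crash.

```shell
#!/usr/bin/env bash
# atomic_write TARGET
# Hypothetical helper: read new contents from stdin and replace TARGET atomically.
atomic_write() {
  local target=$1 tmp
  # Create the temp file next to TARGET so the final mv is a rename(2)
  # within one filesystem, not a copy across filesystems.
  tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # mktemp creates the file with mode 0600; copy the old file's mode (GNU chmod).
  if [ -e "$target" ]; then
    chmod --reference="$target" "$tmp"
  fi
  # Swap the complete new file into place in one step.
  mv -f "$tmp" "$target"
}
```

Usage: `generate_report | atomic_write /srv/share/report.csv` (with `generate_report` standing in for whatever produces the data).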
There's this thing called VMS....
There is no God, and Dirac is his prophet.
I'm not suggesting you switch, but check out how the features of the ZFS filesystem have been integrated into the desktop filemanager (Time Slider feature) in Solaris:
http://java.dzone.com/news/killer-feature-opensolaris-200
http://www.serverwatch.com/tutorials/article.php/3831881/Say-Cheese-OpenSolaris-Time-Slider.htm
http://www.oracle.com/technetwork/articles/servers-storage-dev/autosnapshots-397145.html
If someone updates a file in place, do you really want to create a new version for every write call?
This is precisely how VMS does it, it works great. You can control how many generations it keeps. You can manually delete older versions if you want. You can explicitly refer to the older versions in the path if you want. If, for instance, you are creating a database file, you can disable the versioning.
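As a rough illustration of VMS-style generations on a Unix filesystem, here is a hypothetical `save_version` helper that copies `FILE` to `FILE;N` after each save and keeps only the newest few generations. It only mimics the VMS naming and generation limit; unlike VMS it is not transparent, so something (a script, an editor hook) has to call it.

```shell
#!/usr/bin/env bash
# save_version FILE [KEEP]
# Hypothetical helper: copy FILE to "FILE;N" (N = highest existing generation + 1),
# then delete all but the newest KEEP generations (default 5).
save_version() {
  local file=$1 keep=${2:-5}
  local v n max=0 new
  # Find the highest existing generation number.
  for v in "$file;"*; do
    [ -e "$v" ] || continue              # the glob matched nothing
    n=${v##*\;}                          # text after the last ';'
    if [[ $n =~ ^[0-9]+$ ]] && (( 10#$n > max )); then
      max=$((10#$n))
    fi
  done
  new=$((max + 1))
  cp -p "$file" "$file;$new"
  # Prune generations older than the newest KEEP.
  for v in "$file;"*; do
    [ -e "$v" ] || continue
    n=${v##*\;}
    if [[ $n =~ ^[0-9]+$ ]] && (( 10#$n <= new - keep )); then
      rm -f "$v"
    fi
  done
}
```

The `10#` prefix forces base-10 arithmetic, so a hand-made generation such as `;08` is not misread as octal.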
Wow, mainstream 1970s technology is just way too advanced for this crowd.
Indeed, sometimes these questions are a bit weird. "Hey guys, I'm using Windows and everything works fine, but I'd like to use Linux because."
..Why?
It's not 2001 anymore - your jobs shouldn't be at risk if you are doing them correctly.
Why not simply compare and contrast, dig deep and think. Compromise your sensibilities and explain to the OP how YOU would best achieve the following:
A real time replicated block file system for the purposes of users saving and sharing files to offer some form of business continuity when things fail
^ With the ability for a user to restore a file from 1 hour previous. This file may have been a test file, a Bloomberg stream file or just a data entry clerk doing it wrong for 30 minutes.
How? - Give examples of client software, eg: Calc, Desktop OS eg: MINT and whatever backend solution you desire.
Don't just be a troll or a cunt. Make some news.
Why not a revision control system like SVN or git?
RTFA:
" I just want a simple file share"
http://seafile.com/en/home/
This is what you want. Anything using rsync rules.
because Linux is stable, requires much less horsepower to do more than a Windows box....blah blah blah. Ya. It's usually something like that.
"I want document management, but not a document management utility."
Don't fight it. Use the right tools for the job.
I do not fail; I succeed at finding out what does not work.
And it still is trash-talk, as Linux is neither of those things. It's currently the most unstable and slow operating system.
rsync with the --compare-dest option will give changed files, and --link-dest will give whole file trees at set times.
You can do it pretty simple, or quite complicated, depending on your needs and preferences.
rsync --link-dest makes a new tree with the current time, using only enough space for the directory tree and the changed files. On my box, I use it in a cron that runs every 5 minutes and cycles through my backup list. If any of them are older than the interval, it fires off the backup script specific to that type of connection (local LVM, nfs, CIFS/SMB, ssh, etc).
A second cron job then prunes those directories so that I've got fewer copies as I go back in time. An example would be pulling a copy every 15 minutes and keeping every copy for 2 weeks, one from each hour for a month, one per day for a year, one per month for 10 years, and one per year forever.
This can be easily adapted to other schemes. --compare-dest will make a tree with just the changed files, which you can then gather up and sort into the archival tree. Run a second (plain) rsync to sync up the comparison directory when done.
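The pruning policy described above can be expressed as a small predicate. This is a sketch, assuming snapshots are named `YYYY-MM-DD_HHMM`, taken every 15 minutes on the quarter hour, and that the caller supplies each snapshot's age in whole days; the function name and naming scheme are hypothetical, not the poster's actual script.

```shell
#!/usr/bin/env bash
# keep_snapshot NAME AGE_DAYS
# Succeeds (exit 0) if a snapshot named YYYY-MM-DD_HHMM that is AGE_DAYS old
# should be kept under the tiered policy:
#   younger than 14 days  -> keep every copy
#   younger than 30 days  -> keep the first copy of each hour (minute 00)
#   younger than 1 year   -> keep the first copy of each day (00:00)
#   younger than 10 years -> keep the first copy of each month (1st, 00:00)
#   older                 -> keep the first copy of each year (Jan 1, 00:00)
keep_snapshot() {
  local name=$1 age=$2
  local month=${name:5:2} day=${name:8:2} hour=${name:11:2} minute=${name:13:2}
  if (( age < 14 )); then
    return 0
  elif (( age < 30 )); then
    [[ $minute == 00 ]] && return 0
  elif (( age < 365 )); then
    [[ $hour$minute == 0000 ]] && return 0
  elif (( age < 3650 )); then
    [[ $day$hour$minute == 010000 ]] && return 0
  else
    [[ $month$day$hour$minute == 01010000 ]] && return 0
  fi
  return 1
}

# Example, assuming GNU date and snapshots named as above under /backup/share:
#   now=$(date +%s)
#   for d in /backup/share/*_*; do
#     n=$(basename "$d")
#     t=$(date -d "${n:0:10} ${n:11:2}:${n:13:2}" +%s)
#     keep_snapshot "$n" $(( (now - t) / 86400 )) || echo "would prune $d"
#   done
```

Keeping the decision in a pure function makes it easy to dry-run the policy before letting the cron job delete anything.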
See that "Preview" button?
In a past life, I set up the free version of Alfresco for my teams. Configuration can be tough for those who don't like insanely deep trees of config files, but it has a nice WebDAV server which integrates with the rest of its quite awesome versioning capabilities. It's great for versioning anything that isn't source code.
As a friend said back in the early 1980s: "The best that can be said about Unix is that it has only put operating system development back 20 years".
Yes, that's what the OP said he or she wanted. But is it really what's needed for solving the problem at hand?
Linux, in good Unix tradition, is designed to build things in pieces, which can work together. A number of Linux filesystems are designed to be layered on top of other filesystems. One such filesystem is the distributed, fault-tolerant Glusterfs (http://www.gluster.org/). You should be able to layer it on top of a number of other filesystems which can do versioning or snapshots (e.g. ZFS or Btrfs).
Life's a lot like money-- you spend it, then it's gone. Spend wisely.
That's why large-scale providers (e.g. Google, et al.) use Linux over Windows, right? Get a clue.
Apache can serve webdav shares. It can back them with SVN.
WebDAV can be mounted as a FS by all major OSs, and SVN also has a web based history. You need to provide some auth system to hook into (obviously), but otherwise it works really very well.
SJW n. One who posts facts.
but it's horrible
Answer: No.
Nah, it's free and hackable. At Google or Facebook scale you have machine hardware failing at rates where a few software crashes on top do not matter. You have to deal with any machine going down at any random time anyway, because they do. Regularly (several machines per day).
I used to use SVN as a mountable WebDAV target and then stash all my files there. SVN took care of auto-revisioning. I just went into the web interface I had for SVN in order to pull out the version I wanted from the past.
I've moved from SVN to git and never bothered to set this back up, but I suspect some similar WebDAV to git route is also available.
Different users find different things friendly. I want options and configurability. Grandma Culture20 wants one button and a wizard.
No. GNU/systemd is the unstable operating system. Linux is just the kernel.
Contrary to popular rumors, there are a number of ways to do what you want. I can't vouch for all of these combinations working and wouldn't be too optimistic about tackling some of them. The more advanced stuff can take quite a while to ramp up to speed.
If you don't mind FUSE as an intermediary, there's gitfs, which uses git as a file system (which it kind of is anyway, beyond being just a VCS). It creates a new version on every file close. You can point it at a git remote on the same machine or across a network, and that remote can live on any filesystem.
You already found that there are some non-mainline kernel modules for filesystems like next3, ext3cow, or tux3 that do versioning on write. NILFS2 is actually in the mainline kernel these days (since 2.6.30). More information about NILFS2 shows that it's somewhat slow, but that it is in fact a stable, dependable file system.
Subversion has a feature where you can put WebDAV in front of it, mount the WebDAV share as a filesystem somewhere, and every write creates a new revision of the file in SVN. That gets you networked and versioned. This works similarly to gitfs but uses WebDAV. If you wanted, you could use davfs2 in front of that to treat it like a normal file system again.
You can then share any of these over SMB with Samba, or share them via NFS.
If you need really high-end, fast, replicated network filesystems, you can use any of the clustered storage systems that keep their data in a storage node's underlying files, with any of these filesystems below that, but that puts your revisions underneath everything else rather than on top. Then there's using something like gitfs with the remote on top of, for example, DRBD, XtreemFS, or Ceph (even via CephFS, which presents Ceph as a normal POSIX filesystem). This latter option puts your revisions closer to the user, and then each revision gets replicated.
I've personally never used some of the more exotic combinations listed here. You could in theory put NILFS2 on LVM with DRBD as the physical layer (since DRBD supports that) and then serve that file system via Samba (CIFS) or NFS which I would expect to work well enough if slowly.
A simple way: an SFTP/FUSE network file share, plus rsnapshot. Obviously you'll need to tweak it for performance and preferences, but it's good for small networks and quick-and-dirty shares. SSH just works, and provides encryption along with lots of ways to access it.
Going for the dumbest comment today... I really miss OpenVMS and files with version numbers. That is all.
You are precisely the reason "ask slashdot" sucks.
"Sure - I know that's what he *asked* for - but since I don't know that, I'll answer another question which I can, or berate the questioner for wanting something I can't provide."
With either LVM snapshots or ZFS snapshots, you just need to use Samba's vfs_shadow_copy2 module and you get exactly that "Previous Versions" functionality you're wanting for Windows users accessing your share. We're on FreeBSD/ZFS using Samba with that module, and users can restore previous versions when and if the need arises.
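A minimal sketch of the ZFS side of that setup, assuming the Samba share's path is the dataset's mountpoint and the share section loads `vfs_shadow_copy2` with the options shown in the comments. Snapshot names must match `shadow:format`; this uses UTC (GMT) names so `shadow:localtime` can stay at its default. The dataset name is hypothetical, and the function only prints the `zfs` command so you can check it before running it; if you use an existing auto-snapshot tool instead, set `shadow:format` to match its naming.

```shell
#!/usr/bin/env bash
# Matching smb.conf share settings (vfs_shadow_copy2 options):
#   vfs objects = shadow_copy2
#   shadow:snapdir = .zfs/snapshot
#   shadow:format = GMT-%Y.%m.%d-%H.%M.%S
#   shadow:sort = desc
#
# zfs_snapshot_cmd DATASET [EPOCH]
# Print the zfs command that snapshots DATASET under a UTC name Samba can parse.
# EPOCH defaults to now; the -d @EPOCH form requires GNU date.
zfs_snapshot_cmd() {
  local dataset=$1 epoch=${2:-$(date +%s)}
  printf 'zfs snapshot %s@%s\n' "$dataset" \
    "$(date -u -d "@$epoch" +'GMT-%Y.%m.%d-%H.%M.%S')"
}

# Review the output, then run it (e.g. hourly from cron):
#   zfs_snapshot_cmd tank/share | sh
```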
Glusterfs trash feature is able to recover previous file versions.
What happens if someone opens a file and changes a single 80-byte line inside, one byte at a time?
Once you have selected a versioning file system, it is easy to add DFS functionality to it at the block layer. The biggest factor will depend on the latency and bandwidth between DFS nodes but some things to look at include:
Distributed Replicated Block Device (DRBD)
Meta-Device RAID 1 mirror across a physical and Network Block Device (NBD)
Or Ceph's RADOS Block Device (RBD)
What YOU are talking about is what "undo" functions of programs are for. It would be irrational to develop an entire network architecture from clients to servers and storage that is geared to provide infinite storage for infinite granularity of infinite versions of files where 99.99999% of the intermediate versions (or 79 of the 80 changes in your example) will never be wanted or needed. At some point, human users need to know what they want and execute their actions with at least the level of responsibility and intelligence of a chimpanzee.
Any proper solution should [1] allow humans to decide that the data they have at a particular moment is significant and deliberately tag it as a version they can return to and/or [2] auto-tag-and-store versions at specific intervals that the human users are trained to be aware of. In most situations, institutions will find that the costs and hassles of very-fine-granularity versioning are higher than the losses incurred when on rare occasions a user screws up and has to go back to an archive from perhaps an hour or two earlier and re-do the hour or two of lost work.
As is often the case with technology, there IS no right answer.... there is a requirement for an engineer to get to the specific requirements of a specific situation and intelligently select an imperfect but best-fit solution that hits the right trade-offs between cost, convenience, safety, etc.
Holy shit, you are one phenomenally moronic troll. Seriously, you can't even pull off a simple trolling attempt without broadcasting to the world what a complete fucking idiot you are. Amazing!
Everyone works with their files locally, changes are synced via a common server. Everyone has a compressed backup of the complete history of the entire filesystem for disaster recovery. Everyone should be able to browse and recover any version of any file without adding load to the server, though usability might be slightly lacking. You could also setup a FUSE filesystem on a linux box to browse the history.
You may need to partition the file storage into multiple repositories, so that people don't need to synchronise folders that they don't use.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
While most of the nifty features are tied to the paid version, Seafile seems quite cool, but I see that basic users like me cannot install it as a PHP script on a shared server.
I'm still using ownCloud, though not the latest version, for daily syncs of a dozen machines from desktops to phones and tablets, and it has "just worked" for a couple of years now. Its only downside is that, unlike Dropbox, it allows no direct LAN transfers between clients: everything must transit through the server. This is related to the way ownCloud handles versioning, which is heavily discussed in their issue tracker (https://github.com/owncloud/client/issues/230), but if this does not bother you I see no other drawbacks.
Herve S.
Samba can do that for you with VFS (Virtual File System).
https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2651694
Recycle module might be enough for you:
https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2651247
But I think you are after shadow_copy module:
https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2651694
If you need it on large scale deployments then I would suggest looking at Samba cluster or GlusterFS integration.
In both cases you will need high throughput and low latency network.
If you plan to use it on gigabit Ethernet with Intel cards, double the network card ring buffers with ethtool if possible; this should increase the IOPS you can push through the network.
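A sketch of that tuning step, assuming a Linux NIC whose driver supports ring-size changes: it parses `ethtool -g` output, doubles the current RX and TX ring sizes, caps them at the pre-set maximums, and prints the matching `ethtool -G` command for review rather than applying it. The function name is hypothetical; measure throughput before and after, since the benefit depends on the card and workload.

```shell
#!/usr/bin/env bash
# ring_double_cmd IFACE < ethtool-g-output
# Hypothetical helper: read `ethtool -g IFACE` output on stdin and print an
# `ethtool -G` command that doubles the current RX and TX ring sizes,
# capped at the hardware's pre-set maximums.
ring_double_cmd() {
  awk -v iface="$1" '
    /^Pre-set maximums/          { sec = "max" }
    /^Current hardware settings/ { sec = "cur" }
    $1 == "RX:" { rx[sec] = $2 }   # "RX Mini:" / "RX Jumbo:" do not match
    $1 == "TX:" { tx[sec] = $2 }
    END {
      r = rx["cur"] * 2; if (r > rx["max"]) r = rx["max"]
      t = tx["cur"] * 2; if (t > tx["max"]) t = tx["max"]
      printf "ethtool -G %s rx %d tx %d\n", iface, r, t
    }'
}

# Usage: review the suggestion, then run it as root.
#   ethtool -g eth0 | ring_double_cmd eth0
```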
#!/bin/bash
# Hourly rotating snapshots of a directory tree using rsync --link-dest.
# Usage: run once an hour (e.g. from cron) with the directory to snapshot as $1.
# stop on errors
set -e
scriptname=$(basename "$0")
pidfile="/var/run/${scriptname}"
# lock it so that only one instance runs at a time
exec 200>"$pidfile"
flock -n 200 || exit 1
pid=$$
echo "$pid" 1>&200
# Make SRC absolute: a relative --link-dest path would be resolved
# relative to the destination directory, not the current one.
SRC=$(cd "${1:?usage: $scriptname <directory>}" && pwd)
SNAPSHOTS="$SRC/.snapshots"
MOST_RECENT="$SNAPSHOTS/1 hour ago"
# The poster used a custom rsync build; the options below also exist in stock rsync 3.x.
RSYNC="/root/rsync-HEAD-20130616-0452GMT/rsync"
# Collect rsync arguments in an array so paths containing spaces need no eval.
RSYNC_OPTIONS=(-ah --no-inc-recursive)
if [ -f "$SRC/.snapshots-filter" ]; then
  RSYNC_OPTIONS+=(--filter="merge $SRC/.snapshots-filter")
fi
RSYNC_OPTIONS+=(--exclude=.git --exclude=.snapshots)
if [ -d "$MOST_RECENT" ]; then
  # unchanged files become hard links into the previous snapshot
  RSYNC_OPTIONS+=(--link-dest="$MOST_RECENT")
fi
mkdir -p "$SNAPSHOTS/.temp"
"$RSYNC" "${RSYNC_OPTIONS[@]}" "$SRC/" "$SNAPSHOTS/.temp/"
# stamp the new snapshot with the current time; moveBackup ages snapshots by mtime
touch "$SNAPSHOTS/.temp"
# moveBackup CURRENT NEXT OVERFLOW MINUTES
# If CURRENT is at least MINUTES old, shift it into NEXT. The snapshot already
# in NEXT moves on to OVERFLOW if that slot is free, and is deleted otherwise.
moveBackup() {
  if [ -d "$1" ]; then
    if [ $(( ($(date +%s) - $(stat --format %Y "$1")) / 60 )) -ge "$4" ]; then
      if [ -d "$2" ]; then
        if [ ! -d "$3" ]; then
          mv "$2" "$3"
        else
          rm -rf "$2"
        fi
      fi
      mv "$1" "$2"
    fi
  fi
}
# rotate from the oldest slot to the newest so nothing is overwritten
moveBackup "$SNAPSHOTS/around 3 months ago" "$SNAPSHOTS/around 6 months ago" "$SNAPSHOTS/around 9 months ago" 259110
moveBackup "$SNAPSHOTS/around 1 month ago" "$SNAPSHOTS/around 2 months ago" "$SNAPSHOTS/around 3 months ago" 86310
moveBackup "$SNAPSHOTS/about 15 days ago" "$SNAPSHOTS/about 23 days ago" "$SNAPSHOTS/around 1 month ago" 33030
moveBackup "$SNAPSHOTS/about 7 days ago" "$SNAPSHOTS/about 11 days ago" "$SNAPSHOTS/about 15 days ago" 15750
moveBackup "$SNAPSHOTS/about 3 days ago" "$SNAPSHOTS/about 5 days ago" "$SNAPSHOTS/about 7 days ago" 7110
moveBackup "$SNAPSHOTS/24 hours ago" "$SNAPSHOTS/about 2 days ago" "$SNAPSHOTS/about 3 days ago" 2790
moveBackup "$SNAPSHOTS/8 hours ago" "$SNAPSHOTS/16 hours ago" "$SNAPSHOTS/24 hours ago" 870
moveBackup "$SNAPSHOTS/4 hours ago" "$SNAPSHOTS/6 hours ago" "$SNAPSHOTS/8 hours ago" 270
moveBackup "$SNAPSHOTS/2 hours ago" "$SNAPSHOTS/3 hours ago" "$SNAPSHOTS/4 hours ago" 90
if [ -d "$SNAPSHOTS/1 hour ago" ]; then
  mv "$SNAPSHOTS/1 hour ago" "$SNAPSHOTS/2 hours ago"
fi
mv "$SNAPSHOTS/.temp" "$MOST_RECENT"
chown --reference="$SRC" "$SNAPSHOTS"
chown --reference="$SRC" "$MOST_RECENT"
Software failures will scale up similarly. If you propose, for example, that on any single PC a Linux crash is 10x more likely than a hardware failure, then they'd be dealing with dozens of crashes per day, and that would have to be some pretty stable software. What's your crash-to-hardware-failure ratio?
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Potentially yes. You might throw some of those away but...
And then the user, on a per-file basis, needs to know when bunches of changes happened. So, for example, file X had:
Large number of changes between April 2015 and May 2015
Large number of changes between Nov 2014 and Jan 2015
Large number of changes between Sep 2014 and Nov 2013
etc... with no changes in between. Other files are going to have totally different bursts of activity.
Open, change, close is a version.
Open, change is not a version since it didn't get closed.
The versioning pattern can keep older versions too; it doesn't have to be just "last 10". On a better versioning system it can be:
Last 10, up to 1 per month for 12 months. 1 per 6 mo forever. See Google Docs or Wikipedia for good examples of this.
You might want to have a look at Deja Dup
What is the performance of this sort of thing like? I'm thinking of the case of a business using this constantly. Is it going to work for a few hundred users using a share daily? Or is that just going to make it die? I was thinking something that was a copy-on-write type thing, not a slow cron job type thing, but if it does perform OK...
So here we had a few admins and a bunch of 'normal' users. The normal users needed an admin to create a new group to facilitate sharing. With seafile, the users could create their own groups. That and frankly we hit a few bugs with sync and seafile seemed to do better.
owncloud's document preview and the plugins were a bit worse than seafile's baked in, but primarily it's just a platform for replicating and sharing file content for us, we don't really care about anything beyond that. We don't use the commercial seafile.
It does everything without crashing like WinDOS does.
Correct, it just crashes different. e.g.: my fiancée's MacBook likes to crash the kernel when it tries to wake from sleep to check the status of the Magic Mouse.