Distributed Data Storage on a LAN?

NBD Does this by backtick · 2003-10-29 09:12 · Score: 5, Insightful

http://nbd.sourceforge.net/

"Network Block Device (TCP version)

What is it: With this thing compiled into your kernel, Linux can use a remote server as one of its block devices. Every time the client computer wants to read /dev/nd0, it will send a request to the server via TCP, which will reply with the data requested. This can be used for stations with low disk space (or even diskless - if you boot from floppy) to borrow disk space from other computers. Unlike NFS, it is possible to put any file system on it. But (also unlike NFS), if someone has mounted NBD read/write, you must assure that no one else will have it mounted.

Limitations: It is impossible to use NBD as the root file system, as a user-land program is required to start it (but you could get away with initrd; I never tried that). (Patches to change this are welcome.) It also allows you to run a read-only block device in user-land (making server and client physically the same computer, communicating using loopback). Please notice that read-write nbd with client and server on the same machine is a bad idea: expect deadlock within seconds (this may vary between kernel versions, maybe on one sunny day it will even be safe?). More generally, it is a bad idea to create a loop in the 'rw mounts graph'. I.e., if machineA is using a device from machineB read-write, it is a bad idea to use a device on machineA from machineB.

Read-write nbd with client and server on the same machine has a rather fundamental problem: when the system is short of memory, it tries to write back dirty pages. So the nbd client asks the nbd server to write back data, but as nbd-server is a userland process, it may require memory to fulfill the request. That way lies the deadlock.

Current state: It currently works. Network block device seems to be pretty stable. I originally thought that it was impossible to swap over TCP. It turned out not to be true - swapping over TCP now works and seems to be deadlock-free.

If you want swapping to work, first get nbd working. (You'll have to mkswap on the server; mkswap tries to fsync, which will fail.) Now you have a version which mostly works. Ask me for kreclaimd if you see deadlocks.

Network block device has been included into standard (Linus') kernel tree in 2.1.101.

I've successfully run raid5 and md over nbd. (A pretty recent version is required to do so, however.)"
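
A minimal sketch of the request/reply idea in the quoted description, in Python. The wire format is a toy invented for illustration (an 8-byte offset plus a 4-byte length), not the real NBD protocol, and socket.socketpair() stands in for the TCP connection. The 4 KiB in-memory "disk" and all function names are made up as well.

```python
import socket
import struct
import threading

# Toy wire format (NOT the real NBD protocol): the client sends an 8-byte
# offset and a 4-byte length; the server answers with exactly that many bytes.
REQUEST = struct.Struct("!QI")


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def serve_blocks(sock, backing):
    """Answer read requests against an in-memory 'disk' until the client hangs up."""
    try:
        while True:
            offset, length = REQUEST.unpack(recv_exact(sock, REQUEST.size))
            # Reads past the end of the disk are padded with zero bytes.
            data = backing[offset:offset + length].ljust(length, b"\0")
            sock.sendall(data)
    except ConnectionError:
        pass
    finally:
        sock.close()


def read_block(sock, offset, length):
    """Client side: ask the server for `length` bytes starting at `offset`."""
    sock.sendall(REQUEST.pack(offset, length))
    return recv_exact(sock, length)


def demo():
    disk = bytes(range(256)) * 16          # 4 KiB pretend block device
    client, server = socket.socketpair()   # stands in for a TCP connection
    worker = threading.Thread(target=serve_blocks, args=(server, disk))
    worker.start()
    data = read_block(client, 512, 8)
    client.close()
    worker.join()
    return data
```

The point of the sketch is only that the client sees plain byte ranges, which is why any file system can sit on top and why two read/write clients would trample each other.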

Re:NBD Does this by Matrix272 · 2003-10-29 09:29 · Score: 1

I maintain a lab with 16 Linux computers (running Red Hat 8) and 1 server. Right now, I have about 150gb or so on the server that I NFS out to all the workstations. However, each workstations has 20-80gb that they don't need and aren't using... The users all have their home directory mounted via NFS, and must have read/write access to them (obviously). Each user also must be able to SSH in, and access the console (wouldn't be much of a lab if the users couldn't sit down at a computer). I also would like to have software installed on an NFS mount, without worrying about massive performance drops.

Would NBD be able to fill all those needs? I'd like a RAID5 setup over all the computers, although maybe even some other type of RAID, like RAID5 with 5 extra disks, just in case someone powers one down... Would that work? Ideally, I'd like to make a cluster of the workstations, but also have a console for each of them... but I haven't had a lot of time to research it lately, so I don't know what's available out there. Does anyone think NBD would be a viable solution for me?

--
"It's better to have a gun and not need it than need a gun and not have it." ~ Christian Slater, True Romance
Re:NBD Does this by dbarclay10 · 2003-10-29 09:37 · Score: 5, Informative

Just to clarify what this guy is saying:
1) Make all your machines NBD servers. NBD for Linux, NBD for Windows. NBD stands for "network block device" and allows a client to use a server's block device.
2) Set up a master client/server (using Linux or something else with a decent software RAID stack). This machine will be the only NBD *client*, and it will use all the NBD block devices exported by the rest of your network.
3) On the master set up in 2), create a Linux MD RAID array overtop all the NBD devices that are available.
4) Create a filesystem on the brand-spanking-new multi-machine RAID array.
5) Export it back to the other machines via Samba or NFS or AFS or what have you.
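
Steps 2 to 4 above can be sketched as a dry-run plan in Python that only builds the command lines and runs nothing. The host names, the port, the /dev/nbdN and /dev/md0 device names, and the choice of mdadm and mkfs.ext3 are all assumptions for illustration; check the nbd-client and mdadm man pages for your versions before trusting any of it.

```python
def build_plan(exports, md_device="/dev/md0", level=5):
    """Return the commands for steps 2-4 as argv lists (nothing is executed).

    exports: list of (host, port) pairs, one per NBD server on the LAN.
    All names here are hypothetical placeholders.
    """
    plan = []
    nbd_devices = []
    for index, (host, port) in enumerate(exports):
        device = f"/dev/nbd{index}"
        nbd_devices.append(device)
        # Step 2: attach each remote export as a local block device.
        plan.append(["nbd-client", host, str(port), device])
    # Step 3: build one MD array across all attached devices.
    plan.append(["mdadm", "--create", md_device, f"--level={level}",
                 f"--raid-devices={len(nbd_devices)}", *nbd_devices])
    # Step 4: put a filesystem on the new array.
    plan.append(["mkfs.ext3", md_device])
    return plan


if __name__ == "__main__":
    for argv in build_plan([("ws01", 2000), ("ws02", 2000), ("ws03", 2000)]):
        print(" ".join(argv))
```

Step 5 (the Samba/NFS/AFS export) is left out because it depends entirely on which server you pick.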

Why does only one machine (the "master server") access the NBD devices, you ask? Because for a given block device, there can only be one client accessing it safely. Thus, if you want to make the RAID array available to anything other than the machine which is *running* the array off the NBD devices, you need to use something which allows concurrent access; something like NFS, Samba, or AFS.

Hope that clears it up a bit.

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Re:NBD Does this by dbarclay10 · 2003-10-29 09:40 · Score: 1

Performance could potentially be very terrible, especially with RAID5.

That being said, do some benchmarks. RAID1+0 might be more sane. (That is, a RAID0 stripe overtop RAID1 mirrors.)

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Re:NBD Does this by caluml · 2003-10-29 10:23 · Score: 2, Informative

Hmm. How stable is it? From /usr/src/linux/Documentation/nbd.txt:

Note: Network Block Device is now experimental, which approximately
means, that it works on my computer, and it worked on one of school
computers.

That doesn't sound very promising to me. Usually stuff that's been in the kernel since 2.1 days is rock solid.

Isn't AFS/Coda more like what the guy wants (excluding Windows-ability, although I seem to remember there being something for Andrew for Windows)?

--
Get your own free personal location tracker
Re:NBD Does this by WindBourne · 2003-10-29 10:23 · Score: 2, Interesting

I currently do this at home with 3 computers (all Linux) for my home directory. But I have been thinking that there needs to be a way to separate the parts of /etc for the local system from those for the network. I have been thinking of how to write a block device that allows layers to be combined.

--
I prefer the "u" in honour as it seems to be missing these days.
Re:NBD Does this by elbrecht · 2003-10-29 10:37 · Score: 1

This sounds so cool, you could even host pr0n in Saudi Arabia with that system (placing at least 2 hosts outside the country) and in case they raid your server room they don't get you.

Hence RAID
Re:NBD Does this by Neurotensor · 2003-10-29 11:07 · Score: 1

Also you could export one file from each machine to the central server, which then mounts them via loopback and RAIDs them as before.

This would be handy when you don't want NBD for some reason. Like some machines have a perfectly good NFS/SMB mount already visible from the server and for those machines you don't mind using loopback. Or whatever.

You could incrementally add 1GB at a time to the RAID whenever you find a good home for it on one of the machines. Just think about whether you have enough redundancy for that drive to fail and take out multiple "discs" from the RAID.
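
To make the redundancy question concrete, here is a rough Python sketch of single-parity (RAID5-style) recovery. It is a simplification: one stripe, parity always in the last slot, no rotation, and the function names are made up. It is not how the md driver is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (a single stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, missing_index):
    """Recompute the block at missing_index from the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"home", b"dirs", b"data"])
    # Pretend the machine holding block 1 was powered down.
    print(rebuild(stripe, 1))
```

With one parity block any single missing "disc" can be recomputed, but if one physical drive hosts two of the files in the array, losing it removes two blocks from the stripe and the XOR no longer has enough information.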

The real downside of these schemes is needing to have all machines on all the time. Unless you use RAID level 1 or similar to achieve a mirrored filesystem. But as you reboot machines they will most probably get flagged as defunct, requiring manual hot-adding them for a resync. Not all that fun if you intend to reboot any of the machines. Running Windows on any of them? Don't try updating drivers then ;)
Re:NBD Does this by arivanov · 2003-10-29 11:13 · Score: 2, Insightful

There are inherent pitfalls in it. They are mostly similar to the problem of swapping over NFS. Overall, it boils down to buffer management.

Basically, in order to execute the network device request you often have to get more memory. In order to get more memory you have to execute a network request. And so on and so forth.

Also, AFAIK RAID does not work properly over NBD.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:NBD Does this by pizzaman100 · 2003-10-29 11:36 · Score: 1

This sounds so cool, you could even host pr0n in Saudi Arabia with that system (placing at least 2 hosts outside the country) and in case they raid your server room they don't get you.
Why put one in Saudi Arabia in the first place? :)
Re:NBD Does this by Dick+Faze · 2003-10-29 18:12 · Score: 1

I heard NBD was dying......
Re:NBD Does this by Delusionner · 2003-10-29 19:01 · Score: 1

Here's an interesting lab report on a group of people developing the same thing: http://www.ipuc.pucminas.br/labep/mentor/MUG_99/Virtual_RAID_LINUX.PDF It explains everything from the architecture to the constraints and problems found, and of course how they stood up to the challenge.
Re:NBD Does this by retrev · 2003-10-30 01:18 · Score: 1

> I originally thought that it is impossible to swap over TCP.

Back in '95 (and earlier) we had a bunch of modified MicroVAX 2's with custom video boards and no disk. They booted, mounted all file systems, and swapped over TCP. Worked like a charm. :)
Re:NBD Does this by MattBurke · 2003-10-31 09:16 · Score: 1

don't suppose you know of a similar thing for *BSD? specifically FreeBSD....
Re:NBD Does this by wouterke · 2003-11-02 14:20 · Score: 1

There's indeed a deadlock in what you describe; but that's only relevant if you try to swap over NBD, what it was primarily made to do.

If you want to run a file system or a RAID array on top of NBD, you shouldn't see that deadlock.

yes by Triumph+The+Insult+C · 2003-10-29 09:14 · Score: 1

it's called rsync

--
vodka, straight up, thank you!

Re:yes by Triumph+The+Insult+C · 2003-10-29 09:17 · Score: 1

sorry. conan cut me off last night, so i am upset

we use afs (pre-openafs, tho i'm sure openafs will work just fine) on top of nbd (link escapes me right now). works pretty well.

--
vodka, straight up, thank you!

Win2k by SuiteSisterMary · 2003-10-29 09:16 · Score: 4, Informative

I believe that Windows 2000's Distributed File System allows you to do just this.

--
Vintage computer games and RPG books available. Email me if you're interested.

Re:Win2k by ...+James+... · 2003-10-29 09:22 · Score: 1

Nope -- DFS is used to distribute your data across multiple servers but have it accessible from one location.

For example, say you have a DFS root of \\domain\dfs, with multiple children, like \\domain\dfs\mp3 and \\domain\dfs\games. mp3 and games can be shares on two different servers, but they're accessible via the same virtual \\domain\dfs share.
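
The mapping in that example can be sketched in a few lines of Python. The server1 and server2 share names are hypothetical, and this only models the idea of a namespace lookup, not how Windows actually resolves DFS referrals.

```python
# Hypothetical link table: DFS path prefix (lowercase) -> real share behind it.
DFS_LINKS = {
    r"\\domain\dfs\mp3": r"\\server1\mp3",
    r"\\domain\dfs\games": r"\\server2\games",
}


def resolve(path, links=DFS_LINKS):
    """Map a path in the virtual DFS tree to the share that actually holds it."""
    lowered = path.lower()
    for prefix, target in links.items():
        # Match the link itself or anything below it, but not "mp3s".
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            return target + path[len(prefix):]
    raise LookupError(f"no DFS link covers {path}")


if __name__ == "__main__":
    print(resolve(r"\\domain\dfs\mp3\album\track01.mp3"))
```

The client only ever sees the \\domain\dfs paths; which server answers is a lookup detail.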

It's useful nonetheless.

--
get nemulator
Re:Win2k by Havokmon · 2003-10-29 09:27 · Score: 1

I believe that DFS allows you to do just this.
:s/DFS/DASD

--
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
Re:Win2k by SuiteSisterMary · 2003-10-29 09:31 · Score: 2

If you look further into DFS, I believe you'll find that you can have multiple servers synchronizing the same share name.

It's pretty snazzy; it'll even try to figure out the 'closest' server to you at any given time, skip over servers that are down, and so on.

--
Vintage computer games and RPG books available. Email me if you're interested.
Re:Win2k by RedX · 2003-10-29 09:37 · Score: 1

If you look further into DFS, I believe you'll find that you can have multiple servers synchronizing the same share name.
The distributed feature would be quite worthless if there wasn't some synchronization taking place to make sure the data was synched across all servers in the DFS namespace.
Re:Win2k by ...+James+... · 2003-10-29 09:47 · Score: 1

True, but I don't think that it's really what the submitter was asking for... of course I guess that would almost qualify as RAID-1, except that it's not in real time.

--
get nemulator
Re:Win2k by Anonymous Coward · 2003-10-29 10:13 · Score: 1

Domain DFS will give you replicated data. But it is only a mirror of what is elsewhere and not a distribution of data with error data for data recreation in the event of data loss.

I believe the request is for software that will allow distributed filesystems or datastores to act as one network device/disk. A kind of poor man's distributed SAN. I believe AFS includes part of what you are looking for, but not exactly, and Sanista's (can't remember how to spell their name) offer a commercial version that is GFS, above their GPL version included in the linux source. DFS from MS does not cut it. Clients also have to be AD DFS aware. NT4 with the DFS patch will not know about closest locations, etc. Incentive to move to 2k... now XP, per the MS reps I speak with. Forget SMB mounts from FreeBSD or linux or Sun, etc.
Re:Win2k by Anonymous Coward · 2003-10-29 11:21 · Score: 1, Informative

The distributed feature would be quite worthless if there wasn't some synchronization taking place to make sure the data was synched across all servers in the DFS namespace.

DFS uses the File Replication Service (FRS) to ensure that all DFS replicas are synchronized. Clients connect to the closest available server (based on Active Directory Site information) and will automatically fall back to another server if one goes down.

It's actually very easy to configure. Just fire up the DFS admin tool and add a new share. When you add a second replica the admin tool will ask you if you want to synchronize the replicas. Just click yes and everything will be configured automatically. The same is true if you add more replicas.
Re:Win2k by wasabii · 2003-10-29 12:28 · Score: 1

Yeah, but it's pretty unusable on a really big distributed network. There is no locking mechanism. It copies by last modification date. So if you edit a file from two systems that were using two different servers, and save changes, you'll lose one set of changes.

I've also had it go bonkers once and start creating infinite subdirectories during replication. I think this might have been fixed in some service pack.
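
A toy Python model of the lost update described here. It assumes replication simply pushes the copy with the newest modification time to every server; that is a simplification of the behaviour in this comment, not FRS internals, and the file and user names are made up.

```python
def replicate(replicas, name):
    """Push the copy of `name` with the newest mtime to every replica."""
    copies = [replica[name] for replica in replicas if name in replica]
    newest = max(copies, key=lambda copy: copy["mtime"])
    for replica in replicas:
        replica[name] = dict(newest)
    return newest


def demo():
    # Two users save the same file through two different servers.
    server_a = {"report.txt": {"mtime": 100, "text": "Alice's changes"}}
    server_b = {"report.txt": {"mtime": 105, "text": "Bob's changes"}}
    replicate([server_a, server_b], "report.txt")
    return server_a["report.txt"]["text"], server_b["report.txt"]["text"]


if __name__ == "__main__":
    print(demo())
```

Both servers end up with the later save, and the earlier one is gone without any warning, since nothing ever compared the contents or took a lock.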
Re:Win2k by kannibul · 2003-10-29 17:48 · Score: 1

I'm planning on implementing DFS to hold user profiles for around 70 employees that use Terminal Services - This will avoid the "where's my favorites" and "I saved that file on the desktop, and now it's not there" issues.
Re:Win2k by devilspgd · 2003-10-29 18:54 · Score: 2, Interesting

From my reading of DFS prior to W2K/AD's release, it was mainly built for large mostly static data which needs to be replicated across multiple sites and needs high uptime, but very specifically does not need to be updated frequently.

The concept of giving all users read/write access was thought up later on and it happens to work, but as you say, if two users update the same file, you may/will lose data.

--
Give a man a fish, he'll eat for a day, but teach a man to phish...
Re:Win2k by jd678 · 2003-10-30 00:15 · Score: 1

Folder redirection in W2K group policy will do this a lot better. Keep the local profiles on each TS, and redirect the desktop, my docs etc. to a single place on the network. It doesn't do favorites, but I believe there are a few workarounds floating about.
If you're going to use DFS for this, you might as well enable roaming profiles instead; it'd be just as problematic.
Re:Win2k by sprocketbox · 2003-10-30 00:53 · Score: 1

Windows DFS will do what the original poster asks about. Any single DFS share can point to multiple windows shares and data copied to the DFS share is copied to all of the replication shares. It's not RAID based in that there isn't any sort of parity data. What's happening is that the same data is getting copied across all of the shares via the File Replication Service (FRS). I believe that for this to work all of the shares involved need to be NTFS shares and this may only work on domain based DFS systems (I've not tried this with standalone DFS). I have this in place at work and it works great.
Re:Win2k by SuiteSisterMary · 2003-10-30 01:02 · Score: 1

Well, it's RAID in that it's mirrored, and there are pitfalls in the way that it handles replication (and what do you expect, we're talking arbitrary data files here, not database transactions) but for what the original poster asked for, DFS is perfect.

--
Vintage computer games and RPG books available. Email me if you're interested.

rsync Re:yes by cprice · 2003-10-29 09:16 · Score: 1

AFAIK, rsync is not really suitable for a realtime scenario. A nbd raid-5 device would be virtually realtime, no?

Re:rsync Re:yes by macemoneta · 2003-10-29 09:20 · Score: 1

Yes, if by realtime you mean really slow, for any significant volume of data. It's relatively easy to kick off rsync processes at appropriate points (like unmount, logoff, etc.). This gets you local speed access, near-line replication, and the opportunity to setup archival copies.

--
Can You Say Linux? I Knew That You Could.

Comment by TerminatorT100 · 2003-10-29 09:16 · Score: 1

I've been looking into this too. Most workstations today have large harddisks (40GB+) while on a network maybe 2-4 GB is used... Any windows software out there?

Re:Comment by skinny23 · 2003-10-29 09:42 · Score: 2

We've used something called MirrorFolder to mirror contents of specific folders across a network. It worked fairly nicely and integrated well with Windows Explorer.

http://www.techsoftpl.com/backup/

So... by Pingular · 2003-10-29 09:17 · Score: 1

Distributed Data Storage on a LAN?
Kind of like a Beowulf of hard-discs then?

--

When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)

Re:So... by macshune · 2003-10-29 09:40 · Score: 4, Funny

Man, if Beowulf was alive today he'd so kick Slashdot's ass. Seriously, this dude killed monsters, saved villages and killed a dragon. He has armor that would make any slashdotter cream their jeans when they look at the armor's tag and it says AC -9. Don't even get me started on the weapons.

If you were a medieval ass-kicker, would you want your moniker to be the butt of thousands of canned-jokes that weren't even funny to begin with?

Hmm...that's like a Beowulf cluster of usb thumb drives...

Yeah. Maybe the cheap super-computer idea Beowulf would find cool, but not the jokes and the impossible-to-Beowulf devices.

So those jokes aren't funny and probably won't get you (not you in particular, Pingular) modded up. If you want to talk about networked clusters of non-networkable devices, say:

"That's like a Duke Nukem Forever/Bit Boys graphics card/Mac OS X on a 386 cluster"

No wait, on second thought, that's not funny either.
Re:So... by macshune · 2003-10-29 15:09 · Score: 1

Yeah, my bad. I knew that, but medieval just sounded too good. And look, if you are gonna get all pedantic on my ass, please do it without hiding. If you really wanted to defend Beowulf's name, you'd do it showing your login.

Oh, and speaking of being pedantic, you say he was possibly the greatest warrior to ever live, but there is little in the historical record to corroborate what has been written of him. That's akin to saying King Arthur was possibly the greatest warrior of all time.

I love the Beowulf epic as much as the next thinking, reading person (like you, for instance), and I think the best way to celebrate his story (whether true or fictional) is to think about it in terms other than canned jokes.
Re:So... by zabieru · 2003-10-29 15:48 · Score: 1

The 'middle' ages are commonly considered to have been the time between (in the middle of) the fall of the Roman Empire in the 5th century and the Renaissance (dates varying depending on what part of Europe we're talking about). So Beowulf, in the 8th century, falls in the low middle ages.
Re:So... by macshune · 2003-10-30 06:11 · Score: 1

Thanks for vindicating me:)
Re:So... by Omega996 · 2003-10-31 17:29 · Score: 1

to further drive this completely off-topic, I'd beg to differ about the greatest warrior to ever live. I'd think it'd either have to be William the Bastard, or the man whom he defeated, Harold. Norman bastard... *mumble* *mumble*

rdist would work... by ZenShadow · 2003-10-29 09:17 · Score: 4, Informative

The obvious answer for this is nbd, as pointed out in another post -- but I would have concerns about speed with that kind of setup. I'd be interested in hearing reports on that.

But if you don't want to get into nbd, you can tolerate delayed writes to your virtualized disks, and all you want is the network equivalent of RAID level 1, then you could always just set up an rdist script that synchronizes your local data disk with a remote repository (or eight) every so often...

--ZS

--
-- sigs cause cancer.

Re:rdist would work... by neverbeeninariot · 2003-10-29 12:11 · Score: 1

Alternatively, if you're using any NT derivative (4 and up), you can schedule a task to use Robocopy (from the Windows Resource Kits) to mirror data to remote shares.

Here: RoboCopy
Usage: RoboCopy

nbiar

Standard Linux kernel maybe? by buzzbomb · 2003-10-29 09:18 · Score: 2

Perhaps multiple files over different networking protocols (SMB for Windows machines, NFS for the Linux machines) mapped to built-in loopback devices (/dev/loopX) accessed through built-in md utilizing software RAID5? Heh. It might not be pretty or fast, but it would probably work just fine. It may just give the kernel absolute fits though.

Anyone tried this?

Re:Standard Linux kernel maybe? by backtick · 2003-10-29 09:26 · Score: 3, Informative

NBD *is* standard Linux kernel. It's built right in: /usr/src/linux-2.4/Documentation/nbd.txt

If you're curious about using the enhanced NBD w/ failover and HA, you can read about it at:

http://www.it.uc3m.es/~ptb/nbd/#How_to_make_ENBD_work_with_heartbeat
Re:Standard Linux kernel maybe? by buzzbomb · 2003-10-29 09:32 · Score: 1

NBD *is* standard Linux kernel. It's built right in: /usr/src/linux-2.4/Documentation/nbd.txt

Ok. But does it work under Windows? That was one of the requirements.

InterMezzo by Anonymous Coward · 2003-10-29 09:18 · Score: 1, Informative

Sounds like Coda or InterMezzo would fit the bill, but they won't address non-linux systems directly. You'd have to export the InterMezzo file systems with Samba and mount them on the MS Win boxes.

Re:Intermezzo by laursen · 2003-10-29 10:12 · Score: 5, Informative

Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.
We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.
The idea behind Intermezzo is fairly simple and the documentation is good. The Intermezzo system looked like an ideal solution for our setup (Coda and OpenAFS are far too complex for use in a distributed filesystem on a closed internal net).
We tested the system but sadly it's not really production stable and I can't advise that you use it.
If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...
My 2 cents.
Re:Intermezzo by mikeee · 2003-10-29 10:18 · Score: 1

So what did you settle on?
Re:Intermezzo by laursen · 2003-10-29 10:21 · Score: 2, Informative

We bought a large StorageTek RAID (2 x RAID 5) and used NFS.

NFS is a proven filesystem and it has been tested for years. It's compatible with all major UNIX flavors and BSD/Linux systems.
Re:Intermezzo by rsax · 2003-10-29 16:56 · Score: 1

We have looked at various distributed filesystems for use in a clustered setup of webservers. We wanted to remove the single point of failure from a central NFS server - Intermezzo was one of the filesystems we had a look at.
[snip]
If you are looking for a SAFE solution then Intermezzo is not for you - you will just end up with garbled data, deadlocks and tons of wasted time ...
What did you end up going with?

AFS by Reeses · 2003-10-29 09:18 · Score: 4, Informative

It's called the Andrew File System.

http://www.psc.edu/general/filesys/afs/afs.html

There's another alternative with a different name, but I forget what it's called.

--
Reeses

Re:AFS by Reeses · 2003-10-29 09:24 · Score: 1

Whee.. replying to my own post... In addition to AFS...

Coda:

http://coda.cs.cmu.edu/

and InterMezzo:

http://www.inter-mezzo.org/

and there's a review here:

http://www.linuxplanet.com/linuxplanet/reports/4361/1/

Although, honestly, a 5 second search on google for "distributed filesystem" would have turned this up.

Ah, well.

--
Reeses
Re:AFS by wetshoe · 2003-10-29 09:25 · Score: 1

I'd have to agree, AFS is a great solution. I actually thought of this about a year ago, and I told a co-worker about it. He told me it had already been implemented, and as it turns out, it was, it's AFS.
AFS is actually pretty cool. You can run a file server that uses all the disk space of all the client machines. It's a great idea now, especially since most new machines come with 40GB hard drives, and most people don't use anything more than 5GB.
AFS is a wonderful solution to not only this problem that the poster is talking about, but it can be used in so many other interesting ways.
Re:AFS by kaybi · 2003-10-29 09:26 · Score: 1

OpenAFS

http://openafs.org/
Re:AFS by fireboy1919 · 2003-10-29 09:47 · Score: 4, Interesting

In my experience, it's one of those "it would be a wonderful thing if it worked."

It requires its own partition for each mount of it; you can't just share disks you've already got.

Setup also takes hours, and it probably won't work the first time. Online documentation is incredibly outdated, which doesn't help matters at all. It also takes a hefty chunk of computer to run it, because it requires a lot of watchdog type programs to fix the frequent corruption that happens to it as you use it.

The servers' time has to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.

It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.

I tried it for a month before it completely corrupted its own partition and I switched back to NFS and Samba.

I can't wait for the day when these problems are but a memory and such a system works flawlessly.

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:AFS by Strange+Ranger · 2003-10-29 10:24 · Score: 4, Informative

from karmak.org

AFS is based on a distributed file system originally developed under a different name in the mid-1980's at the Information Technology Center of Carnegie-Mellon University (CMU). It was first publicly described in a paper in 1985, and soon afterwards was renamed to the "Andrew File System" in honor of the patrons of CMU, Andrew Carnegie and Andrew Mellon. As interest in AFS grew, CMU spawned the Transarc Company to develop and market AFS. Once Transarc was formed and AFS became a product, the "Andrew" was dropped to indicate that AFS had gone beyond the Andrew research project and had become a supported, product quality filesystem. However, there were a number of existing cells that rooted their filesystem as /afs. At the time, changing the root of the filesystem was a non-trivial undertaking. So, to save the early AFS sites from having to rename their filesystem, AFS remained as the name and filesystem root. In the late 1990's Transarc was acquired by IBM, who subsequently re-released AFS under an open source license. This code became the foundation for OpenAFS, which is currently under active development.
It's still running and running well at CMU (AFAIK - as of late 90's). Every student gets an "Andrew" ID. Actually the very first networked computer I ever logged into (other than dialing a bbs) was a 'node' on Andrew, in 1988. Very very cool at the time, and still is.

--

Operator, give me the number for 911!
Re:AFS by pHDNgell · 2003-10-29 10:26 · Score: 2, Insightful

In my experience, it's one of those "it would be a wonderful thing if it worked."

I've been using it for years. I've found nothing that works better. I've got ``clients'' that are IRIX, NetBSD, Solaris, SunOS 4, MacOS X and FreeBSD and I use it to serve my web root, home directories, various applications (my mail server etc...). I can't imagine using something else.

It requires its own partition for each mount of it; you can't just share disks you've already got.

This is very misleading. A file server has to have a dedicated partition. Clients need nothing but OpenAFS or similar installed. Mount points are global and management is distributed. Thinking that AFS is anything like NFS would certainly lead to a bad experience. It solves many, many problems with NFS.

The servers' time has to be matched exactly, so it's also best if you've got an NTP server running and clients on all the machines.

And your AFS server and client comes with them. I can't imagine what the problem would be with having times matched, anyway. I've gone through the horrors of tracking down log entries from systems that didn't have time synchronized. I don't want to do that again.

It's also about ten times slower than Samba (which you might use instead to share with Windows machines), and it chokes when you try to move/copy/delete large files.

Slower at what? Access times? Add another server, it's not like you have to tell the clients. Write times? I don't know about that, I wouldn't want to run a database off the thing, but that's not what it's for. I have no idea what you're talking about regarding it choking on large files. I haven't seen that.

I tried it for a month before it completely corrupted its own partition and I switched back to NFS and Samba.

How exactly did it corrupt its own partition? I've never seen such a thing. Perhaps you did something you were not supposed to do (like anything in its own partition).

I can't wait for the day when these problems are but a memory and such a system works flawlessly.

There have been some *very* large AFS installations for years (MIT, CMU, etc...). I wouldn't think that would be the case if such problems were common.

--
-- The world is watching America, and America is watching TV.
Re:AFS by Umrick · 2003-10-29 10:31 · Score: 3, Informative

Never mind that AFS has been in production for literally years, serving terabytes of data for 10 thousand + clients (in several installations of AFS).

The Windows client did have some notable slowness issues, performance with Linux is excellent, and scales much better than NFS. Clients are available for a large number of OSs. Doesn't matter if it's the right time, just A time. So setup NTP on one machine as a primary, and the others can use ntpdate to set time once a day.

AFS started around 1986 as a commercial offering; IBM made it open source in 2001. It can be a serious pain to set up at first, and the documents are indeed very outdated. Another limitation is no support for >2gig files. You can have readonly duplicates of data on multiple machines. Administration can be a dream once it's running.

You will need to have ext2 partitions available for storage (OpenAFS uses its own transaction system, and you WILL have race conditions if you put it on a journalling filesystem).

Also note that as of right now, 2.6 kernels are not supported, though 2.4/2.2 are fine.

www.openafs.org

CODA which was a start at an open source answer to AFS way back when, has even more out of date documentation, has never been used in production (that I know of), and basically is not nearly as ready for prime time as OpenAFS.

www.coda.org
Re:AFS by RageEar · 2003-10-29 10:35 · Score: 1

I worked for a company that used this on all of our *NIX based servers. I never ran into too many problems as an end user and when I did they were easily fixed.

However, talking to our IT director, he said it was one of the biggest pains in the ass to administer. He was forced into using this system by the VP of Engineering, because said VP was an alum from CMU. The IT director wanted nothing more than to switch over to an NFS/filer based solution.

Just my two cents.
Re:AFS by ipjohnson · 2003-10-29 11:10 · Score: 1

Several installations ... Hell, when we put up my university's AFS cell there were over 200 other publicly accessible cells in the world, and that was in 1998, before OpenAFS was anywhere near primetime.

Personally I love AFS, especially if you want to access it outside of your little home network. I mean, I used to access it over my cable modem from home, and school was 4-5 hours away.

At the time IBM's cell was the largest in the world, spinning something like a petabyte. I'm sure their cell has grown since then.
Re:AFS by rivaldufus · 2003-10-29 11:30 · Score: 1

I replaced an nfs server with an AFS (OpenAFS) cell at my last company (we had a max of about 40 developers). It was quite difficult to set up (I guess I started setting it up in 2001 or so). I had three Sun E250's with 300GB of Raid 5. They were all single processor machines (Ultra 2 - 450) with only about 1GB Ram. Those machines weren't super powerful, and I don't think file access was unusually slow.

There were stability issues in the early releases, but those soon disappeared. While it's true that it runs several processes to monitor things, they are quite useful. (There is even a process to monitor the AFS binaries to update them if you install a newer version!).
Once it was up and running, it seemed to work without too many problems.
Backups are a breeze - you can make snapshots of all the volumes and then back up those snapshots. You can also leave the snapshots up on read only mountpoints so users can retrieve files easily from the day before.

It's very easy to add volumes and drive space, once you get the cell up. I implemented quotas (a very wise idea with users' home directories). Each home directory was a separate volume and I could move a user's home directory from one server to another while they were using it (and they never knew)!

Making backup volumes was a snap, and I had the most used data mirrored on all three.

I had to write a lot of scripts to manage things like adding new users (I had three servers so I wanted to balance home directories across the three servers.) I also had to write several backup scripts, and I tended to run the backup process in a screen session, believe it or not. Surprisingly, there were no problems with the backup process in a screen session.
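The balancing part of a new-user script like that can be sketched roughly as follows. This is only an illustration: the server names, the usage figures and the per-user quota are made up, and the sketch does not call any real AFS command. It just decides which server should get each new home volume.

```python
def pick_server(usage_gb):
    """Return the server with the least space in use.

    Ties go to the name that sorts first, so the result is deterministic.
    """
    return min(sorted(usage_gb), key=lambda name: usage_gb[name])


def place_homes(users, usage_gb, quota_gb):
    """Assign each new user to the currently least-loaded server.

    usage_gb maps a (hypothetical) server name to the space already in use;
    quota_gb is the space reserved for each new home directory.
    """
    usage = dict(usage_gb)  # work on a copy, leave the caller's dict alone
    placement = {}
    for user in users:
        server = pick_server(usage)
        placement[user] = server
        usage[server] += quota_gb
    return placement


if __name__ == "__main__":
    current = {"fs1": 10, "fs2": 0, "fs3": 5}  # made-up numbers
    print(place_homes(["alice", "bob", "carol", "dave"], current, quota_gb=5))
```

Reserving the full quota up front is a deliberate simplification; a real script would more likely query actual volume sizes before each placement.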

Perhaps the worst thing was when a server was accidentally shut off as the server would run a recovery process that occasionally took a while.

Overall, I think the biggest problem was that the users weren't used to it and did not like having to install an AFS client on their windows machines, but I wrote tons of documentation to remedy that. Unix users were continually misreading file permissions (standard unix doesn't see the AFS acls, of course). Perhaps my biggest complaint was the fact that there were no file level permissions/acls; permissions/acls were on directories, and applied to the files below.

All in all, I'd do it again if I started working at a brand new startup. I'd be happy to answer questions about it, if anyone is interested.
Re:AFS by laursen · 2003-10-29 12:29 · Score: 1

Hi,

It sounds like an interesting setup and I would love to spend some time on it :) Do you have any of your documentation online? :)
Re:AFS by evand · 2003-10-29 14:47 · Score: 1

I can confirm that, as of Wed Oct 29 21:47:03 EST 2003, it's still running well at CMU :-)
Re:AFS by fireboy1919 · 2003-10-29 16:54 · Score: 1

Slower at what? Access times? Add another server, it's not like you have to tell the clients. Write times? I don't know about that, I wouldn't want to run a database off the thing, but that's not what it's for.
Slower access times. It takes ten times the network traffic that Samba does, so you can't access your files quickly, if, for instance, you want to stream the data for some reason. Over a 100Mbit link, 10 times slower means a lot. Since this is one of the only filesystems with a Windows client, it could supplant Samba, save for this flaw.

I have no idea what you're talking about regarding it choking on large files. I haven't seen that.
Well, I have. That was what killed it eventually, actually. I tried to delete a 1 GB file, and the partition corrupted beyond repair.
Every time I tried that I got either:
1) File system corruption (followed by the watchdog taking the system offline, repairing it, and putting it back online with that file still there).
2) 30 minutes of waiting during which time the file was deleted.

I tried it last year, and I compiled everything myself. Maybe it's just really unstable when you don't get the executable straight from the website, or it's much better now.

--
Mod me down and I will become more powerful than you can possibly imagine!
Re:AFS by RedFyre · 2003-10-29 17:19 · Score: 1

If you are interested in using AFS, head on over to openafs.org

You should also subscribe to the openafs-info mailing list, and the people there will be more than happy to help you through your difficulties.

Setup is actually easier than most people make it out. There are a couple of quirks that can pop up, but if you know what those are, then, getting around them is easy.

What makes the setup actually seem more difficult is the way the documentation is written. The docs go through each step, splitting each step into separate sections for the various OSes.

If you take the steps for your OS and put them all in one place, it becomes a lot clearer what you are going to do.

Once again, figure out on what os/hardware you want to install the thing, then send an email to openafs-info and see if anyone has any tidbits to offer about your particular setup (or if you are just missing some critical step in the setup process).

Also check out the mailing lists for the individual OS you will be using (eg port-darwin for OS X/Darwin users).

Re:NBD Does this - NBD server for windows by flok · 2003-10-29 09:19 · Score: 5, Informative

And since the guy is also using windows-boxes, an NBD-server for windows can be found here:
http://www.vanheusden.com/Loose/nbdsrvr/
This version enables you to also export partitions/disks.

--

www.vanheusden.com - home of Multitail, HTTPing, CoffeeSaint, EntropyBroker, rsstail, bsod, listener, nagcon, nagi

Why? by Anonymous Coward · 2003-10-29 09:19 · Score: 2, Funny

I have 8 computers at my house on a LAN. I make backups of important files, but not very often

I mean, let's be honest here. We are all dorks, but this guy is king dorkus dweedius maximus. Don't fool yourself about the "important data" - it is just pr0n and pirated MP3s.

If it was real work, there would be a real IT guy with real RAID and real backup tapes working on the problem. But we know it isn't real work, because if this guy had a real IT job, he couldn't stand coming home and dealing with 8 friggin computers.

We realize you think you are cool because you have a few KVMs, a couple of Linksys routers, and a bunch of old PIIs running Lunix with one Windows machine, but come on, man. Stop spanking yourself over your elite NAT-ed network and just get one computer with hardware RAID. Install Cygwin if you feel the need to type configure && make && make install a whole bunch of times and watch teh pretty text lines scroll.

Re:Why? by pandrijeczko · 2003-10-29 12:07 · Score: 1

Stop being a troll just because it's mid term school holidays and you've nothing better to do...
The guy asked a legitimate question and all you can do is throw insults.
If anyone's the dork, it's you...

--
Gentoo Linux - another day, another USE flag.
Re:Why? by deficit · 2003-10-29 13:07 · Score: 1

I only have an old P3 at my house.. And I still think I'm cool.. Any problems you Anonymous Coward ?
Re:Why? by SillyKing · 2003-10-29 15:37 · Score: 1

"If this guy had a real IT job, he couldn't stand coming home and dealing with 8 friggin computers"

There are some things you are not considering here, as there are people who have small clusters of machines at home. For example, I have 9 computers running Seti@home. Now since I have these 9 computers, I also use them for education on operating systems, so have several linux and windows variants (sorry, not a MAC user).

I also find use in being able to redundantly backup multiple machines. However, I am only concerned with 3 machines that have any needed data on them: spousal units, my laptop, and my desktop. 2 windows and 1 linux box. I do this with a scheduled NT backup of the required NT and Samba shares. Works well for me, and has for years with an old Exabyte 8505 SCSI tape drive.

Saying that somebody has no immediate need for backing up data from multiple machines because they only have pr0n and mp3's is not a valid argument today. Besides, as many /.'ers that have replied would suggest he is not the only one who has pondered this need, as there are those who know the answer and those who have kindly developed it!
Re:Why? by pliny3 · 2003-10-29 16:28 · Score: 1

For example, I have 9 computers running Seti@home.

You are clearly not paying your own electricity bills.
Re:Why? by Alioth · 2003-10-30 00:59 · Score: 1

> and a bunch of old PIIs running Lunix

Lunix doesn't run on a PII. It only runs on machines like a Commodore 64.

--
Oolite: Elite-like game. For Mac, Linux and Windows

Most common form of data loss? by Anonymous Coward · 2003-10-29 09:19 · Score: 5, Insightful

I'd argue the point that the most common form of data loss is a crashed hard disk.

In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.

However, I will grant that the average /. user knows what they're doing with their data far more than my average user does and is less likely to cause self-inflicted damage.

Re:Most common form of data loss? by JohnFluxx · 2003-10-29 09:25 · Score: 1

That's why I don't know why _by default_ it isn't set up to have the whole of /home under cvs
Re:Most common form of data loss? by Xerithane · 2003-10-29 09:37 · Score: 1

That's why I don't know why _by default_ it isn't set up to have the whole of /home under cvs

CVS isn't designed for that, unless you only store documents or have some pretty stringent filters setup on CVS. CVS is for versioning, and you don't really want to maintain a backlog of every version of every file in your home directory.

--
Dacels Jewelers can't be trusted.
Re:Most common form of data loss? by Blackknight · 2003-10-29 09:39 · Score: 4, Insightful

That's one feature from VMS that I wish unix had. File versioning was built in to the file system, so if you wanted the old version of a file back you just had to roll back to the old one.
Re:Most common form of data loss? by jonbrewer · 2003-10-29 09:46 · Score: 1

In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

Business environments are generally more robust - especially when it comes to things like power. Not only the mains power, but power supplies. A lousy power supply can kill a hard disk as easily as a line surge. In the last ten years I've personally lost a 4.3 GB Atlas Wide SCSI and a couple of Maxtor 60GB IDE drives. In both cases my backups were a month out-of-date. :-(

Also have seen numerous IDE drives go in low-end IBM and Compaq boxes in a business environment in Cambridge with poor power reliability.
Re:Most common form of data loss? by ckaminski · 2003-10-29 09:50 · Score: 3, Interesting

But say I do? I mean, versioning databases are the next bit, man. Why not have a chmod +v for versioning? If this bit is set, then apply version control. Every file open/write/close sequence adds a new version delta. Sure, there's a performance hit associated with it, but I'd like the choice.

AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality into a filesystem (somewhat like the Clearcase filesystem does).

It's a good feature, something I'd want on my docs and code, and other specs, not necessarily on my pr0n and MP3s.

-Chris
Re:Most common form of data loss? by Gyorg_Lavode · 2003-10-29 09:52 · Score: 1

I've lost probably 4 hard drives over the last 3 years (1 2 weeks ago, 1 I realized is going bad today). While I could raid them, I really don't want to buy double the disk space just so that I can have a raid array when I only need redundancy for 10% of the data.

--
I do security
Re:Most common form of data loss? by blixel · 2003-10-29 10:03 · Score: 1

I don't know man. I have no faith in hard-drives any more. I used to buy Quantum drives and I never had a single crash with any of them. I still have 2 Quantum drives from years past and they are perfect. Unfortunately Quantum was bought by or merged with Maxtor. Huge mistake. In the last 2 years though I've had 3 Maxtor drives crash on me, and 2 IBM Deathstars die on me. The last time I sent my Deathstar in for RMA, after having read that the entire line of drives was prone to failure, I just sold the factory sealed replacement on eBay. (With full disclosure that I had already replaced the thing twice so the buyer should beware.)

Currently I have 3 Western Digital drives (2 120GB and a 200GB) and I haven't had any problems with them yet. But they'll have to last at least 5 years to appease me.
Re:Most common form of data loss? by heXXXen · 2003-10-29 10:06 · Score: 1

that's a good one man.

the power in our office is so shitty that we've lost a dozen power supplies, several HDs, motherboards, etc.

the power supply died on our server over the weekend, took the motherboard and KVM with it. and it was an enermax!

the power is much better at my home.
Re:Most common form of data loss? by Gherald · 2003-10-29 10:10 · Score: 1

> I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

Yes, but all us users of more than one home pc (ie, enthusiasts) use RAID 0, which has the opposite effect. So for us, a supplemental distributed RAID is a GREAT idea for our documents, e-mail backups, and other stuff we want to keep permanently and access from any of our home stations.

--
The unofficial /. digest
Re:Most common form of data loss? by steveha · 2003-10-29 10:18 · Score: 3, Informative

0) Mirroring (RAID 1) takes double the disk space; but you could use RAID 5 instead. A 4 disk RAID 5 would take 4/3 as much disk space as you get to use.

1) You could make a partition that is 10% of your disk, make another identical one on another disk, and mirror those. Then put your 10% critical data in there.

2) Do what I do: set up a RAID server, and keep all critical data on that. This is good if you have a home network with multiple computers. It also makes data sharing easy among the computers.
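The space arithmetic in point 0 can be checked with a few lines. This is a minimal sketch that only covers plain RAID 1 and single-parity RAID 5 and assumes all disks are the same size; the function name is made up for the example.

```python
from fractions import Fraction


def raw_per_usable(level, disks):
    """How many units of raw disk you buy per unit of usable space.

    RAID 1 keeps a full copy on every disk, so N disks hold 1 disk of data.
    RAID 5 spends one disk's worth of space on parity, so N disks hold N-1.
    """
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return Fraction(disks, 1)
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return Fraction(disks, disks - 1)
    raise ValueError("only RAID 1 and RAID 5 are covered in this sketch")


if __name__ == "__main__":
    print(raw_per_usable(1, 2))  # two-disk mirror
    print(raw_per_usable(5, 4))  # four-disk RAID 5
```

A two-disk mirror comes out at 2 and a four-disk RAID 5 at 4/3, which matches the figures above.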

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:Most common form of data loss? by caluml · 2003-10-29 10:26 · Score: 1

Not a bad idea. Ext4, or Reiser v5?

It's been 15 seconds since you hit 'reply'!
Goddamn it. It only took me 15 seconds to type it.

--
Get your own free personal location tracker
Re:Most common form of data loss? by GreyPoopon · 2003-10-29 10:37 · Score: 1

Why not have a chmod +v for versioning? If this bit is set, then apply version control.
You're running the wrong operating system. Maybe you need VMS. :-)

--
GreyPoopon
--
Why is it I can write insightful comments but can't come up with a clever signature?
Re:Most common form of data loss? by merlin_jim · 2003-10-29 10:42 · Score: 1

I'd argue the point that the most common form of data loss is a crashed hard disk.

In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest).

Yeah but the original poster doesn't have RAID. So every failed hard drive is a problem. In my 13 years as a home user without RAID I've suffered data loss due to failed hard disks some 7-8 times.

But I restore data accidentally deleted or changed by a user at least weekly! A distributed storage system won't help you there.

Ummm save your company some money... move to a journaling file system. Windows .NET Server aka Windows Server 2003... I don't know the equivalent in Linux land.

The commercial where the guy explains to the business guys how volume shadow copy and journaling will save them millions annually? Yeah he wasn't joking...

--
I am disrespectful to dirt! Can you see that I am serious?!
Re:Most common form of data loss? by k12linux · 2003-10-29 10:43 · Score: 1

In my 14 years as a Network Administrator I think I've restored backups due to failed hard disks about twice (RAID catches the rest)
But I restore data accidentally deleted or changed by a user at least weekly!

Of course, without some type of redundancy, a failed drive *will* kill your data pretty much 100% of the time. I have to agree with you on the "end user" problem though. Even experienced users can goof. I'm much more likely to blow my data away myself than have a drive go out.
That's why my preferred method for disk redundancy on my work PC is a 2nd drive and a nightly rsync. The "backup" takes about 5 minutes. And since it isn't RAID, when I make a mistake it isn't made on all drives at once. As long as I notice my goof before the backup runs at 5am, no problem.
I had initially planned to do RAID 1 on the drives, but after I thought about it I like the nightly rsync better. I'd much rather lose (at most) one day's work than not have any backup at all if I mess something up.
BTW, an rsync between drives probably averages about three minutes on my system. I also have a multi-user server I rsync across a 10Mb WAN link and even that is generally only 10-20 minutes.
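A nightly job along those lines could look something like this. It is a sketch only: the paths are hypothetical, and the post does not say which rsync options are actually used, so this assumes plain archive mode (-a) and nothing else.

```python
import subprocess


def build_rsync_command(source_dir, backup_dir):
    """Build the rsync invocation for a one-way copy to the second drive.

    The trailing slash on the source tells rsync to copy the directory's
    contents rather than the directory itself.
    """
    source = source_dir.rstrip("/") + "/"
    target = backup_dir.rstrip("/") + "/"
    return ["rsync", "-a", source, target]


def run_backup(source_dir, backup_dir):
    """Run the copy and raise if rsync reports an error."""
    subprocess.run(build_rsync_command(source_dir, backup_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of running it.
    print(" ".join(build_rsync_command("/home/me", "/mnt/backup/home")))
```

Scheduling it from cron with a time field of "0 5 * * *" would give the 5am run described above. Without a delete option, files removed from the source stay on the backup drive, which suits the "oops, I deleted it" case.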
Re:Most common form of data loss? by atomray · 2003-10-29 11:02 · Score: 1

I've been meaning to look into this - is there a Linux filesystem that supports versioning? I kind of figured there must be, it was such a nice feature of VMS I'd be surprised if something similar hadn't been implemented for Linux. Anyone feel like saving me the time of searching myself? :)

--
take your sig and shove it
Re:Most common form of data loss? by pe1chl · 2003-10-29 11:07 · Score: 1

I have always considered this a disadvantage. SCCS had the same problem.

These systems operate with the notion of a background storage system where you checkin and checkout working versions.
What I often really want is a system where a number of files that are frequently changed can be separately kept in a versioning system. So, when I checkin something, I do not want the file to disappear.

I wrote a simple script that is run once a day, and puts the current version of a file in CVS when it has changed, but keeps the file available. I always wonder why such a mode of operation is not available.
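The general shape of such a script, as a rough sketch: this version does not use CVS at all. It swaps in plain numbered copies in a side directory to show the "save a version only when the file changed, and leave the working file alone" behaviour. The function names and the ".vN" naming scheme are invented for the example.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, used to detect changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot_if_changed(path, store_dir):
    """Copy path into store_dir as name.vN if it differs from the last copy.

    Returns the new version's path, or None if nothing changed.
    The working file itself is never moved or removed.
    """
    path = Path(path)
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)

    # Existing versions look like notes.txt.v1, notes.txt.v2, ...
    versions = sorted(store.glob(path.name + ".v*"),
                      key=lambda p: int(p.suffix[2:]))
    if versions and file_digest(versions[-1]) == file_digest(path):
        return None

    next_number = int(versions[-1].suffix[2:]) + 1 if versions else 1
    target = store / f"{path.name}.v{next_number}"
    shutil.copy2(path, target)
    return target
```

Run once a day over a list of watched files, this gives one saved copy per day in which the file actually changed.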
Re:Most common form of data loss? by Linux_Bastard · 2003-10-29 11:13 · Score: 1

amen.

Until you live with it, you don't realize how few changes you make, and how valuable they are.

But this is because of how I use my home directory. Some run db, ftp and/or www out of /home/.

I wish I had the option under ext3.

--
F X=0:1:9999 F D=2:1 Q:((X>2)&(X#D=0)!((D>X/2)&(X'=1))) I D>(X/2) W:$X>75 ! W X,?$X+5-$l(X) Q
Re:Most common form of data loss? by Xerithane · 2003-10-29 11:22 · Score: 1

AFAIK, there's at least one project out there to turn CVS into a filesystem, and a few others to add MVCC functionality into a filesystem (somewhat like the Clearcase filesystem does).

The thing is you have to filter (which I mentioned briefly) what is being stored. You can't do exact diff versioning on binary files, and compare them like you can with flat files (or even XML files, with any sort of ease.) This is why a versioning control file system doesn't work well, and I've travelled down this road pretty far. CVS/RCS/SCCS just isn't the tool for the job.

However, adding a revision tag to your files is a good method of doing it, but a good CVS setup will handle this as well. It just takes learning CVS really well, and at that point you just don't care that much to do it (that was my problem.)

It's a good feature, something I'd want on my docs and code, and other specs, not necessarily on my pr0n and MP3s.

You can set CVS to just do certain file formats really easily. Just have a cvs module "home" and then as root do a cvs co home from root, and set up your filters in the CVSROOT/ dir on your master server. At my old company we set up a system like this for user projects, and it operated like a second home directory. What works out even better is being able to share (it gets a bit complicated for that though) projects amongst multiple project-home-directories. The key is to set up multiple repositories and use merge utilities when someone checks in so it gets duplicated against everybody else's repositories. It works well, and ant was a great help.

Anyway.. I'm rambling.

--
Dacels Jewelers can't be trusted.
Re:Most common form of data loss? by angst_ridden_hipster · 2003-10-29 11:26 · Score: 4, Informative

As I always chime in at this point:

Use rdiff-backup!

http://rdiff-backup.stanford.edu/

Configurable, secure, distributed, versioning incremental backups.

It's not a replacement for RAID, but is good for nightly inter-machine backups.

There's also a related project where the far-end repository is encrypted, so you can have it on any public server without fear of having your data read by the wrong people.

Very cool. It's saved my ass a few times.

--
Eloi, Eloi, lema sabachtani?
www.fogbound.net
Re:Most common form of data loss? by merlin_jim · 2003-10-29 11:37 · Score: 1

snapshots in the microsoft world are for backup purposes and live upgrading... Microsoft calls what normal people call snapshots "Persistent Shadow Copies" and references them as journaling file systems.

Please see pages 6-8 of the Storage Management and Backups section of the beta 2 copy of the Windows Server 2003 Technical Readiness guide (which I unfortunately cannot advise on how to procure, as it is a confidential document)

--
I am disrespectful to dirt! Can you see that I am serious?!
Re:Most common form of data loss? by Gyorg_Lavode · 2003-10-29 11:39 · Score: 1

What type of raid solution are you using for this? Hardware? Software? Also, what type of file system are you using to share your raid server drives? (Are you sharing to both windows and linux or just linux?)

--
I do security
Re:Most common form of data loss? by anthonyrcalgary · 2003-10-29 11:54 · Score: 1

"However, I will grant that the average /. user knows what they're doing with their data far more than my average user does and is less likely to cause self-inflicted damage."

eh...

I know what I'm doing and I've found some pretty impressive ways to shoot myself in the foot. Right now the knowledge just makes me dangerous. I wouldn't let me anywhere near a production system.

--
When someone might yell at me, it has to be OpenBSD.
Re:Most common form of data loss? by steveha · 2003-10-29 11:56 · Score: 1

I'm using pure Linux software RAID. My old server is a RAID 1, and the new one is RAID 5 with 3 disks. Both are IDE, with one IDE drive per controller, as recommended in the IDE RAID HOWTO.

I have both Samba and NFS running. All our computers run Linux now, but some used to have Windows, and some are dual-boot. Samba works very well.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:Most common form of data loss? by VisorGuy · 2003-10-29 12:19 · Score: 1

I wasn't aware of this (probably since I'm not old enough), but I was just thinking that there should be a file system with versioning features like CVS, subversion or any WIKI has.

--
This user account is inactive account replaced by the PDA
Re:Most common form of data loss? by Cplus · 2003-10-29 13:10 · Score: 1

I would suggest a UPS with a decent power conditioner, especially for your server systems. I sell MGE in Canada and the prices are quite reasonable...sounds like you've already lost systems and data of a value far past what it would cost to install even a really high-end system.

--
"Share your knowledge. It's a way to achieve immortality." -- Dalai Lama
Re:Most common form of data loss? by penguin7of9 · 2003-10-29 14:01 · Score: 2, Interesting

That's one feature from VMS that I wish unix had.

That feature doesn't need to be in the kernel, since it can easily and transparently be provided in user space.

If you like, you can enable this right now using a simple hack on top of PlasticFS or your own, custom LD_PRELOAD hack.

Providing file versioning in the kernel or enabling it globally in some other form has not caught on because it is a huge hassle and causes lots of problems, even in systems that know about it.

For example, when you retag one MP3, do you want to keep an old version? What about if you retag your entire 50G collection of MP3s?

The default of not versioning files in UNIX works better. Versioning and its implementation is highly application and implementation dependent. Emacs, OpenOffice, cvs, and other tools do the right thing, and they do it much better than anything the kernel could ever hope to do.
Re:Most common form of data loss? by DAldredge · 2003-10-29 14:40 · Score: 1

Why don't you buy a good powersupply from someone like PC Power and Cooling?
Re:Most common form of data loss? by hrath · 2003-10-29 15:03 · Score: 1

Hi,

I've used rdiff-backup/ssh/rsync very successfully in the past and recommend them where I can.

You mention a related project where the far-end repository is encrypted. I poked around the rdiff-backup page but didn't find it. Could you please point me in the right direction (URL?), as this sounds very interesting.

regards,

Heiko
Re:Most common form of data loss? by Gyorg_Lavode · 2003-10-29 15:24 · Score: 1

Can you point me to the howto you used? I didn't have it come right up at tldp.org. Also, I could use a simple NFS howto as I don't understand how it works and how to use it well.

--
I do security
Re:Most common form of data loss? by jonadab · 2003-10-29 15:34 · Score: 1

> I'm much more likely to blow my data away myself than have a drive go out.

You're either very very careless, or you've had MUCH better luck with hard
drives than I have. I've had three hard drives go bad on me over the years.
I haven't deleted anything I still wanted in a *long* time. I'm more likely
to lose data due to a power outage than I am to delete it myself.

However, I'd have to say that the *most* common cause of dataloss is buggy
software. I've lost more data over the years due to crashes (either of
the entire OS when I was using Windows, or X11 has crashed on me a couple
of times, or application crashes) than the *square* of the amount I've lost
to all other causes combined. (When the hard drives went bad, I didn't lose
all the data on them, because individual sectors went bad at first and I was
able to copy almost everything off. Yes, I was lucky; occasionally a drive
goes bad all at once. That would suck.)

As for user error, I agree with the guy upthread who advocated VMS's solution.
ITS had automatic versioning (for files flagged for it) way back in the days
of the PDP8. VMS has had it forever. Yeah, there are some files you would
not want versioned (e.g., logfiles), so you'd want to be able to mark certain
directories and/or certain files to be versioned or not, and you wouldn't
want every version saved forever (so, the system should automatically know
how to keep sparser and sparser versions the further back you go), but these
are not insurmountable issues by any means.
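One way the "sparser and sparser" rule could work, as a sketch: sort versions into age buckets that double in width (under 1 day, 1 to 2 days, 2 to 4, 4 to 8, and so on) and keep only the newest version in each bucket. The bucket scheme is an assumption made for illustration, not how ITS or VMS actually did it.

```python
def bucket_index(age_days):
    """Map an age to a doubling bucket.

    Bucket 0 holds ages under one day; bucket k (k >= 1) holds ages
    from 2**(k-1) days up to, but not including, 2**k days.
    """
    if age_days < 1:
        return 0
    return int(age_days).bit_length()


def versions_to_keep(ages_days):
    """Return the ages of the versions to keep, youngest first.

    Only the newest version in each bucket survives, so old history
    thins out while recent history stays dense.
    """
    newest_in_bucket = {}
    for age in sorted(ages_days):  # youngest first
        newest_in_bucket.setdefault(bucket_index(age), age)
    return sorted(newest_in_bucket.values())


if __name__ == "__main__":
    ages = [0.1, 0.5, 1, 2, 3, 5, 6, 7, 9, 20, 30]
    print(versions_to_keep(ages))
```

With this scheme the number of versions kept grows only with the logarithm of the oldest version's age, which keeps the space cost modest.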

Would it use more space? Yeah, of course. So, make it optional, and
people who want to save space can choose to be careful.

--
Cut that out, or I will ship you to Norilsk in a box.
Re:Most common form of data loss? by jonbrewer · 2003-10-29 16:09 · Score: 1

I use a 430W Antec on my primary box now. :-)
Re:Most common form of data loss? by jhunsake · 2003-10-29 16:17 · Score: 1

"Persistent Shadow Copies" and references them as journaling file systems

I think you're the one getting confused. Microsoft isn't that stupid. They know what journalling file systems are (ie NTFS).
Re:Most common form of data loss? by zatz · 2003-10-29 16:37 · Score: 1

Well, in the specific instance of editing an ID3 tag, it could be smart enough to notice that only one block of the file changed, and share the remaining blocks with the old version. And that's something you can only do efficiently with kernel help; if you want userspace libraries doing that kind of thing, you end up reimplementing most of the filesystem innards yourself.

--

Java: the COBOL of the new millenium.
Re:Most common form of data loss? by noda132 · 2003-10-29 16:50 · Score: 1

Funnily enough, just yesterday I put a " *" at the end of an "rm" command in my home directory.

Thankfully, I keep my data organized into subfolders (and rm isn't recursive by default). All I lost was a file I never use, a few symlinks, and that Windows-driver-wrapper thing, that I was debating installing because I don't like the license.

So, I'd argue that even /. users get it wrong. I've been using bash for several years with no errors whatsoever, but all it takes is a single extra character at the end of a tab-completed command and BOOM! Empty home directory!
Re:Most common form of data loss? by blixel · 2003-10-29 17:37 · Score: 1

will start drawing broad conclusions about hard drive reliability with a sample size they could count on one hand

I guess you've never had the privilege of owning an IBM Deathstar. Nor did you bother to follow the link where you can read thousands of stories by other Deathstar owners who have had failed drives. Nor did you bother to check google to see how many people have complained about Maxtor drive reliability. Too bad. Your troll post might have held credence otherwise.
Re:Most common form of data loss? by steveha · 2003-10-29 19:19 · Score: 1

http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Software-RAID-HOWTO.html

That's the big one.

I can't at the moment remember where I read the "IDE disks RAID" howto; there doesn't seem to be an official HOWTO that matches what I remember. I can summarize it for you easily, though: IDE RAID is just fine, as long as you have one IDE disk per IDE controller. Those cables that let you hook up two drives to one controller? Don't use them. This means that for most motherboards, you will only be able to hook up two drives, and if you want to do RAID 5 you will need a PCI IDE controller board.

Why only one drive per controller? Two reasons: first, IDE sucks with multiple devices so performance suffers; second, if an IDE drive dies, it might confuse the controller, which would take out another drive if both drives are on the same controller. RAID can survive one drive out but not two!
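The "one drive out but not two" limit falls out of how parity works. Here is a toy sketch with a single stripe across three data disks plus one parity block; real RAID 5 rotates parity across the disks and works on much larger blocks, so treat this as an illustration of the XOR idea only.

```python
def xor_blocks(blocks):
    """XOR equal-sized byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # one stripe on three data disks
    parity = xor_blocks(data)           # stored on a fourth disk

    # Disk 1 dies: XOR the survivors with the parity to get its block back.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])

    # Disks 0 and 1 both die: all that can be computed is the XOR of the
    # two missing blocks, which is not enough to recover either one.
    leftover = xor_blocks([data[2], parity])
    print(leftover == xor_blocks([data[0], data[1]]))
```

With one block missing there is one unknown and one equation; with two missing there are two unknowns and still only one equation.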

If you need to have a CD or DVD on your RAID computer, and it is an IDE device, hook it up to one of the controllers on the motherboard. I have never yet gotten a CD or DVD to work with a PCI IDE controller. I haven't tried it recently, so maybe with the latest devices it might work. (I sure hope it works with SATA!)

As for NFS, I just read the NFS HOWTO:

http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/NFS-HOWTO.html

My Linux distribution had already set up the NFS server software, so I just needed to set up the config files.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:Most common form of data loss? by penguin7of9 · 2003-10-29 19:57 · Score: 1

it could be smart enough to notice that only one block of the file changed, and share the remaining blocks with the old version.

That would then mean that the kernel needs to incorporate much or all of CVS (note that ID3 retagging may shift all the blocks).

And that's something you can only do efficiently with kernel help; if you want userspace libraries doing that kind of thing, you end up reimplementing most of the filesystem innards yourself.

Quite to the contrary: functionality for copy-on-write and for sharing ranges of bytes among files is generally useful. It is bad design to spend so much effort implementing that for versioning and then only make it available for that one complex and questionable function.

Instead, there should be a minimal API for composing files from shared blocks or even shared byte ranges in the kernel, together with support for copy-on-write functionality, but the versioning itself should be provided in user code.
Re:Most common form of data loss? by straybullets · 2003-10-29 20:55 · Score: 1

In my 4 years of system administration I've already seen many places where RAID was not common, i.e. used only for important data (they don't have enough money, it seems...). I've also seen blatantly out-of-date "backup" procedures. And I've seen IBM SSA disks dying at an awful rate. And useless corrupted mirrors. So I'd say disk failure is quite an important cause of data loss.

--
With that aggravating beauty, Lulu Walls.
Re:Most common form of data loss? by caluml · 2003-10-29 21:17 · Score: 1

Erm, sometimes a good idea could be lost in the noise. I just wanted to bring attention to it, and who knows - maybe someone that is looking for some extensions to a filesystem to write might be reading Slashdot, and think, "hmm, that's a good idea, and someone else thinks so too".
You just never know.

--
Get your own free personal location tracker
Re:Most common form of data loss? by TumbleCow · 2003-10-29 21:54 · Score: 1

That's why my preferred method for disk redundancy on my work PC is a 2nd drive and a nightly rsync. The "backup" takes about 5 minutes. And since it isn't RAID, when I make a mistake it isn't made on all drives at once. As long as I notice my goof before the backup runs at 5am, no problem.
I recommend reading Easy Automated Snapshot-Style Backups with Linux and Rsync. This article is about backing up with rsync, and then using hardlinks to create snapshots which only use up space for changed files. This can be *very* convenient if you notice your goof just a day later, and even for diffing between a working file and a non-working one.
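
A minimal sketch of the hardlink trick in Python, to show why unchanged files cost no extra space. This is not the article's script (that one uses rsync and cp); the function name and arguments are made up for illustration, and it assumes all snapshots live on one filesystem so that hard links work.

```python
import filecmp
import os
import shutil

def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`.

    `previous` is the prior snapshot directory, or None for the first run.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(dest_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)        # unchanged: share the old copy's data
            else:
                shutil.copy2(src, new)   # new or changed: store a fresh copy
```

Called once a night with yesterday's snapshot as `previous`, each snapshot looks like a full copy, but only changed files take up new disk space.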
Re:Most common form of data loss? by ckaminski · 2003-10-30 01:45 · Score: 1

I'll be happy to run VMS the minute H-Paq gets it ported to my Duron. ;-)
Re:Most common form of data loss? by 1gor · 2003-10-30 01:48 · Score: 1

Accidental change/deletion of content - that is where version control may be put to good use. Create a CVS repository on your Linux server (preferably, one that has RAID). Install TortoiseCVS on your Win32 workstation. That's all. Adding docs to the repository, updating, and retrieving previous versions is as simple as right-clicking the mouse. It used to be unthinkable to recommend CVS to a non-programmer, but TortoiseCVS has changed that.

I am using versioning at home for all sorts of files. What really helps is to have one copy of each document. No need for draft versions etc. I am also using this free diff program to view changes in MS Word documents.

--
--
Re:Most common form of data loss? by ckaminski · 2003-10-30 01:53 · Score: 1

I don't necessarily want to do "binary diffs". Block diffs, maybe. Some enterprise level fileservers currently have this functionality, like EMC, HP, etc. Change a file, and a copy of the changed block is kept so that a single consistent backup can be made.

Scenario (for the uninitiated):

I want to backup c:\winnt and C:\program files, both directories VERY important to the proper functioning of NT/IE/Word, etc.

Most backup software will back up one directory, then the other, allowing differences and inconsistencies to creep in. Say I make significant registry changes that break my system before C:\winnt can be backed up. If I then attempt a restore, I'll still get the broken changes, and will be screwed.

Now there's at least one filesystem/backup tool available that will:

Take snapshot of system at XX:XX PM. Every change made to the filesystem after this, store in a special area to be committed after the backup has finished. Similar to transaction logs with SQL servers. Why something like this couldn't be done easily at the block level on a filesystem confuses me.
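
Roughly what that looks like at the block level, as a toy Python sketch. The class and method names are invented for illustration, and a real implementation lives in the filesystem or volume manager, not in a Python list.

```python
class SnapshotVolume:
    """Toy block device: while a snapshot is open, writes go to a side area."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # the "on-disk" blocks
        self.pending = None          # block number -> new data, during a snapshot

    def begin_snapshot(self):
        self.pending = {}

    def write(self, index, data):
        if self.pending is not None:
            self.pending[index] = data   # hold the change aside
        else:
            self.blocks[index] = data

    def read(self, index):
        """What live applications see: the newest data."""
        if self.pending is not None and index in self.pending:
            return self.pending[index]
        return self.blocks[index]

    def read_snapshot(self, index):
        """What the backup tool sees: blocks frozen at snapshot time."""
        return self.blocks[index]

    def end_snapshot(self):
        for index, data in self.pending.items():
            self.blocks[index] = data    # commit the held-back writes
        self.pending = None
```

The backup reads through `read_snapshot` and gets one consistent point in time, while everything else keeps working through `read` and `write`.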

It would be less useful for software that completely rewrites its data file each time it's saved, but in that event, you'd still have an automatically version controlled file, except it would be the ENTIRE thing. For some files I have, that's acceptable.

CVS is all well and good, but it's one more level of indirection that I need to manage, and for some things, it's one level too many. I want to set up a group collaboration share, and just have the documents version control themselves at the FS level. With some good management tools, I'd never have to be asked to restore yesterday's copy of a file because some tool destroyed it again.
Re:Most common form of data loss? by Gyorg_Lavode · 2003-10-30 04:42 · Score: 1

What distro are you using?

--
I do security
Re:Most common form of data loss? by steveha · 2003-10-30 06:40 · Score: 1

Debian GNU/Linux. I'm using the "unstable" branch on my desktops, and "stable" on my servers.

http://www.debian.org

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:Most common form of data loss? by merlin_jim · 2003-10-30 08:53 · Score: 1

Did I just read a comment to get to know how important you think you are?

No; someone questioned my use of a term. I pointed out where I got that term from, and gave a page number in a reference document from the world's biggest software vendor where one could find that term.

I don't give a fuck how important you think I am.

I give a fuck that you think I'd confuse these terms on my own. I didn't. My chosen software provider gave me a manual describing features of their new operating system (which has a much greater footprint than all of linux combined) and I chose to use the terminology in that manual to describe technology features.

--
I am disrespectful to dirt! Can you see that I am serious?!
Re:Most common form of data loss? by Asprin · 2003-11-03 01:58 · Score: 1

Amen, and seconded!

Anyone who claims that tape is irrelevant and insufficient in the year 2003 is either ignorant of the facts or trying to sell you a hot-site server mirroring solution.

--
"Lawyers are for sucks."
- Doug McKenzie
Re:Most common form of data loss? by angst_ridden_hipster · 2003-11-04 11:22 · Score: 1

It's called the Duplicity project.

http://rdiff-backup.stanford.edu/duplicity.html

--
Eloi, Eloi, lema sabachtani?
www.fogbound.net

Intermezzo by mikeee · 2003-10-29 09:19 · Score: 5, Informative

Intermezzo is designed for this and a bit more - if one of the machines is a laptop you can take it away and work on it, and it'll resync when you get back.

It isn't particularly high-performance, from what I know, and may be more complexity than you need.

Network RAID by Anonymous Coward · 2003-10-29 09:20 · Score: 1, Interesting

Red Hat has very good software RAID that is easy to set up with only two disks. Of course, with only two disks they are mirrored. But it is very easy to set up a cron entry that can email you the status of that mirror every day.

Bandwidth by omega9 · 2003-10-29 09:20 · Score: 3, Insightful

I hope you're looking at some fast lines to put between those boxen. Even at 100Mb/sec, doing RAID across a LAN could get slow.

--
I'm against picketing, but I don't know how to show it.

Re:Bandwidth by SirJaxalot · 2003-10-29 09:30 · Score: 1

don't even think about trying raid over modems either.

--

Geminatron
Re:Bandwidth by twitter · 2003-10-29 10:05 · Score: 2, Insightful

Bunk. If you can do raid over USB, you can do it over 10/100 ethernet. As long as it's just used for data storage, the loss in speed should be no big deal. Windows, at least, would not notice.

--
Friends don't help friends install M$ junk.
Re:Bandwidth by WindBourne · 2003-10-29 10:29 · Score: 1

I have been doing the nbd/md raid at home just for /home and it works great. Speed has been fine.

--
I prefer the "u" in honour as it seems to be missing these days.
Re:Bandwidth by catenos · 2003-10-29 15:28 · Score: 1

Bunk. If you can do raid over USB, you can do it over 10/100 ethernet. As long as it's just used for data storage, the loss in speed should be no big deal. Windows, at least, would not notice.

I really don't understand how this was modded "3, Insightful". It may be funny, or a troll, depending on intent. But insightful? Only if you didn't bother to read the linked article.

Because the linked article talks about RAID'ed floppy drives over USB. Yes, for slow, obsolete floppy drives, USB is fast enough.

The claim that one wouldn't notice it using 10/100 ethernet just proves that the previous poster never compared a network file system with a local one for daily work.

Even with 100MBit, my current project compiles in just over 3 minutes (on an old 500MHz CPU). I have linked the directories for the generated binary files to the local disk (max transfer about 13MB/s), because that speeds up compiling by a factor of 3 [1]. Now imagine using 10MBit, with an implied slow-down by a factor of about 10, and then claim again that one wouldn't notice. With a current computer the difference would be even more staggering.

Ah, and if compiling does not fall into the "data storage" category: Well, simply copy that 50MB log file around, and some seconds become minutes (regarding the nobody would notice a "10 MBit" link).

[1] Considering the max transfer of the network file system (about 7MB/s) and the disk (13MB/s), I'd expected a smaller difference, especially because the OS may cache the files. Apparently latency plays a bigger role than I anticipated.
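
Back-of-the-envelope numbers for the 50MB log file, as a small Python sketch. The 7MB/s figure is the network file system rate from the footnote; the 10MBit rate is assumed to be simply a tenth of that, which ignores latency and protocol overhead.

```python
def transfer_seconds(size_mb, throughput_mb_per_s):
    """Time to move size_mb megabytes at a sustained throughput."""
    return size_mb / throughput_mb_per_s

LOG_FILE_MB = 50
FAST_MB_S = 7.0             # network file system rate on 100MBit, from the footnote
SLOW_MB_S = FAST_MB_S / 10  # assumption: 10MBit is about ten times slower

fast = transfer_seconds(LOG_FILE_MB, FAST_MB_S)
slow = transfer_seconds(LOG_FILE_MB, SLOW_MB_S)
print(f"100MBit: {fast:.0f} s, 10MBit: {slow:.0f} s")
```

That is about 7 seconds against a bit over a minute for a single file copy.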

--
Keep an eye on which arguments are silently dropped in replies. Not always, but often times it's very telling.
Re:Bandwidth by jonadab · 2003-10-29 16:08 · Score: 1

100BaseT is plenty fast enough for many types of data -- notably, anything that
is normally loaded into RAM, worked with for a while, and maybe periodically
saved. Documents, images, source code... most of the kinds of things we tend
to accumulate large amounts of and want stored redundantly.

The only notable thing I can think of that most of us have that you want
stored redundantly that would not perform well over 100BaseT is email, and
that's because you will commonly want your software to access lots of files
(or, if each mail "folder" is a file for you, really *big* files) all at
once. Still, a network fileshare would perform at least as well as IMAP
over the same connection. There are other things besides mail that you
would probably not want to access over 100BaseT, but they're things most
of us don't use much, like video. (Video downloaded from the internet does
*not* count here, for obvious reasons. I'm talking about movie quality
video.) Also, you probably don't want your WAV music collection stored
this way. MIDI would be okay, but if you're into modern music and don't
like lossy compression, you'd want to work out another arrangement for that.
Like, let a computer that stores them play them, and run speaker wires to
various locations. (You could still store copies on more than one system,
but rather than accessing them over the network you'd use one of the PCs
that had the data locally as your music player.)

But for normal data, no problem. I routinely use CIFS over TCP/IP over
100BaseT to remotely access OpenOffice documents and .XCF images on the
Windows PC upstairs from my Linux PC here, and the delay is not generally
worth talking about. Loading the application (OO.o or Gimp or whatever)
from the local hard drive takes much longer (because the app is much larger
than a single document or image file, unless it's a fairly big image
or an abnormally *massive* document).

Obviously, you would take a noticeable performance hit if you accessed your
applications over 100BaseT. I've done it, though, and it's not entirely
horrible, for small-to-medium applications. Even for large applications,
it performs better than Knoppix. (100BaseT is faster than a CD-ROM drive.)
But generally you wouldn't need to do your apps this way, because you don't
need to worry about losing them; you can always install them again. (You
could put your downloads folder on the LAN, no problem. Apps that you have
on CD there's even less reason to store redundantly.) It's really not apps
but *data* that you want to store redundantly and share between systems.

Of course, Gigabit Ethernet is beginning to become affordable; for somebody
with eight computers in his house (all being used -- that seems to be what
he implied, so I assume these are not old 386s in the closet), it may even
be affordable now. If so, hey, by all means, go for it.

--
Cut that out, or I will ship you to Norilsk in a box.

RAID on Files by Great_Geek · 2003-10-29 09:20 · Score: 3, Insightful

I have often wanted the same thing, kind of like RAID on files, call it RARF (Redundant Array of Remote Files). I was thinking along the line of a device driver that presents an ATA/IDE interface to the file system on one side and passes the requests to multiple copies of virtual disks. The virtual disks would be like VMWare disks, and potentially each on a different machine/location. Each virtual disk could even be encrypted differently.

This would be really useful for SOHO type places to allow me to have a hot offsite backup at multiple friends' places (and vice versa).

Re:RAID on Files by ZenShadow · 2003-10-29 09:27 · Score: 1

What you describe is a combination of the loopback and md drivers under Linux -- RAID1 (or 5 or...) on loopback devices pointing at files living on NFS disks. Or something.

--ZS

--
-- sigs cause cancer.
Re:RAID on Files by babyrat · 2003-10-29 10:57 · Score: 1

kind of like RAID on files, call it RARF (Redundant Array of Remote Files).

Hmmm... I think I'd call it 'Back-up Array of Remote Files' (BARF) :)

DIBS? by kulpinator · 2003-10-29 09:20 · Score: 1

I haven't checked into it much, but I remembered the DIBS (Distributed Internet Backup System -- Slashdot article here). I would imagine that it could be modified (maybe not trivially) to support real-time disk operations, since it is open-source. However, although I don't know much about Python, I have a feeling this may suffer in performance from being written in a (semi-)interpreted language. Python lovers want to flame me for incriminating their programming language?

--
Karma: Positive (mostly due to rash moderations)

Re:DIBS? by hubt · 2003-10-29 14:34 · Score: 1

I haven't tried DIBS, but I've been considering it lately.

I wouldn't assume that performance suffers just because it's written in Python. I don't hear people complain about BitTorrent even though it's written in python.

Back to the original question of what the best solution is. Obviously it depends on what the needs are. If you need total transparency and minimum downtime, then a RAID is great, but it isn't cheap. If you don't mind slow backups, and slow recovery times, and you already have extra disk in separate machines, then a network based backup is pretty cheap.

DIBS has some interesting features, including the ability to backup to N different places and incremental backups.
Re:DIBS? by jonadab · 2003-10-29 16:32 · Score: 1

> Python lovers want to flame me for incriminating their programming language?

I have no love for Python (tried it, didn't like it), but I think your concern
about performance is overrated. This mattered in the days of 486 CPUs, but
these days it's totally unimportant. Interpreted and even VM-based languages
perform just fine on modern hardware -- and the code is more maintainable, so
there are *substantially* fewer critical bugs than with C code.

If you want to criticise Python, talk about significant whitespace, strong
typing (which the Python people even have the gall to claim is an advantage),
foisting a certain paradigm (OO) on every problem whether it fits or not, or
the sort of structural bondage and discipline previously associated with
Pascal. That ought to rile 'em up a bit. But don't talk about performance;
they'll ignore you because, while it's technically true, it doesn't matter.

--
Cut that out, or I will ship you to Norilsk in a box.

Backing up all within your house by Alain+Williams · 2003-10-29 09:20 · Score: 4, Insightful

Hmmmm, what happens if your house catches fire ?

8 copies of the same document all nicely toasted!

Re:Backing up all within your house by feepness · 2003-10-29 09:32 · Score: 2, Funny

Hmmmm, what happens if your house catches fire ?

Come on, this'll never happen. I live in San Diego!
Re:Backing up all within your house by peragrin · 2003-10-29 09:39 · Score: 1

Yes and no. Your house would have to be totally gutted for that to happen. With an average 10 minute response time for fire departments in the U.S. (longer if you live in a rural area, shorter in the cities), the probability of losing all 8 systems is remote. Chances are at least 2 of the systems will survive.

Your chances are even better if you separate the machines throughout the house.

--
i thought once I was found, but it was only a dream.
Re:Backing up all within your house by BigDumbAnimal · 2003-10-29 09:39 · Score: 1

This is personal stuff. It's not like this guy has $20M in data that needs a redundant data center ready to go live within minutes on the other side of the world.
Re:Backing up all within your house by Lester67 · 2003-10-29 09:40 · Score: 1

With 8 systems you should have time to toss one of them out the window.

Followed by your porno mag collection, Star Trek video tapes, and cloth baggie full of multi-side die.
Re:Backing up all within your house by Eric+Smith · 2003-10-29 10:04 · Score: 2, Interesting

Hmmmm, what happens if your house catches fire? 8 copies of the same document all nicely toasted!
Been there, done that. :-( Didn't even get a t-shirt.
Re:Backing up all within your house by pyrrhonist · 2003-10-29 10:17 · Score: 1

with an average 10 minute response time for fire dept. in the U.S. (longer if you live in a rural area
Fire companies are rated on a scale by insurance companies to see whether an insurance company will insure your property against fire. When my parents moved into a new house, they had to find a different insurance company, because their original insurance company would not insure my parents' new house. Why? Because the local fire company had a very poor rating. As in, the response time was something on the order of hours as opposed to minutes.
This is why fire companies in rural areas are derogatively referred to as, "cellar savers" (this is unfair - they will still risk their lives to save you).
So basically, your chances of losing everything to fire are greater than you realize.

--
Show me on the doll where his noodly appendage touched you.
Re:Backing up all within your house by Anonymous Coward · 2003-10-29 10:28 · Score: 1, Insightful

Who gives a flying shit about Porn and Illegal MP3's when your house and all your shit is burnt. The garbage on my computers would be the least of my freaking problems if all my stuff is a smoldering pile of ash. Put your priorities in order dillhole.
Re:Backing up all within your house by TheWart · 2003-10-29 11:25 · Score: 1

I live in SoCal you insensitive clod!

Loose Hard Drive? by Anonymous Coward · 2003-10-29 09:21 · Score: 2, Funny

As opposed to a tight one?

Speed would be an issue... by Trolling4Dollars · 2003-10-29 09:21 · Score: 4, Informative

I imagine you'll need gigabit ethernet or multiple NICs in bonded mode. Then you have the performance of each individual system to take into account. Especially if one of the systems is heavily used. I would recommend getting one BIG HONKIN' SERVER and putting it in a central location. Give it gigabit and let everything else connect to it at 100. Then, make sure it has a hardware RAID controller. Use SAMBA for the cross platform connectivity you desire, and viola! protected data with redundancy and high speed performance. If you go with remote display (RDP with Windows Terminal Server or X with *nix) then you have an even better approach as all the data will exist on the secure RAID box.

I get what you mean though... it's a nice idea, but it would be costly to implement vs. what I suggested above.

When I went to see a presentation on HP's SAN solutions last year, I was very impressed with the ideas they had. One big hardware box with multiple disks that are controlled by the hardware. They are then presented to any systems over a fiber link as any number of drives you wish for any OS. Finally, their "snapshot" ability was pretty impressive. (Also called Business Copy) All they would do is quiesce the data bus, then create a bunch of pointers to the original data. As data is altered on the "copy" (just the pointers, not a real copy), the real data is then copied to the "copy" with changes put in place. I imagine something similar could be accomplished with CVS...

--
Un-news

Re:Speed would be an issue... by rossz · 2003-10-29 09:44 · Score: 1

Just remember to back up that big honkin' server with a reliable medium. Don't trust that steaming pile of shit from Seagate called Traven.

--
-- Will program for bandwidth
Re:Speed would be an issue... by quinto2000 · 2003-10-29 10:13 · Score: 1

I think you meant voila, not viola. A viola is a musical instrument.

--
Ceci n'est pas un post
Re:Speed would be an issue... by LookSharp · 2003-10-29 10:19 · Score: 2, Informative

...as much as I dislike replying to T4D, he brings up an interesting scenario to counter your suggestion of using multiple machines.

I took a spare machine, added a 3ware 6800 ATA RAID controller ($130 on eBay), and installed eight 120GB Maxtor hard drives ($1200 when I bought them last year) and put them in eight Genica hot-swap trays ($60). For about $1500, I now have an 800GB formatted RAID5 array. (Had to throw in a dedicated 400W Antec power supply for HDs.) In a year, two of the drives have flunked, and the replacement drives have rebuilt beautifully.

If you're going to lose the site, you're going to lose your data in either case. All you protect against with the network situation is the complete loss of one machine. Protect your server as much as possible and put your data on it.

Just make sure you keep the "most precious" data offsite on tape or a sneaker-net external hard drive, in case the pop-tart that got stuck in your toaster burns down your house. (This apparently happens about 30 times a year, by the way, including to one of my co-workers :)
Re:Speed would be an issue... by Trolling4Dollars · 2003-10-29 12:35 · Score: 1

You are quite right. Of course I probably should have said it like Jerry Lewis would have back in the day:

"Hey Laaaady!!!! Viola!!!!" ;P

--
Un-news

Coda by fmlug.org · 2003-10-29 09:21 · Score: 3, Redundant

Coda may do what you're looking for:

# disconnected operation for mobile computing
# is freely available under a liberal license
# high performance through client side persistent caching
# server replication
# security model for authentication, encryption and access control
# continued operation during partial network failures in server network
# network bandwidth adaptation
# good scalability
# well defined semantics of sharing, even in the presence of network failures

More info here http://www.coda.cs.cmu.edu/

Re:Coda by quantum+bit · 2003-10-29 12:32 · Score: 2, Interesting

If by "high performance through client side persistent caching" you mean "has to copy the entire 300MB video from the server to my local machine before it even starts playing, assuming it doesn't crap out because the default cache size is smaller than that", then yeah, go for it!

Seriously, I looked into Coda a couple months ago and the design looks really cool, but it just doesn't seem to work very well unless you're only storing tiny text files. It also doesn't scale very well on large servers (i.e. it has a maximum limit on the number of files on each volume). Don't get me wrong, I REALLY wanted to use Coda because I liked the idea of it -- I just wish that it worked better. Ended up going back to NFS (yuck!).
Re:Coda by Umrick · 2003-10-29 13:22 · Score: 1

Yep. Been looking at both Coda and OpenAFS recently. Finally settled on OpenAFS. The strong points of CODA (disconnected operation) also tend to blow apart your consistency. Seems like one of the most common problems is partitioning due to RW replica volumes getting out of sync.

Other major issue to me was a limit on the number of files (due to metadata). Basically means that you need a server process running on the server for approximately every 25 gig you add to the CODA share.

OpenAFS didn't have RW replicas, or disconnected operation, but seems to scale to much larger storage sizes without shredding itself.

Distributed Network Block Device by JumboMessiah · 2003-10-29 09:22 · Score: 2, Informative

A perfect solution would be a form of network block device that mounts distributed NBD shares. The Linux DRBD Project has this capability. From their website, "You could see it as a network raid-1".

data loss by _fuzz_ · 2003-10-29 09:22 · Score: 1

...I could protect myself from the most common form of data failure - a disk crash.

In my experience, the most common form of data loss is not hardware failure, but user error. RAID is great for protecting against hardware failure, but be sure to still make backups to prevent against accidental deletion.

--
47% of all statistics are made up on the spot.

Re:aw geeze. by JohnnyKlunk · 2003-10-29 09:22 · Score: 1

That's true, but raid == raid
it's different to having a 6 week offsite tape rotation strategy, but does protect you against a disk failure, which is what the original post wanted.
I back up my servers at work, I also raid them. To me, doing both makes perfect sense.

Re:aw geeze. by wallywam1 · 2003-10-29 09:22 · Score: 1

Nitpicky semantics != intelligence. Redundancy is a form of backup since it allows a way to recover data that would be lost if the redundancy were not in place.

Try Rsync or DRBD by oscarm · 2003-10-29 09:23 · Score: 4, Informative

see http://drbd.cubit.at/ DRBD is described as RAID1 over a network.

"Drbd takes over the data, writes it to the local disk and sends it to the other host. On the other host, it takes it to the disk there."

Rsync with a cron script would work too. I think there is a recipe in the linux hacks books to do something like what you are looking for: #292.
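
The "writes it to the local disk and sends it to the other host" idea boils down to something like this Python sketch. The classes are stand-ins invented for illustration; in DRBD the peer sits behind a network link and all of this happens in the kernel.

```python
class Disk:
    """Stand-in for a block device: a dict of block number -> bytes."""

    def __init__(self):
        self.blocks = {}

    def write(self, index, data):
        self.blocks[index] = data


class MirroredDevice:
    """Write locally, then forward the same write to a peer (hypothetical API)."""

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer   # in real life this would be reached over the network

    def write(self, index, data):
        self.local.write(index, data)
        self.peer.write(index, data)   # the write is done once both copies exist

    def read(self, index):
        return self.local.blocks[index]   # in this sketch, reads stay local
```

Every write costs a network round trip, but either host can die and the other still has every block.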

Venti needs a mention by DrSkwid · 2003-10-29 09:24 · Score: 3, Informative

http://plan9.bell-labs.com/sys/doc/venti/venti.html

Abstract

This paper describes a network storage system, called Venti, intended for archival data. In this system, a unique hash of a block's contents acts as the block identifier for read and write operations. This approach enforces a write-once policy, preventing accidental or malicious destruction of data. In addition, duplicate copies of a block can be coalesced, reducing the consumption of storage and simplifying the implementation of clients. Venti is a building block for constructing a variety of storage applications such as logical backup, physical backup, and snapshot file systems.
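
The core idea from the abstract fits in a few lines of Python. This is only a sketch of content addressing, not Venti's actual protocol or on-disk format; the class name is made up and SHA-1 is used here for illustration.

```python
import hashlib

class BlockStore:
    """Write-once store where a block's address is the hash of its contents."""

    def __init__(self):
        self._blocks = {}

    def write(self, data):
        key = hashlib.sha1(data).hexdigest()
        self._blocks.setdefault(key, data)   # duplicates coalesce; nothing is overwritten
        return key

    def read(self, key):
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)
```

Writing the same block twice returns the same address and stores it once. There is no call to change a block in place, so an old address always reads back the old data.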

--
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter

Expensive but reliable solution by onyxruby · 2003-10-29 09:25 · Score: 2, Interesting

I've been looking into something like this for a little while. What I'd like to do when I have the fundage is get a fileserver/backup box. The ideal is to run 4 160 GB IDE drives in RAID 5. This will give me a bit over 450 GB in usable network storage. I then want to add a pair of 250 GB 5400 drives for backup. I can then set up a the server to backup the data from the raid drives to the backup drives on a daily basis.

According to pricewatch the 4 160's could be had for around $400 total with about another $400 for the backup. Add a 3ware RAID controller for another $245 bucks and you're looking at about $1045 to convert a system into supporting 450 GB of usable network storage and backup.

From all indications IDE hard drives are now the cheapest form of backup there is. I've looked at CD, DVD, Tape, but it keeps coming back to IDE hard drives. This is far cheaper than similar storage and backup would be on tape.
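
The arithmetic behind those numbers, as a quick Python sketch. The function name is made up; the drive sizes and prices are the ones quoted above. RAID 5 gives up one drive's worth of space to parity, so the raw figure is 480 GB before filesystem overhead and the decimal versus binary gigabyte difference, in the same ballpark as the 450 GB quoted.

```python
def raid5_usable_gb(drives, size_gb):
    """RAID 5 spends one drive's worth of space on parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drives - 1) * size_gb

usable = raid5_usable_gb(4, 160)   # four 160 GB drives
cost = 400 + 400 + 245             # RAID drives + backup drives + controller
print(usable, "GB raw usable space for about $" + str(cost))
```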

Re:Expensive but reliable solution by Grant+Root · 2003-10-29 11:03 · Score: 1

Don't forget that there are several kinds of data loss, and your approach only protects against one of those (hard drive failure).

There's also the case where you delete something accidentally and discover it a couple of days or weeks later (after your disk backup is overwritten), and the case where all of your disks fail (a fire or flood).

Tape, or some other removeable medium that lets you make multiple copies and store some offsite, is a solution to all of those cases.
Re:Expensive but reliable solution by onyxruby · 2003-10-29 11:27 · Score: 1

I've used tape in the past, I've used DAT 20's, 40's, and rackmount jukeboxes and so on at work. I bought an Onstream SC30 drive before anybody even knew who Onstream was. I have used and supported backup software from the likes of Veritas at work for years. The key thing here being that these were all used at work. The poster wants something for home use.

Pricewatch has the DLT IV 40/80's around $40 each, and a drive for about $952. A seven pack of said DLT IV tapes is $440. This doesn't take into account the SCSI card. It also doesn't take into account the fact that tapes wear out after about a year. Rough cost for this setup would be one set of seven tapes for $440 (you never get the full claimed compressed rate) for a full backup and probably another $440 for a set of differentials with a couple of spares. Figure another $150 for a SCSI card. All told you're looking at just under $1800 to $2000 for the setup depending on how many tapes you want for incrementals. And this doesn't cover the storage factor, just the backup cost. For the cost differential you're way ahead on the money even if you have an IDE drive die every year.

This also doesn't take into account the sheer time and babysit factor. With the IDE solution I gave you can have a script do your backup for you. You don't have to deal with it, which can get real old real quick when you're not getting paid.

I've also had to send backup tapes for repair before - expensive because places like Ontrack have you by the balls and they know it. Without question, SCSI is always best, except for that whole cost thing. IDE RAID controllers have finally gotten real (not that Promise crap) and IDE drives have simply achieved a price point that for a small office / home use can't be ignored. Now if I'm back in the Enterprise environment we'll start talking about RAID arrays backed up by Storage Jukeboxes, but that isn't the environment at hand.
Re:Expensive but reliable solution by onyxruby · 2003-10-29 11:32 · Score: 1

Absolutely, RAID will very nicely delete everything you tell it to across the whole array. Just like you told it to, not that I've ever done that - ok once years ago - but I learned my lesson. This is why I mentioned the backup drives as well. Tape is nice for many things, read my response to another poster on this thread for why I didn't recommend it for this person's home use.

What I've been looking at more seriously if I can afford it are the removable hard disk trays with an extra set of backup disks. This way I could swap out a pair of trays and backups to offsite storage every couple of weeks or so. The nice thing about this option is that I can add this capability in the future when I can afford it.

hyper scsi by blaze-x · 2003-10-29 09:26 · Score: 2, Informative

from the website:

HyperSCSI is a networking protocol designed for the transmission of SCSI commands and data across a network. To put this in "ordinary" terms, it can allow one to connect to and use SCSI and SCSI-based devices (like IDE, USB, Fibre Channel) over a network as if it was directly attached locally.

http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi/

iSCSI? by SuperBug · 2003-10-29 09:26 · Score: 1

You can share iSCSI devices, if you do it the right way, between many different hosts. NBD sounds good, but for what you're asking, iSCSI or FCIP or some derivative sounds more correct, i.e. virtual block devices, or "real" block devices on a network that can be accessed by Windows or *nix. You could RAID (md) iSCSI devices, or just use a system which "owns" all the iSCSI devices in an MD, and present it up using CIFS or SMB.

--
--SuperBug

Re:iSCSI? by martin · 2003-10-29 21:58 · Score: 1

for iSCSI to work don't you need a network switch that talks iSCSI so your required QoS doesn't drop?

goes off to cisco web site to find out more..

Check this out... by BubbaTheBarbarian · 2003-10-29 09:28 · Score: 1

http://www6.tomshardware.com/storage/20031028/index.html
Not as a solution in and of itself, but it is a good idea considering that you more than likely have a box to burn...also try to grab some old PolyServe software. It will do the same thing over a network, though not without resource loss.
WAR TUX!

Not really a good idea by c77m · 2003-10-29 09:29 · Score: 1

Maybe it would be a fun experiment, but there are too many potential issues for me to consider this a good solution. With a goal of decreasing your susceptibility to failure, you are introducing many more possible failure points. Instead of data relying on disk, bus, and cache, you're looking at those same failure points multiplied by as many systems as you have, plus introducing your network as a failure point.

What about data integrity when the network fails? Or when a single host fails? You could create ACLs for hosts that would be responsible for certain data upon certain failures, but then you're adding to an already overwhelming management nightmare.

Why not consider a shared storage system? You're not realistically going to have a failproof plan in your home, so just narrow it down to a few things. External JBOD with software RAID, presented as NAS to the rest of your computers. If a drive fails, just replace it. If the NAS head fails, just hook up the JBOD to another host.

Re:Not really a good idea by Indianwells · 2003-10-29 09:32 · Score: 1

That's not really the point is it? This guy has a legit need here, to set up a shared backup system amongst a group of machines. The single point of failure here is a hard drive on one or many machines. This way he distributes his problem instead of not having a solution. I like this idea quite a bit. As this isn't for the enterprise, why would he even really care about access control? Sheesh.
Re:Not really a good idea by c77m · 2003-10-29 09:50 · Score: 1

Maybe it should be. Any implementation of a so-called distributed filesystem needs access control of some type. NFS uses locks. Sun Cluster uses a quorum device. Volume Logix uses an internal access database.
If you have two hosts that "share" a filesystem but can only communicate with each other over the network, what happens when the network switch fails? Both hosts are up, but they can't communicate with their peers. If either host writes data, your data integrity is lost. Enterprise or not, nobody likes corrupt data.
So perhaps it is the point...
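
The network-split case described above is usually handled with some form of majority rule. A minimal sketch, assuming a simple vote count (this is not how NFS, Sun Cluster, or Volume Logix actually implement it, and the function name is made up for illustration): a host refuses to write unless it can see more than half of the configured votes.

```shell
# has_quorum SEEN TOTAL
# Succeeds only if SEEN votes are a strict majority of TOTAL.
# Hypothetical helper for illustration; real cluster software is more involved.
has_quorum() {
    seen=$1
    total=$2
    [ $((seen * 2)) -gt "$total" ]
}

# Two hosts, switch dead: each host sees only itself (1 of 2), so neither writes.
# With a third vote as tie-breaker (for example a quorum device), the side
# that can still reach it sees 2 of 3 and may carry on.
```

With only two votes a split always stops both sides, which is why a third vote of some kind tends to be needed.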
Re:Not really a good idea by Indianwells · 2003-10-30 04:55 · Score: 1

I guess what I meant by access control would be fine grained access control. Of course one would need to restrict access to a machine on a network to known parties ... but rlogin and rsh worked just fine on trusted, local networks. A simple implementation along those lines, with hooks to plugin to a larger access framework would be useful. For me, the use would be in the distributed backup, not necessarily the shared drive. I have many disks ... some of which sit idle in a closet. If I could automagically backup my laptop when I come home at night and plugin to my lan, all going into a centralized server holding the backups of any other machine I used, that would be useful. A better idea, might be having a shared user account over a series of 'nix boxen. Anyway, just some thoughts.

NBD for Windows by backtick · 2003-10-29 09:30 · Score: 1, Redundant

http://www.vanheusden.com/Loose/nbdsrvr/

(I haven't used this, but it exists)

Rsync and Ssh by PureFiction · 2003-10-29 09:32 · Score: 4, Informative

This is the way I do it, and although a little clunky, it allows me to keep remote backups of certain directories on three different servers.

First, setup ssh to use pubkey authentication instead of interactive password. You can read the man pages for details but it basically boils down to running keygen on the trusted source:

ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa

Then copy or append the newly created ~/.ssh/id_rsa.pub to the remote hosts' /home/user/.ssh/authorized_keys file.
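
The append step is easy to get subtly wrong (duplicate entries, permissions that sshd rejects). A rough sketch of one way to do it, meant to be run on the remote host once the public key file has been copied over; the function name is hypothetical, and where available the stock ssh-copy-id script does a similar job:

```shell
# install_pubkey PUBKEY_FILE AUTH_KEYS_FILE
# Appends the key only if that exact line is not already present, then
# tightens permissions: with StrictModes on (the sshd default), key files
# that are writable by group or others are ignored.
install_pubkey() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")" || return 1
    chmod 700 "$(dirname "$auth")"
    touch "$auth" || return 1
    grep -qxF -- "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"
}
```

Running it twice with the same key leaves a single entry, so it is safe to call from a setup script.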

Now you can run rsync with ssh as the transport (instead of rsh) by exporting:

export RSYNC_RSH=ssh

or by passing --rsh=ssh on the command line.

So to sync directories you could use a find command to update regularly:

while true; do
    find . -follow -cnewer .last-sync | grep '.' 1>/dev/null 2>/dev/null
    if (( $? == 0 )) ; then
        rsync -rz --delete . destination:/some/path/
        touch .last-sync
    fi
    sleep 60
done

Obviously this is pretty hackish and could be improved. But the point is that with ssh and rsync you could do automatic mirroring of specific filesystems or directories to remote locations securely.

Re:Rsync and Ssh by Anonymous+Freak · 2003-10-29 10:42 · Score: 1

Thank you for the simple hack. I know that this is exactly what *I* have been looking for for awhile now. Although the original poster I think wants something that also works in Windows.

--
Another non-functioning site was "uncertainty.microsoft.com."
The purpose of that site was not known.
Re:Rsync and Ssh by werd+life · 2003-10-29 10:47 · Score: 1

not to be a jerk or anything, but you've got a pretty serious race condition there. if any files are changed while rsync is running, you'll miss them. not such a big deal, i guess, since eventually you'll pick them up. it would be safer to touch the file before any processing is done.

mv .last-sync .last-sync-cur
touch .last-sync
find . -follow -cnewer .last-sync-cur

etc., etc.

but still, rsync is a great idea.
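
A slightly fuller sketch of the "stamp before scanning" idea, with a few assumptions of my own: the marker names .sync-start and .last-sync, the two helper functions, and the use of POSIX -newer (modification time) in place of -cnewer. The marker files are excluded by name because a freshly touched marker would otherwise count as a change on every pass.

```shell
# changed_files DIR
# Stamps a candidate marker first, then lists regular files modified since
# the last committed marker (or every file on the first run).
changed_files() {
    dir=$1
    touch "$dir/.sync-start" || return 1
    if [ -e "$dir/.last-sync" ]; then
        find "$dir" -type f -newer "$dir/.last-sync" \
            ! -name .last-sync ! -name .sync-start
    else
        find "$dir" -type f ! -name .sync-start
    fi
}

# commit_sync DIR
# Call only after the transfer succeeded. mv keeps the marker's timestamp,
# so anything written while the transfer ran shows up on the next pass.
commit_sync() {
    mv "$1/.sync-start" "$1/.last-sync"
}

# Possible loop body (destination is a placeholder, as in the original post):
#   if changed_files . | grep -q .; then
#       rsync -rz --delete . destination:/some/path/ && commit_sync .
#   fi
```

Because the marker is only promoted after rsync returns successfully, a failed transfer is simply retried on the next pass.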
Re:Rsync and Ssh by adamfranco · 2003-10-29 10:52 · Score: 4, Informative

Here is a nice page that explains how to do this. Even better, it shows how to do nice incremental backups using only slightly more space than the source (for the differing file versions). This makes for a pretty cheap and easy backup solution.

--
"When ideology and theology couple, their offspring are not always bad but they are always blind." -- Bill Moyers
Re:Rsync and Ssh by strudeau · 2003-10-29 10:58 · Score: 2, Informative

the original poster I think wants something that also works in Windows.

Rsync and ssh can work with Windows using Cygwin. See this document for example.

The holy grail by mcrbids · 2003-10-29 09:33 · Score: 1

What you seek is the holy grail of high-availability environments.

So far, I've not seen anything that exists that does what you are asking for. Several technologies come somewhat close.

What I've been hopeful of is the recent donations by Oracle for database clustering, but I haven't seen any decent fallout from that... yet.

For now, on my home-based work network, I have two network drives (both IDE 120 GB) and do nightly rsync from one to the other.

(sigh)

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Unison? by Anonymous Coward · 2003-10-29 09:33 · Score: 1, Informative

Not yet seen reference to unison:

http://www.cis.upenn.edu/~bcpierce/unison/

They say: "Unison is a file-synchronization tool for Unix and Windows. (It also works on OSX to some extent, but it does not yet deal with 'resource forks' correctly; more information on OSX usage can be found on the unison-users mailing list archives.) It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other."

Speed by backtick · 2003-10-29 09:34 · Score: 4, Interesting

Using a pair of Intel EEPro 100's w/ trunking (using both links at the same time on one IP, works w/ a cisco switch), I've gotten over 100 Mb/sec of actual throughput (I think I hit 137 Mbit/sec, peak) out of a box using NBD to create a mirror'd RAID volume over the trunked ports. Now, my actual 'real' data speeds to the file system were about half that (Call it 50-65 Mbit, or 6 to 7.5 MByte/sec), due to mirroring == writing it twice. Still not bad. Yes, the target disks were themselves part of other RAID volumes, for speed :)

Re:Speed by ZenShadow · 2003-10-29 09:45 · Score: 1

Hmmmm.... How does it perform with small operations?

--ZS

--
-- sigs cause cancer.

You aren't gonna get a real RAID. by PurpleFloyd · 2003-10-29 09:34 · Score: 5, Insightful

First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

Instead of trying to implement a shoestring SAN, go the simple route: throw up a Linux box running Samba for your "backup server;" it doesn't need much horsepower, just fairly fast drives and a network connection. Then schedule copies of your documents and home directories (using a cron-type tool on Linux and XCOPY called by the Task Scheduler on Windows, you should be able to hack something together that copies only changed files) every night at midnight, or some other time when you aren't using your computers. Although you might lose a bit of work if the system goes down, you won't ever lose more than 24 hours' worth.

If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server). You can of course mount the /home directory in Linux via NFS or smbmount, and Group Policy in Windows 2K/XP will allow you to change the location of the My Documents folder to whatever you choose. You might be able to do the same via the System Policy Editor on 9x; it's been a while and I can't find the information after a brief Google.

To sum up:

Don't blow millions on a SAN for your house.
Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night
More expensive route: invest in a server with real RAID, then mount your important directories from that.

--

That's it. I'm no longer part of Team Sanity.

Re:You aren't gonna get a real RAID. by Lester67 · 2003-10-29 09:37 · Score: 1

Windows "Robocopy" will automatically check and compare file dates for you.
Re:You aren't gonna get a real RAID. by Cranston+Snord · 2003-10-29 09:46 · Score: 4, Informative

Instead of xcopy, try RoboCopy, included in the windows NT/2k/xp/2k3 resource kit available here. It gives you almost as much control as rsync, including directory synchronization, touch control, ageing, network failure support, and others. I use this at work to move around copies of live production data to backup servers located offsite via vpn without any issues. More information on syntax can be found here.

--
And now for something completely different...a man with three buttocks.
Re:You aren't gonna get a real RAID. by steveha · 2003-10-29 10:46 · Score: 2, Informative

No need for an "honest-to-dog hardware RAID". Linux software RAID is simply great.

Set up a server with multiple hard disks in a Linux software RAID, and run Samba and NFS on that. The Linux software RAID HOWTO explains all you need to know.

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:You aren't gonna get a real RAID. by dbarclay10 · 2003-10-29 11:00 · Score: 2, Informative

First off, you aren't going to be able to use this like a real RAID array (a drive can die and you keep on working). The latency and bandwidth of any network that could be reasonably implemented in your home is going to prevent your system from acting like a real RAID array.

I'm currently running some benchmarks on an XFS filesystem built upon a Linux MD RAID1 array, which is in turn built upon a local disk and a remote disk (which is at the end of a switched 100mbit network, the NBD server itself having an 8-year-old drive and a controller which doesn't do DMA).

[ dbharris@willow: ~/ ]$ cat /proc/mdstat
Personalities : [raid0] [raid1]
md1 : active raid1 nbd0[1] dm-5[0]
1888192 blocks [2/2] [UU]

It takes approximately 10 minutes for a 1.8G array to sync. That's respectable. It's not blazing fast, but it's respectable.
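
With a mirror half living on another machine, it helps to notice when the array has dropped a member. A small sketch that reads /proc/mdstat-style text on stdin and looks for an underscore in the status field (the [UU] above would read [U_] with one member missing); the function name is made up:

```shell
# md_degraded < mdstat-text
# Succeeds (exit 0) if any array's status field, e.g. [UU] or [U_],
# contains an underscore, which marks a missing or failed member.
md_degraded() {
    grep -q '\[[U_]*_[U_]*\]'
}

# Possible use from a cron job:
#   md_degraded < /proc/mdstat && echo "an md array is degraded"
```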

The bonnie++ scores are:

willow,1G,5086,31,4766,2,2873,1,6377,27,8655,2,158.7,1,16,878,18,+++++,+++,766,14,880,18,+++++,+++,595,13

Which isn't amazing, but quite respectable, especially given that this type of thing wouldn't be used for mass storage of ISOs or whatever, but used for people's "My Documents" folders and their $HOMEs. Notable that a fully local array I have which is made up with an old SCSI controller and some old SCSI disks is about half this speed as far as the filesystem goes, and about a tenth the speed as far as syncing goes.

So, I believe that your assertion of "you aren't going to be able to use this like a real RAID array" is quite incorrect. Especially given that my network isn't particularly fast, my NICs aren't particularly fast, and the remote disk I'm using is dog slow. Replace the NICs with parts that aren't pieces of crap, use Gig-E, and use controllers/drives that aren't 7-8 years old, and you'll get very respectable performance - ESPECIALLY given that the intention isn't to store everything on it, just people's individual files.

P.S.: Yes. I'm repeating myself. I know this. It's deliberate :)

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Re:You aren't gonna get a real RAID. by dbarclay10 · 2003-10-29 11:42 · Score: 1

For reference, ext2 blows away XFS on this particular setup. The results for an ext2 filesystem on the same RAID array are: willow,1G,5572,31,5491,2,3754,1,6313,26,7994,1,191.0,1,16,2261,94,+++++,+++,+++++,+++,2245,95,+++++,+++,4296,96

Oh, and you can use bon_csv2txt to convert that into a more readable form ;) Just 'echo willow,1G,... | bon_csv2txt'. bon_csv2txt is included in the bonnie++ package.

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Re:You aren't gonna get a real RAID. by darrylo · 2003-10-29 12:15 · Score: 2, Interesting

Cheap route: cron jobs/Windows task scheduler to copy important folders across the network every night

Also, for those people concerned about leaving another "backup server" running 24x7, you can make use of the "wake on LAN" capability to do backups (available on many LAN/motherboards). Just wake up (boot) the "backup server", do your backup, and then shut it down. It's way cool to remote-boot home servers.
Here, the only real issue is the power/thermal cycling of the hard disk once a day (or whatever), which might be a problem since many disks now tend to come with only a one-year warranty. However, this isn't all that different from a regularly-used PC.
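
For the curious, the wake-up itself is just a "magic packet": six 0xff bytes followed by the target's MAC address repeated 16 times. A sketch that only builds that payload as hex (the function name is hypothetical; actually sending it, normally as a UDP broadcast, is left to a tool such as wakeonlan or etherwake, and the NIC/BIOS has to have wake-on-LAN enabled):

```shell
# wol_payload_hex MAC
# Prints the magic packet body as hex: 6 bytes of 0xff followed by the
# target MAC address repeated 16 times (102 bytes in total).
wol_payload_hex() {
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    case $mac in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#mac}" -eq 12 ] || return 1
    payload=ffffffffffff
    i=0
    while [ "$i" -lt 16 ]; do
        payload=$payload$mac
        i=$((i + 1))
    done
    printf '%s\n' "$payload"
}
```

A backup script would wake the server, wait for it to answer, run the copy, and then shut it down again over ssh.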
Re:You aren't gonna get a real RAID. by PurpleFloyd · 2003-10-29 12:33 · Score: 1

Based upon the question posted, I was imagining something along the lines of RAID 5 implemented across the network with multiple computers; complete with hot swap/hot spare capability, array rebuilding on-the-fly, and other "real RAID" features. Needless to say, this system would need to sync constantly for error-checking/correction and would require input from at least 2 systems before it could act on any data.
Basic software drive mirroring is fine (and something I've looked into for my own home network), but there simply isn't enough bandwidth or a low enough latency for a RAID 5 system to work across consumer-level networking technology (even if you count Gigabit Ethernet as "consumer-level").

--

That's it. I'm no longer part of Team Sanity.
Re:You aren't gonna get a real RAID. by PurpleFloyd · 2003-10-29 12:46 · Score: 1

Linux's software RAID is, indeed, excellent. However, I like to run RAID 5 when I can (because it offers a good balance between safety and performance), and software RAID 5 is simply not an option on an older system - any time you need to read or write anything, watch your CPU usage spike. In fact, I have played around with software RAID 5; on an old dual Pentium system, it was actually slower than RAID 1.
My solution? I went on Ebay and bought two identical SCSI RAID cards (they're pulls from old Compaq servers; model number SMART-2P). I get good performance and Linux support, and I have enough equipment to survive a loss of any one component (RAID card, drive, or even cable). While software RAID is a good solution in some cases, hardware RAID cards are a good option if you can find them cheap (again, look on Ebay, but be sure that you have two that are set up identically, so if you lose one you can be up and running with the other, and then burn those suckers in).

--

That's it. I'm no longer part of Team Sanity.
Re:You aren't gonna get a real RAID. by steveha · 2003-10-29 13:05 · Score: 1

software RAID 5 is simply not an option on an older system

It might be, if the older system is set up as a server, and you don't care how busy its CPU gets as long as it can keep up with a 100 Mbit Ethernet connection.

But certainly if you can score some affordable hardware RAID, there is no reason not to use it!

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Re:You aren't gonna get a real RAID. by Cyno · 2003-10-29 13:24 · Score: 1

If you have more money to blow, then I would suggest that you invest in an honest-to-dog hardware RAID card and some good drives and put them into a server, then do everything across the network (put the /home tree and My Documents folders on the server).

Why? You're not servicing an office of users, just yourself. All you need is cheap and slow (I'm using 5400 RPM) IDE drives. Software RAID 5 is more than fast enough, even over cheap $30 100Mb switches, to serve 4 Star Trek TNG episodes to my screen, at the same time. That's some funny shit, try watching 4 shows at the same time, hearing their dramatic peaks goin off, seeing the various characters in different windows. It cracks me up! Anyway, there's MASS bandwidth available dirt cheap and almost no one knows how to make use of it.

I suggest you start small. Get a cheap PC for your server with a good large case running at least 1 Ghz with a single 120+ GB harddrive. Configure Samba and your network software how you like it. Then add in 4 IDE disks and configure a Linux software RAID when you run out of disk space or want the redundancy. Swappable IDE drive bays like InClose designs are very good for backups, too.

Once you have it setup if ANYTHING goes wrong, your data is still safe. You can easily move those drives to any other Linux system and it will probably autodetect the RAID and rebuild it, assuming you have at least 3 working disks. I replaced the motherboard of my system without a reinstall and everything mounted perfectly. It was serving my network before I knew it had even booted successfully.

Linux rocks!

My point is don't spend money on quality hardware. Use Linux and plan for failure. That way you can save money for more hardware later. My 550GB file server is completely full, so now it looks like it'll cost me another $1000 in disks to upgrade it again. But the network transparency of my data is so worth it, to me. I don't have a DVD collection or a CD collection. I got a fileserver in my closet!
Re:You aren't gonna get a real RAID. by dbarclay10 · 2003-10-29 13:38 · Score: 1

The "real RAID" features you mentioned happen slowly when they're on local disks too - they're hardly realtime. On a big array, expect adding a new disk to take a couple of hours.

Since the timeframe is already in "hours", "2 hours" versus "12 hours" isn't that big a difference in practice. In both cases, it's a fire-and-forget operation.

As for "would need to sync constantly for error-checking/correction" - why would it need to sync constantly? More often than having the disks in one case, that's for sure, since random machines will be shut down every now and then, which simply doesn't happen in a traditional RAID array. But otherwise ... don't see where it would be happening.

I used an old SCSI RAID enclosure at my second-last jobsite which didn't like newer drives - the ones which went into standby after <x> seconds of inactivity. Whenever they did that, the controller would mark the spindle as dirty, and it would need to be resynced when it (immediately) got woken up. Now, this was terrible on the drives - that's a *lot* of churning. So we ended up replacing the enclosure. But for a month and a half, even though random spindles were getting resynced every six hours, it was fine. And it was pretty slow too. 8MB/s was pushing it.

Really, all that matters is that a) the performance of the fully-synced array is up to par, and b) machines aren't popping up and down so often that you've got more than one or two resyncs going on at once.

As for RAID5, I do agree that it's probably not the best solution in this case. I recommended 1+0 in a different post. If he desperately wants to use RAID5, then something which does replication between the individual nodes instead of through a central server is definitely in order.

(P.S.: I'd definitely say "Gig-E is consumer hardware" :) It's got major advantages over 100mbit [namely better speed, of course, but more important for me is longer cable lengths], and it's about the same price.)

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Re:You aren't gonna get a real RAID. by MrWa · 2003-10-29 13:40 · Score: 1

net use g: "\\some server"
cd \My Documents
start xcopy * g: /E /V /M /Y /H /R

This only copies what has changed and then sets the archive toggle to false so future backups will leave it alone. This gets set to true automatically by Windows whenever a file is changed. Not a very elegant solution (takes up a lot of space, and sometimes a lot of bandwidth) but it gets the job done (I always have up to the day backups.)

You probably don't want to do this. by NerveGas · 2003-10-29 09:35 · Score: 3, Insightful

Really. If you're on a 100-megabit LAN, that gives you a max of about 10 megaBYTES per second. So, if you have to transmit information to two other computers for every disk write, you're effectively limiting yourself to a maximum of about 5 megabytes/second disk transfer. And that's under GOOD situations. If you're doing random I/O, where the latency will be the determining factor, then take the latency of the hard drives, add in the latency of the networking, and the latency of the software layers, and you're looking at some pretty abysmal performance.
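
The arithmetic behind those figures, as a back-of-the-envelope sketch. The 80% link efficiency is an assumption picked to land on the "about 10" figure, not a measurement, and the function name is made up:

```shell
# write_ceiling_kbytes LINK_MBIT EFFICIENCY_PERCENT COPIES
# Prints the per-write ceiling in kB/s when every write crosses the same
# link COPIES times. Integer arithmetic, decimal units (1 Mbit = 125 kB).
write_ceiling_kbytes() {
    echo $(( $1 * 125 * $2 / 100 / $3 ))
}

write_ceiling_kbytes 100 100 1   # raw line rate: 12500 kB/s
write_ceiling_kbytes 100 80 1    # assumed 80% usable: 10000 kB/s
write_ceiling_kbytes 100 80 2    # two remote copies per write: 5000 kB/s
```

This only covers streaming throughput; it says nothing about the latency cost of random I/O.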

Using rsync in a cron job will solve your backup problems. In fact, your script can use rsync to do the synchronization, and tar/gzip to archive the backup - giving you "point in time" snapshots for when someone says "I deleted this file 4 days ago, can you get it back?"
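
A rough sketch of the archive half of that, assuming rsync has already refreshed a local mirror directory. The function name, the snapshot-YYYY-MM-DD naming, and every path in the crontab comment are placeholders:

```shell
# make_snapshot MIRROR_DIR ARCHIVE_DIR
# Packs the current state of the rsync mirror into a dated tar.gz and
# prints the archive path. One archive per day; a rerun overwrites it.
# ARCHIVE_DIR should live outside MIRROR_DIR.
make_snapshot() {
    mirror=$1
    archives=$2
    out="$archives/snapshot-$(date +%Y-%m-%d).tar.gz"
    mkdir -p "$archives" || return 1
    tar -czf "$out" -C "$mirror" . || return 1
    printf '%s\n' "$out"
}

# Hypothetical crontab entry (paths and host are placeholders):
#   30 2 * * * rsync -a --delete user@host:/home/ /backup/mirror/ && /usr/local/bin/snapshot.sh
```

Pulling back a file deleted four days ago is then a matter of extracting it from the archive with the matching date; old archives need pruning by hand or by another cron job.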

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

Re:You probably don't want to do this. by pe1chl · 2003-10-29 10:55 · Score: 1

Who said the writes would have to be synchronous?
There may be a maximum throughput, but that does not need to slow down anything (except maybe a system shutdown shortly after a big write)...
Re:You probably don't want to do this. by NerveGas · 2003-10-29 11:23 · Score: 1

so, you make the writes asynchronous. Reads are still going to be terribly slow. You can get around that somewhat by using large amounts of memory for cache, but the first reads will still be pretty slow.

I'd much rather use rsync and tar/gz. In fact, that's what I did.

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.
Re:You probably don't want to do this. by Znork · 2003-10-29 23:34 · Score: 1

"So, if you have to transmit information to two other computers for every disk write, you're effectively limiting yourself to a maximum of about 5 megabytes/second disk transfer."

That's just an implementation issue. I'd suggest using multicast.

I rather agree that there are better ways to solve the issues around distributed data storage than to write all servers at once tho. Performance isn't the biggest issue, integrity and synchronization would be worse.
Re:You probably don't want to do this. by jelle · 2003-10-30 04:09 · Score: 1

"Reads are still going to be terribly slow."

Since you have the data once on the local disk and once on the remote disk, the reads will not be slower than they would be without the remote mirror.

Note that the main reason to use this is for backup and high availability through failover.

--
--- Hindsight is 20/20, but walking backwards is not the answer.

I can't believe... by wcdw · 2003-10-29 09:36 · Score: 2, Interesting

...this question even got asked. Ok, if you *need* to share the same device across machines, something like the network block device can be a real help.

If all you're worried about is disk failures, mirror each disk locally. Disks are cheap, and real operating systems don't have any trouble with software mirroring.

Why would you want to make all of your machines suddenly non-functional, just because one of them lost a network card? Or the switch failed? Or ....

--
If you're not living on the edge, you're just taking up space!

while it's a cool idea by flaming-opus · 2003-10-29 09:36 · Score: 1

what you're proposing is probably a poor solution to your needs. To use RAID-like disk storage across the network will require several high-latency transfers across the network for every write operation - very slow.

Furthermore, every time one of the computers is powered off the system will wait for that machine to come back, or will treat it like a dead disk. Even with high performance raid devices, degraded mode is mighty slow. Then when the device comes back you will have to rebuild the raid. A long/slow/agonizing process even with fast hardware.

I think rsync in a cron tab is a much better idea.

Availability by raphae1 · 2003-10-29 09:38 · Score: 1

That also means that whenever even one of the machines is down ('hw maintenance', new kernel boot, system crash, unplugged...) all the others will lose access to the data too.
I suppose it could work well in a server room, but if your home setup is anything like mine - open cases and cat5 crisscrossing the house - or you have a screwdriver on your desk, you might experience a lot of downtime...
My wife would have me by the curlies.

yes, I'm a soldering iron wielding programmer

Umm, but what about? by mschuyler · 2003-10-29 09:38 · Score: 1

I hate to point this out, but my daughter's house in Scripp's Ranch in San Diego just narrowly escaped completely burning down. She evacuated with her hard disk (smart thinking there, kid!). The place is uninhabitable with smoke damage. How the fire went around that cul de sac is just amazing.

The point is: 8 computers in the house won't help diddly in a real disaster. That's a lot of work just to see it burn up. (I know it will never happen to you; it was 2,000 other houses that burned to the foundation.)

And further, I've had two RAID systems go TU in the last few years. For me RAID doesn't cut it at all. Distributed File System works pretty cool--but so does a fire safe.

--
How about a moderation of -1 pedantic.

P2P solutions: Freenet, Oceanstore? by 3Suns · 2003-10-29 09:39 · Score: 1

Intermezzo and Coda both do this, but I don't think there's any windows versions available. There are some Microsoft things available too, but obviously those aren't for linux. NBD (which everyone else has mentioned) isn't distributed, so that's not really what you're looking for.

What you might be able to do is put together a microcosm of Freenet or something like it, running on just your home computers. There may be other Peer-to-Peer solutions available that are faster/more stable. Do some searching on peer-to-peer distributed storage networks. I know of two researchy ones: OceanStore and Chord. Good luck!

--

-3Suns

~~~~
The Revolution will be Slashdotted

Been meaning to do something like this with unison by balamw · 2003-10-29 09:41 · Score: 1

Though not real time like a true RAID, I think what you're really after is something like rsync, as many other posters have mentioned. When this came up in an earlier story I found a link to Unison, which seems to be better for my needs at least.

http://www.cis.upenn.edu/~bcpierce/unison/

Might be interesting to combine this with FSRaid (Parity Archive or PAR files) to get some extra redundancy.

B

create a P2P network? by net_bh · 2003-10-29 09:42 · Score: 1

How about creating a P2P network on all your machines and programming them to download everything *new* on other machines?

That way every machine will have a copy of all the files!

Wasteful, yes! But simple and effective!

--
There is no patch for stupidity

Visit my blog

I do this.... by CSG_SurferDude · 2003-10-29 09:42 · Score: 3, Funny

I do this everynight to thousands of machines...

The software I use is Kazaa-lite.

Oh, you mean files other than my MP3s/jpegs/mpegs? Sorry, I can't help you there.

--

LongTail SSH Brute Force analysis tool is here!

Re:I do this.... by Cyno · 2003-10-29 13:01 · Score: 2, Funny

See, Kazaa is a perfectly legitimate technology, if only the RIAA and MPAA could stop polluting it with their copyrighted commercial garbage.

I blame Jack Valenti for this whole mess.
Re:I do this.... by joel.br · 2003-10-29 13:26 · Score: 1

If it's something important just rename your files to something like sexyteen.mpg and host on kazaa. Keep a copy of the porn file to real file mapping table and voila. Worldwide distributed file system!

Off the mark by Salamander · 2003-10-29 09:49 · Score: 1

Many responses, even highly-rated ones, seem to be talking about simple replication via NBD (worst-written code I've ever seen) or DRBD. That's not the same as what the original poster was asking about. Neither are fully-distributed but non-transparent file stores such as HiveCache. AFS/DFS/Coda/Intermezzo are probably the closest in the sense of being both transparent and resistant to failures. There have also been a couple of very closely related projects at Microsoft (Farsite and Pastiche) but I'm not sure if there's anything you can actually download and use.

--
Slashdot - News for Herds. Stuff that Splatters.

Re:Off the mark by Frisky070802 · 2003-10-29 12:45 · Score: 1

Pastiche isn't from Microsoft, it's from the University of Michigan. Here's the link.

--
Mencken had it right. So glad that's old news.
Re:Off the mark by Salamander · 2003-10-30 09:49 · Score: 1

I stand corrected, and my apologies to the Pastiche folks. I should have remembered that Pastiche is not from Microsoft even though it's based on Pastry which is.

--
Slashdot - News for Herds. Stuff that Splatters.

Parallel Virtual File System by richoid · 2003-10-29 09:51 · Score: 4, Informative

http://www.parl.clemson.edu/pvfs/

"The goal of the Parallel Virtual File System (PVFS) Project is to explore the design, implementation, and uses of parallel I/O. PVFS serves as both a platform for parallel I/O research as well as a production file system for the cluster computing community. PVFS is currently targeted at clusters of workstations, or Beowulfs."

"In order to provide high-performance access to data stored on the file system by many clients, PVFS spreads data out across multiple cluster nodes, which we call I/O nodes. By spreading data across multiple I/O nodes, applications have multiple paths to data through the network and multiple disks on which data is stored. This eliminates single bottlenecks in the I/O path and thus increases the total potential bandwidth for multiple clients, or aggregate bandwidth."

Or there are many others to choose from; google for clustered filesystems:

http://www.yolinux.com/TUTORIALS/LinuxClustersAndFileSystems.html

Slow? by cerebralsugar · 2003-10-29 09:54 · Score: 2, Informative

I certainly would attest that this is a cool idea. I have a few systems at my place and it would be neat to make a single filesystem spanning all the storage on the network.

However, while small files would be fine, I would think the speed of the network would make for some fairly slow storage on a 100mbit network.

Add more users saving files across the network to the equation and things would get out of hand fast.

I guess I would just buy a serial ata raid motherboard (the intel D865GBFLK is one I have been thinking about), and just do 1:1 mirroring. Cheaper than scsi, and pretty darn fast.

--
Easy guys, I put my pants on one leg at a time. The difference is after I put on my pants I make gold records!

Raid != Backup by Alan · 2003-10-29 09:55 · Score: 2, Informative

Don't forget that RAID only protects you from hardware failures, it doesn't prevent you from doing an "rm -rf important_file" :)

Personally I have a server with a RAID 5 array that is shared via SAMBA to windows and linux clients, which works fine, though I may adjust this if good suggestions are made here. The only real issue would be disk space, and all my computers now have 120G+ hard drives or RAID array....

here ya go by acidrain69 · 2003-10-29 09:56 · Score: 1

ghettobackup.bat
copy c:\porncollection\*.* \\backup1\bak
copy c:\porncollection\*.* \\backup2\bak
.
.
.
copy c:\porncollection\*.* \\backup8\bak

--
-- Having a Creationist Museum is like having an Atheist place of worship

Wow... by badboy_tw2002 · 2003-10-29 09:57 · Score: 1

You really don't want to lose all that porn, huh?

New kind of network file system needed by rar · 2003-10-29 09:58 · Score: 2, Interesting

I don't think the RAID algorithm is the right way to synchronize all your data when applied on a larger scale. I imagine that what a person really wants to do is to unify all his accounts, on slow and fast links all over the world, to look like one huge synchronized partition which stores the data throughout the accounts with sufficient redundancy (meaning something like 'keep copies of all data in at least three different locations'). I think using RAID for this would give horrible performance and not be nearly flexible enough in how data is distributed across the different locations.

A new networked file system is needed. I am working on such a solution in my spare time (but it is still in the design phase).

The main idea is to unify cache and storage. This means that the least used files are deleted when an account is running out of storage, but under the constraint that a minimum number of copies of each file is kept online. (Hence, data will propagate to the nodes that actually use it.) Upon a data request the filesystem goes out and fetches the data, preferably in some P2P-like way where it is fetched simultaneously from all locations that have copies of that data.
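
A rough sketch of that eviction rule, in Python. Everything here (the Node class, make_room, the default of three copies) is made up for illustration and not taken from any existing filesystem; it only shows the "evict the least recently used file, but never drop below the minimum replica count" idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One account/machine taking part in the store (hypothetical model)."""
    name: str
    capacity: int                               # bytes this node may use
    files: dict = field(default_factory=dict)   # file id -> (size, last_access)

    def used(self):
        return sum(size for size, _ in self.files.values())


def replica_count(nodes, file_id):
    """Number of nodes that currently hold a copy of file_id."""
    return sum(1 for n in nodes if file_id in n.files)


def make_room(node, nodes, needed, min_copies=3):
    """Evict least recently used files from node until `needed` bytes fit.

    A file is only evicted while more than min_copies replicas exist, so
    the redundancy constraint is never violated. Returns evicted file ids.
    """
    evicted = []
    # Oldest access time first.
    candidates = sorted(node.files, key=lambda f: node.files[f][1])
    for file_id in candidates:
        if node.capacity - node.used() >= needed:
            break
        if replica_count(nodes, file_id) > min_copies:
            del node.files[file_id]
            evicted.append(file_id)
    return evicted
```

If nothing can be evicted without breaking the minimum, make_room simply returns an empty list and the caller has to store the new data somewhere else.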

If someone knows a solution that already works something like this, please tell me.

--
Open Materials Database

Re:New kind of network file system needed by newshooze · 2003-10-29 12:56 · Score: 1, Insightful

I think you just described the Freenet project

Or try Groove workspace for Windows by AllDigital · 2003-10-29 09:58 · Score: 2, Informative

Groove Workspace is a collaborative environment, but it does have a component that allows you to share an archive of files.

Worth considering because:
- Files are encrypted and sent in an encrypted format.
- Files placed in the shared space are mirrored on all systems that are members of the workspace.
- The software is free for non-commercial use.
- Lots of other interesting features to play with.
- You can even mirror with a machine across the Internet.

Limited by:
- The speed of your connection.
- Windows users only.

Go check it out at http://groove.net/

Does anyone know if there are efforts in the open source community similar to...or designed to enhance this product?

DRBD does it as well... by Ron+Harwood · 2003-10-29 09:58 · Score: 2, Informative

Obvious link.

--
BlackNova Traders

The obvious solution by swagr · 2003-10-29 10:01 · Score: 3, Funny

is to use IP over Carrier Pigeon.

Then the only remaining issue is number of pigeons.

--

-... --- .-. . -.. ..--..

Re:The obvious solution by elbrecht · 2003-10-29 10:42 · Score: 1

As that standard was implemented back then by a Norwegian Linux user group starring Alan Cox, I must remind you to buy the proper number of licenses from the SCO company!

How about RAID :^) by rdeadman · 2003-10-29 10:08 · Score: 1

Seriously, if you are worried more about a hard disk crash than a machine blowing up, why not set up a linux box with a RAID drive and use Samba to make it viewable on the network. You need to buy one more disk drive and it helps to have a second IDE controller, but it will save you tons of time and money compared with some distributed RAID setup.

At least, that's what I do.

rsync by chunkwhite86 · 2003-10-29 10:14 · Score: 1

Can't you just use rsync?

--
I'd rather be a conservative nutjob than a liberal with no nuts and no job.

Obligatory monty python reference by KiwiEngineer · 2003-10-29 10:15 · Score: 1

Alternately you could engrave the data onto coconuts and use migratory swallows to transport them. But then that would raise the matter of using an African swallow compared to a European swallow.

--
Nobody expects the Spanish Inquisition!!

Re:Obligatory monty python reference by Nucleon500 · 2003-10-29 12:47 · Score: 1

Everybody knows that the african swallow is fastest, with the cheetah and the llama bringing up the rear. It uses a lot of CPU, though.

--
Litigious bastards
Re:Obligatory monty python reference by jeabus · 2003-10-29 14:02 · Score: 1

Of course, African swallows are non-migratory.

--
Save me Jeabus!

Amanda by ddkilzer · 2003-10-29 10:15 · Score: 1

Don't play around with something "cool" like a distributed RAID disk. Just spend the money on a decent tape drive and tapes, design a tape backup rotation strategy, get a safety deposit box at a local (or not-so-local) bank for off-site storage, and set up Amanda to do the backups.

I don't worry... by sirgoran · 2003-10-29 10:23 · Score: 1

too much about fire.

It's my wife and her need to open any email she gets using outlook on her windows box. She's just enough of a geek to be dangerous and "enjoys" the preview feature.

And she wonders why her 'puter can't log into the LAN without being Virus checked first.

-Goran

--
Carpe Scrotum - The only way to deal with your competition.

And now... Community Raid via Wireless Lan!! by Psyire · 2003-10-29 10:27 · Score: 1

woo hoo!!

Or you could backup to an online backup service. by AceXsmurF · 2003-10-29 10:28 · Score: 1

Warning, I am a support guy for FirstBackup an online backup service, www.firstbackup.com

If data protection/security at a cheap price is what you need, most online backup services will fit the bill.

I mean, at where I work all of our stuff is 448bit encrypted before it goes on the wire, and then when it goes on the wire it goes to our server farm, then gets mirrored 30 miles away in a secure location. And, you can tell how many problems we have with the software, I am one of the support people and I am posting on slashdot I am so busy. :) The only downside to our software that I can see is that there is no linux/mac client as of yet, but mapped network drives work great.

Here are just a couple of our bigger competitors links and ours in case you are interested, I really do think online backup is what everyone will eventually go towards.

http://www.atbackup.com
http://www.connected.com
http://www.firstbackup.com

Why not use Freenet? by La+Camiseta · 2003-10-29 10:28 · Score: 2, Insightful

It seems to be a great problem solver for what you're trying to do. First off, on initial start it only connects to computers it knows, or downloads info about a couple of nodes from the main website, but if you were to export your noderef and import it into all of your other systems instead of the default noderefs, then you could have a distributed storage network set up among all of your computers.

Granted, you'd have to have a bit more storage dedicated than you'll be storing, but if you want every file to have a decent backup, then that's one of the prices you'll have to pay. Also, it's self cleaning when it comes to backups, because it automatically pushes out the old, less requested files in favor of the newer, more requested files.

Another solution, should your systems be using Linux is maybe something like GNUnet, which is built upon the sharing of files in both a distributed and an anonymous manner.

Re:AFS (damn) by pHDNgell · 2003-10-29 10:28 · Score: 1

Damn, I *almost* hit preview. :) Oh well. Sorry about that.

--
-- The world is watching America, and America is watching TV.

Re:I used to do this, years ago.. by caluml · 2003-10-29 10:33 · Score: 2, Insightful

Listen, Sonny Jim. You'll not be getting any mod points from us by bringing up the last contender to Windows, which failed miserably. We're feeling good about ourselves right now, and we don't need bringing down.

--
Get your own free personal location tracker

Yes. by Ayanami+Rei · 2003-10-29 10:38 · Score: 2, Informative

Software RAID/LVM can detect which volumes go where by magic numbers written to them when you format them. But you still have to set up all the remote NBDs correctly on a new machine, and you need the old setup file from the old machine that tells it what block devices/partitions to use.

NOTE!

You shouldn't leave any NBD-exported volumes on the new master. Make it into a physical, local volume, but reference it in the "same place" in your RAID configuration.

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

Is speed a factor? by adrianbaugh · 2003-10-29 10:38 · Score: 2, Insightful

I take it you've thought about speed issues? RAID over a 100mbit link doesn't sound like great fun - leastways I wouldn't put my swap on such a drive :) Gigabit might work though.

--
"'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
- JRR Tolkien.

Whats wrong with disk level RAID? by misleb · 2003-10-29 10:38 · Score: 1

IDE RAID controllers are pretty cheap and so are disks. I know having distributed data on the LAN sounds cool, but is it really worth the trouble?

-matthew

--
"THERE IS NO JUSTICE, THERE IS ONLY ME." -Death

Re:Or you could backup to an online backup service by eaddict · 2003-10-29 10:42 · Score: 1

I looked at your pricing. Since I have nearly 7GB of just digital photos BESIDES my data it could get quite pricey.

--
"If you are on fire you can just stop, drop, and roll. If you fall into Lava you are just dead." - my 5yr old daughter

A case for distributed LAN storage by Luminary+Crush · 2003-10-29 10:44 · Score: 2, Insightful

I understand many of the comments here which say "put in a big honkin' server and hardware RAID". That would be a better solution from a purely 'let's serve files and protect data' standpoint if you can accommodate a single, large server and want the best performance.

However, I see a use for a network LAN storage system. Every machine these days comes with a 72G drive or larger installed locally, yet we are trained as IT personnel to say 'don't store anything locally, it's not secure or safe, put it on one of our nice big honkin' servers'. Unfortunately, those big servers cost a lot of money, often require specific admins (e.g. SAN experts to deal with the management software, dividing up LUNs, etc), and may involve a lot of red tape to justify additional storage allocation for your project.

What to do with all that local disk space that, if unused as most centralized IT would rather have you do it, would be a vast untapped storage resource?

The concerns regarding latency are well understood, but this might not be a factor if this LAN storage array was used for 'archive' storage where real-time high speed access isn't the driving factor. A RAID 5 system would be far too fragile, since with two nodes offline or rebooting at the same time the entire LAN storage array would be unavailable. You'd need to have more redundancy than that.
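
To put a number on "too fragile", here is a small back-of-the-envelope sketch in Python. It assumes each node is up independently with the same probability, which is my simplification and certainly not true of a real office LAN (machines tend to go down together at night), and the function name is made up.

```python
from math import comb


def availability(nodes, tolerated, p_up):
    """Probability that at most `tolerated` of `nodes` are offline at once,
    assuming each node is up independently with probability p_up."""
    return sum(
        comb(nodes, k) * (1 - p_up) ** k * p_up ** (nodes - k)
        for k in range(tolerated + 1)
    )
```

RAID 5 corresponds to tolerated=1. Calling availability(8, 1, p) and availability(8, 2, p) with your own guess for p shows how much each extra tolerated failure buys.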

I could see an interesting application using multiple nodes each contributing disk space to a LAN archive storage array which would be 'written to' and retrieved with similar expectations as writing to a tape drive. The bonus would be that you could work on files in realtime over such a network, just quite slowly (many vendors used to offer archive file systems which worked this way using tape or optical drives as the storage medium - AMASS was one such vendor).

Oracle 10g by Anonymous+Custard · 2003-10-29 10:44 · Score: 1

Oracle 10g kind of does this and a heck of a lot more... but I think you have to use applications designed for it in order to work.

This person's asking for Transparent Redundant Data Backup, which doesn't seem so unusual that no one's asked or implemented it before.

--
$8.95/mo web hosting

Re:if you should loose the master by dbarclay10 · 2003-10-29 10:49 · Score: 1

Yes, you could. There would be two options. The first would be to use "old-style" RAID, with an /etc/raidtab which describes which block devices belong to which RAID devices, and the configuration of the RAID device itself.

Secondly, there's "new-style" RAID, which is what I use. When the RAID driver is started, it examines all block devices for what are called "superblocks". Each spindle in the RAID array has a copy of the superblock, describing what array that particular device is a member of, and the parameters for the array.

Given that information, the RAID device is reconstructed. However, as another poster pointed out, the new master will need to have all the NBD devices set up before this will work.

Also as the other poster mentioned, only *ONE* machine can access each NBD device at any one time. So disconnect all the NBD devices from the old master before you turn on the new master.

--

Barclay family motto:
Aut agere aut mori.
(Either action or death.)

Cheap and easy by r_j_prahad · 2003-10-29 10:50 · Score: 1

Post a few articles on the net professing your undying loyalty to Usama Bin Laden. The FBI will back up everything for you.

Lustre and PVFS by nagare · 2003-10-29 10:50 · Score: 3, Insightful

The Lustre project (www.lustre.org) is supposedly going to be the end all/be all of distributed parallel file systems, but I believe it is still fairly unstable and not ready for production use. In the meanwhile, the best one out there is PVFS (www.parl.clemson.edu/pvfs/). Fat chance trying to find Windows clients, but you can always re-export it with Samba.

Re:What if one of the nodes goes down? by cbreaker · 2003-10-29 10:52 · Score: 4, Insightful

What if you reboot one of the NBD servers? While you'll still have access to the data since it's a raid, I would well imagine that you would have to rebuild the entire "disk" once it comes back online.

Assuming a Raid5 with three nodes, and two go down not at the same moment, will all your data be lost?

I would think very carefully about these issues before putting all your valuable data on it. RAID isn't really designed for frequently unreliable connections like this. It's meant to prevent data loss if a hard drive crashes, which should be a fairly uncommon thing within a single system.

--
- It's not the Macs I hate. It's Digg users. -

Why? by Illbay · 2003-10-29 10:57 · Score: 5, Funny

...if I loose one of the disks that comprise the raid, the image would automatically reconstruct itself...

Why would you want to "loose" one of the disks? Don't you know they're supposed to stay tightly enclosed in their little boxes?

And why do you think that "loosing" the disk would help the image "automatically reconstruct itself?"

Actually, if you did that the disk would carom around the room like a very fast, very lethal Frisbee and you would be too busy trying to survive to worry about where your data went!

Just a thought

Otherwise, your plan sounds peachy.

--
Any technology distinguishable from magic is insufficiently advanced.

That can work! by Trejkaz · 2003-10-29 10:58 · Score: 1

Just rename all your important documents to porn and new movie titles, and EVERYONE will back them up for you!

--
Karma: It's all a bunch of tree-huggin' hippy crap!

What problem are we solving? by Netspider · 2003-10-29 11:02 · Score: 1

If you have n computers each writing all their information to n-1 computers over a IP network, you are going to have some really slow access.

Cluster Filesystem by Anonymous Coward · 2003-10-29 11:10 · Score: 1, Insightful

You'll be wanting a distributed cluster filesystem. There are several available, with their pros and cons. They are also all aimed at enterprise / HPTC installations. For home use you'll be better off with a set of RAID disks.

GPFS from IBM. This is free for academic use, but you pay for commercial use. Linux or AIX only.

GFS from sistina. Commercial offering. Linux only.

Lustre. This is beta quality code, but is freely available. It might work wonderfully, or it might eat your files.

(open)AFS. Works, but has limitations. It does not support large files and clients aren't available for all OSes.

Check out HiveCache by Jim+McCoy · 2003-10-29 11:17 · Score: 2, Informative

HiveCache is a distributed RAID system similar to what you are asking for, albeit one that is pitched to more of the enterprise backup environment than the home user. Strong security, error-correction and data replication, and multi-source data publication and retrieval to eliminate the network hotspots that might otherwise occur.

While a pure linux solution seems to score the most points here, this particular one lets you combine your windows, OS X, and linux systems into a single distributed storage mesh. There is safety in numbers, and the more systems you can add to these sort of distributed storage systems the more reliable they become.

HiveCache is more of a backup solution, but I do know that it is possible to use this with a WebDAV front-end for archival storage and other interesting storage possibilities.

snapshots with rsync by steveha · 2003-10-29 11:20 · Score: 1

How to do this is spelled out in the book Linux Server Hacks by Rob Flickenger. See tips #41 and #42.

Or see online:

http://www.mikerubel.org/computers/rsync_snapshots/

The beauty part: export the snapshots back to the users with NFS. When they lose a file they can get it back without asking the sysadmin to do it!
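
For the impatient, a minimal Python sketch of the rotating-snapshot idea. This is not the script from the book or the page above: the snapshot.N naming, the function names and the keep count are my own choices, and it leans on rsync's --link-dest option to hard-link unchanged files instead of a separate copy step.

```python
import subprocess
from pathlib import Path
import shutil


def rotate_snapshots(root, keep=4):
    """Shift snapshot.0 -> snapshot.1 -> ... and drop the oldest one."""
    root = Path(root)
    oldest = root / f"snapshot.{keep - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    for i in range(keep - 2, -1, -1):
        src = root / f"snapshot.{i}"
        if src.exists():
            src.rename(root / f"snapshot.{i + 1}")


def rsync_command(source, root):
    """Build the rsync call that fills a fresh snapshot.0.

    With --link-dest, files unchanged since snapshot.1 become hard links,
    so every snapshot looks complete but only changed files use new space.
    """
    root = Path(root).resolve()
    cmd = ["rsync", "-a", "--delete"]
    previous = root / "snapshot.1"
    if previous.exists():
        cmd.append(f"--link-dest={previous}")
    cmd += [str(source).rstrip("/") + "/", str(root / "snapshot.0")]
    return cmd


def take_snapshot(source, root, keep=4):
    """Rotate the old snapshots, then let rsync create the new one."""
    rotate_snapshots(root, keep)
    subprocess.run(rsync_command(source, root), check=True)
```

Call take_snapshot from cron with your own source and snapshot directories, then export the snapshot directory read-only.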

steveha

--
lf(1): it's like ls(1) but sorts filenames by extension, tersely

While we're talking about data storage by drinkypoo · 2003-10-29 11:31 · Score: 1

Why is it that no one has yet intelligently handled deletion of files in a mainstream operating system? When a file is deleted (unlinked?) I would like it to go into a holding area from which files are automatically removed one at a time when I run out of disk space for them, by some user-definable criteria, but certainly age would be an acceptable place to start.

Are there any operating systems at all that have this functionality now?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:While we're talking about data storage by jonadab · 2003-10-29 16:59 · Score: 1

> Why is it that no one has yet intelligently handled deletion of files in
> a mainstream operating system? When a file is deleted (unlinked?) I would
> like it to go into a holding area from which files are automatically removed
> one at a time when I run out of disk space for them, by some user-definable
> criteria, but certainly age would be an acceptable place to start.
> Are there any operating systems at all that have this functionality now?

Not as far as I know, but it would be really easy to implement in userspace.
First, set your OS to not show the trash/recyclebin/whatever on the desktop;
instead, place a shortcut there that points to a "recycleable" folder or
directory. You can even give it a trashcan icon if you want.

Then you write a script that checks the amount of free disk space, and if
it's less than a certain amount, removes the oldest file from the recycleable
folder and repeats (until there's enough disk space). This would be about
six lines of Perl. Put it on a cron job (*nix) or scheduled task (Windows).

Then you just have to train yourself to "delete" things by moving them to
the recycleable folder. If you're a GUI user, you'll drag them to the
shortcut. If you use the command line, you'll want a shell script or batch
file that moves its command-line arguments there. On *nix, you can alias
'rm' so that it calls your script instead of /bin/rm, but on Windows I think
you might have to train yourself to type a different command than "del",
unless maybe recent versions have added an ability I don't know about;
last I knew, builtins are always used first, before checking the path.
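
Here's roughly what that purge script could look like, in Python rather than the Perl mentioned above. The function name, the folder and the threshold are placeholders, and "oldest" is taken to mean oldest modification time, which is an assumption on my part.

```python
import shutil
from pathlib import Path


def purge_oldest(trash_dir, min_free_bytes, free_bytes=None):
    """Delete the oldest files in trash_dir until enough disk space is free.

    free_bytes is a callable returning the current free space; by default
    it asks the filesystem that holds trash_dir. Returns the removed paths.
    """
    trash = Path(trash_dir)
    if free_bytes is None:
        free_bytes = lambda: shutil.disk_usage(trash).free
    # Oldest modification time first; subdirectories are left alone.
    files = sorted(
        (p for p in trash.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    removed = []
    for path in files:
        if free_bytes() >= min_free_bytes:
            break
        path.unlink()
        removed.append(path)
    return removed
```

From cron or a scheduled task you would call something like purge_oldest("/home/me/recycleable", 5 * 1024**3), with your own path and limit.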

--
Cut that out, or I will ship you to Norilsk in a box.

You don't need RAID, you need offline backups. by BenRussoUSA · 2003-10-29 11:35 · Score: 1

I have managed hundreds of servers over the last decade. RAID helps with UPTIME, and high availability, it sometimes (rarely) helps with reducing data loss. Most of the time data loss is NOT BECAUSE OF DISK FAILURE. It is because of an idiot who accidentally deletes the files or whole directory structures, or the logical volume.... 'nuf said. What you need to do is create OFFLINE copies of your work periodically. So, read up on rsync and write yourself a cron job. You can set up SSH/SCP on your windows box and you can then use rsync from the Linux boxes to back up your "Documents and Settings" dir on your Windows box. RSYNC even has command line options for creating snapshot backup directories.... There is a HOWTO at the samba site (where rsync comes from) that details how to create rotating backup scripts with RSYNC.

HiveCache by Gorm! · 2003-10-29 11:38 · Score: 1

Something that caught my eye a while ago in this area was HiveCache. Never used it, don't know anyone that has, but it looks like a pretty cool system.

Rsync & Rdiff-backup by hrath · 2003-10-29 11:47 · Score: 2, Informative

Check out http://rdiff-backup.stanford.edu/ for the wonderful rdiff-backup.

With the combination of rsync, ssh & rdiff-backup I have setup a very reliable incremental network backup infrastructure, allowing me to go back to any previous version of a file.

regards,

Heiko

Check out SFS or CFS by ca1v1n · 2003-10-29 11:53 · Score: 1

The self-certifying filesystem or the cooperative filesystem might do what you want, though I believe they only run on unix platforms. The code is considered to be in the alpha stage, but apparently the maintainers have been using it for a while without losing files. On some platforms SFS (on which CFS is based) has the nasty habit of deadlocking the kernel from time to time. You might want to read their documentation, since this might not be a problem for what you're running on.

SFS

CFS

--
WARNING: there is a trojan on your

Re:Check out SFS or CFS by quantum+bit · 2003-10-29 12:39 · Score: 1

I started using SFS for my home directory until I realized that it doesn't seem to support file locking at all. D'oh! GNOME and KDE don't appreciate not being able to lock their files.

File versioning useful, VMS variant not so sure by kingdon · 2003-10-29 12:05 · Score: 2, Interesting

The concept of being able to see the previous version sounds good. But on VMS, file versions didn't really achieve this all that well. Classic example: how do you delete a file?

Try #1:

DELETE FOO.TXT

This is really the wrong answer. If you have FOO.TXT;1 and FOO.TXT;2, then this command deletes FOO.TXT;2 and any attempt to access FOO.TXT will get you FOO.TXT;1.

Try #2:

DELETE FOO.TXT;*

This is the common recommendation, but you've now lost the ability to see any of the old versions.

The GNU file utilities (and emacs and some other GNU programs) have a file versioning scheme which is somewhat similar to VMS but somewhat better. Look at commands like "VERSION_CONTROL=numbered cp -b foo bar" (the -b switches backups on; VERSION_CONTROL only picks the naming scheme).
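
To show what the numbered scheme amounts to, here is a small Python imitation. It is a sketch of the naming convention only (the old copy of bar becomes bar.~1~, then bar.~2~, and so on), not the GNU code, and both function names are invented.

```python
import re
import shutil
from pathlib import Path


def next_backup_path(target):
    """Return target.~N~, with N one more than the highest existing backup."""
    target = Path(target)
    pattern = re.compile(re.escape(target.name) + r"\.~(\d+)~")
    numbers = [
        int(m.group(1))
        for p in target.parent.iterdir()
        if (m := pattern.fullmatch(p.name))
    ]
    return target.with_name(f"{target.name}.~{max(numbers, default=0) + 1}~")


def copy_with_numbered_backup(source, target):
    """Copy source over target, keeping any previous target as a numbered backup."""
    target = Path(target)
    if target.exists():
        target.rename(next_backup_path(target))
    shutil.copy2(source, target)
```

Unlike the VMS scheme, the current version always keeps the plain name, so deleting "bar" never silently exposes an older version under the same name.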

Personally, I usually put things which matter in CVS. With the CVS server in a distant city (at an ISP which provides ssh shell accounts). That gives me off-site backups.

Re:File versioning useful, VMS variant not so sure by shaitand · 2003-10-29 13:02 · Score: 1

Don't you think it would be depressing if that off-site ISP burned down or were hit by a meteor. You have a LOCAL backup as well right?
Re:File versioning useful, VMS variant not so sure by nahdude812 · 2003-10-30 02:22 · Score: 1

I assume you meant to ask if he also keeps a copy on his hard drive? Keeping a local *backup* (which would be a tape backup or other separate-media local duplicate of the working copy on his harddrive) wouldn't gain him a lot of added security over the remote backup.

In order for that off-site ISP being destroyed to be a problem to data integrity, that would have to happen at the same time that (s)he lost local data. The chances of catastrophe occurring on both ends simultaneously are low enough for average data usage; only data that had some very serious value would need to be backed up both locally and remotely. But if your local data store (basement / attic / spare bedroom of computers) burned down, you'd lose your working copy and local backup both, so the local backup isn't gaining you that terribly much anyhow. If you really want to secure your data, do two remote backups to geographically dispersed ISPs, and keep your local copy on your hard drive.

--
Slay a dragon... over lunch!
Re:File versioning useful, VMS variant not so sure by rickmccl · 2003-10-30 03:31 · Score: 1

Erm, you're not using DCL correctly. You wanted to remove older versions of the file? You wanted the "PURGE" command, not DEL.
Like most things computery, it helps to use the correct command.

Re:NBD Does this (RAID level by oyoy · 2003-10-29 12:13 · Score: 1

You might want to do RAID 1+0 (RAID 10) instead. That is, a stripe of mirrored disks.

Why? If you have ten disks in a mirrored stripe (RAID 0+1) and lose one disk in each stripe, you lose. If you have a stripe over mirrors, you can lose a disk in each mirror and still access all data (but it's time to check those backups...) With four disks, it would be the same. Add more disks and a stripe over mirrors is safer.

Someone could calculate the probabilities of losing data in the two setups, but you have a better chance with striped mirrors. The performance should be the same.
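
Taking up that invitation, here is a brute-force count in Python. The model is my own simplification: exactly two disks fail, every pair of disks is equally likely, and a stripe set counts as dead as soon as any one of its disks is gone. Real controllers may behave better or worse than this, and the function names are invented.

```python
from itertools import combinations


def mirror_of_stripes_survives(failed, n):
    """Two stripe sets of n/2 disks each, mirrored against each other.

    Disks 0..n/2-1 form set A, the rest form set B. The array survives
    as long as at least one whole stripe set is untouched.
    """
    half = n // 2
    a_ok = all(d >= half for d in failed)
    b_ok = all(d < half for d in failed)
    return a_ok or b_ok


def stripe_of_mirrors_survives(failed, n):
    """n/2 mirror pairs (0,1), (2,3), ... striped together.

    The array survives as long as no pair has lost both of its disks.
    """
    gone = set(failed)
    return all(not {2 * i, 2 * i + 1} <= gone for i in range(n // 2))


def loss_probability(survives, n, failures=2):
    """Fraction of equally likely failure combinations that lose the array."""
    combos = list(combinations(range(n), failures))
    lost = sum(1 for c in combos if not survives(c, n))
    return lost / len(combos)
```

For ten disks this gives 1 in 9 for the stripe of mirrors against 5 in 9 for the mirror of stripes. Under this strict counting the two layouts already differ at four disks (1 in 3 against 2 in 3), so check how your own RAID implementation treats a half-dead stripe set before trusting either number.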

Still confused? You're not alone...

Re:Sistina's GFS by spazoid12 · 2003-10-29 12:15 · Score: 1

GFS is pretty cool. I wasn't aware of the OpenGFS, so thanks for mentioning that. At Infospace we used Sistina GFS on a project and found the GPL GFS to be a bit painful and unreliable. We later spent the big cash on the non-Free flavor and were happy.

just a thought. by itzdandy · 2003-10-29 12:22 · Score: 1

..
how about if you used nbd and exported a small image on each machine on a network of about 100 machines, then used all of the drives in a software raid0? sure, networks can be slow and 100Mbit is just 12MBytes per second, but in theory you could get 12MBytes/sec * 100 = 1200MBytes/sec (1.17GB/s), or more realistically 1/2 that. OR if you were on Gigabit ethernet then you could be looking at very high throughput, but probably high access time with tcp/ip overhead.
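
One thing the arithmetic above leaves out is the network link of the machine doing the reading. A quick Python sketch of the best case, ignoring all protocol overhead; the function name and the idea of capping at the client's link speed are my additions, not part of the proposal.

```python
def aggregate_throughput_mb_s(servers, server_link_mbit, client_link_mbit):
    """Best-case striped read rate in MB/s, ignoring protocol overhead.

    The servers can together supply servers * server_link_mbit, but the
    client cannot receive faster than its own link allows.
    """
    supply = servers * server_link_mbit
    usable = min(supply, client_link_mbit)
    return usable / 8  # 8 bits per byte
```

With 100 servers on 100Mbit links, a client that is itself on 100Mbit tops out at 12.5 MB/s and a Gigabit client at 125 MB/s. Getting anywhere near the 1200 MB/s figure would take a client link of around 10 Gbit.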

I want something like this by Kris_J · 2003-10-29 12:27 · Score: 1

I have 125 (windows) computers across our student labs, soon to be 165. They all have at least 5 Gig of (wasted) space on their hard drives. Newer PCs have 20 or 30 Gig free. Average is 15Gig per PC, or will be when we hit the 165 mark. That's 2.4 TeraBytes. Even if we made 10 copies of everything students would still be able to share in 40 times more storage than we currently offer them.

What I need is a system that can cope with very low reliability of the computers in the mesh. Also, they're not 24/7 so the system (MC-ed by a server I assume) needs to unmount at 10:30pm when the PCs shut themselves down then boot all the PCs and (re)mount at 7:30am. It needs to cope with an entire lab of PCs being swapped out at the end of lease.

Anything like this?

KISS principle: one file server with RAID by Admiral+Burrito · 2003-10-29 12:47 · Score: 1

I know it's not what you're asking for, but I'd recommend setting up one of those 8 boxes as a file server with regular RAID. It's a simple and proven way to get the end result you're after. It doesn't have to be expensive either, a pair of IDE drives (each alone on their own IDE port) and Linux with its built-in software RAID, exported to the network via NFS and/or SAMBA and/or whatever else.

iSCSI sounds a good fit. by MarkTina · 2003-10-29 13:00 · Score: 1

I was just thinking that this is just up iSCSI's street.
Have multiple iSCSI targets and then use a software RAID-5 implementation (it wouldn't care because as far as it's concerned the iSCSI device is a "local" device) on the initiator machine.

Would be neat to see in action, anyone got enough kit at home/work to try it out and report back ? .... my wallet doesn't open far enough at the moment :-)

Mark

HyperSCSI by Nicson · 2003-10-29 13:24 · Score: 2, Informative

I'm surprised to see nobody has yet mentioned HyperSCSI, which is:
- opensource
- based on raw ethernet (supposedly faster than iSCSI or other TCP/IP-based schemes)
- has a Win2K client

Check it out, I've tested and used it since about a year and it works quite well!
--
Nicson

Why? by Raunch · 2003-10-29 13:40 · Score: 1

Others have already pointed out the NBD solution, so I will not repeat it here. What I would like to ask is: why?

It would be much easier for you to put four 120Gig IDE drives in one of your eight computers and use a real RAID setup. I have a dual 60gig that holds all of my home directories and all of my mp3's, and that in itself is enough for me.

Set yourself up a RAID-5 so that the performance is not dismal and you will be set. Hardware IDE RAID is really cheap now, or you could go with a software solution if you don't have very intensive data writes.

--
George II -- Spreading Freedom and American values, one bomb at a time.

this is my potential thesis topic by dosguru · 2003-10-29 13:49 · Score: 1

I've been looking into this as part of a project for one of my seminar classes this semester. Perhaps I can do my thesis on this as well, I'll have to see what happens.

It all seems to depend on what it is needed for. For a "normal" RAID, this won't work. For a cheap backup on a WFGM (Wide Family/Group Network) this has a lot of potential.

nifty...but.... by MoFoQ · 2003-10-29 14:00 · Score: 1

speed will take a hit if u'r not using gigabit ethernet.

also there's supposed to be a filesystem based on the PAR2 thingie....and I'm sure someone has mixed the two (or at least has a net-RAID5 type thing).

once I upgrade my networking schtuff to 1000Mb, then I'll do it too....but for the time being, I use my lil' linux toaster (Shuttle SS51G) as my central file storage location (ty samba!)

beware the real data loss culprit :) by mr.+methane · 2003-10-29 14:05 · Score: 1

It's not windows, and it's not linux, either. It's human error.

I'm using one of Ximeta's ethernet-connected 160gb drives. It also has a usb2 connection, you can only use one or the other at a time. And only one client can have R/W access at a time - the others get RO access.

Mostly, I back up each machine's personal data and config to it periodically. I'm just talking about a home lan here, this is not an office-scaled solution.

I'm still looking for a better solution. sooner or later, some smart guy will make a shoebox-sized server, with redundant drives, and basic file sharing/locking support (nfs/smb). The drives themselves are so cheap (on the order of $1 per GB now) that the hardware and management required to make RAID5 aren't economical for most personal users. I think it needs to retail for under $300 to be a real winner, and I know that's a tough barrier.

It's called the Glade PCS. by McDoobie · 2003-10-29 14:12 · Score: 1

The Glade partition control system has been doing this and more for ages. It's used in mission critical military applications. And, oh yes, it's free.
Check out http://www.act-europe.fr/ and click on the Glade link.

Re:NBD Does this - NBD server for windows by Pieroxy · 2003-10-29 14:18 · Score: 1

It looks to me like a waste of resources. Why not set up a cron job that copies the content of the partition you want to back up to n other systems?

That's the way I am set up at home: one Linux box (my server) has an 80GB HDD. That's where I put everything valuable I have (mp3, pr0n, cvs, db...). Every night at 1:53AM a cron job starts, stops every service likely to change the data (Tomcat, MySQL, cvs...) and backs the HDD up over the network to my second PC. Then all services are restarted and everything is up and running again. Incremental backup lets this operation finish in a few minutes. The downtime is usually not a problem since it's my personal home system.
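A minimal sketch of that nightly job, assuming rsync does the incremental copy (the post does not say which tool is used) and that services are controlled through init scripts. The service names, paths, and the `backuphost` target are hypothetical placeholders.

```python
import subprocess

# Hypothetical names: adjust to the services and paths on your own box.
SERVICES = ["tomcat", "mysql"]
SOURCE = "/srv/data/"
DEST = "backuphost:/srv/backup/data/"


def nightly_backup(run=subprocess.check_call):
    """Stop writers, copy only what changed, then always restart the services."""
    stopped = []
    try:
        for name in SERVICES:
            run(["/etc/init.d/" + name, "stop"])
            stopped.append(name)
        # rsync only transfers changed files, which keeps the downtime short.
        run(["rsync", "-a", "--delete", SOURCE, DEST])
    finally:
        # Restart in reverse order, even if the copy failed.
        for name in reversed(stopped):
            run(["/etc/init.d/" + name, "start"])
```

A crontab entry such as `53 1 * * *` followed by the path to a script that calls `nightly_backup()` would give the 1:53AM schedule. The `finally` block is the important design choice: a failed copy should never leave the services down until morning.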

--
Write boring code, not shiny code!

LOSE, NOT LOOSE by HughsOnFirst · 2003-10-29 14:19 · Score: 1

This confusion dates back to the days of "Loose it or Lose it" when long bows were the new high tech weapon.

Distributed filesystems not yet mature by elronxenu · 2003-10-29 15:15 · Score: 1

What you really need is a distributed, serverless filesystem - one which lets you store files on all your disk drives on the LAN, with automatic redundancy of data (so if a machine goes down or its storage becomes unavailable, you still have a copy of your data blocks on one or more of the other machines) and ability to access those files from any machine on the LAN. A serverless filesystem is one in which the participating machines act as peers - i.e. no master server. Distributed and serverless filesystems are a hot research area right now but I'm sorry to say that they're not yet ready for the mainstream.

I went through the "is CODA right for me?" phase, and also "is InterMezzo right for me?" and also spent tens of hours researching distributed filesystems and cluster filesystems online ... my conclusion is that the area is still immature, I will let the pot simmer for a few more years (hopefully not many), and use NFS with one or two servers in the meantime.

My situation: desire for scalable and fault-tolerant distributed filesystem for home use with minimal maintenance or balancing effort. Emphasis on scalable - I want to be able to grow the filesystem essentially without limit. I also don't want to spend much time moving data between partitions. And last but not least, the bigger the filesystem grows, the less able I will be to back it up properly. I want redundancy so that if a disk dies the data is mirrored onto another disk, or if a server dies then the clients can continue to access the filesystem through another server.

All that seems to be quite a tall order. I checked out CODA, afs, PVFS, sgi's xfs, frangipani, petal, NFS, InterMezzo, berkeley's xfs, jfs, Sistina's gfs and some project Microsoft is doing to build a serverless filesystem based on a no-trust paradigm (that's quite unusual for Microsoft!).

Berkeley's xFS (now.cs.berkeley.edu/Xfs) sounded the most promising but it appears to be a defunct project. The source code is online however, so maybe somebody can resurrect it. Frangipani sounds interesting also, and maybe a little more alive than xFS.

On the other hand CODA, afs, intermezzo and Lustre are all in active development. afs IMHO suffered from kerberitis, i.e. once you start using kerberos it invades everything and it has lots of problems (which I read about on the openAFS list every day). AFS doesn't support live replication either - replication is done in a batch sense.

CODA doesn't scale and doesn't have expected filesystem semantics. For 80 gigs of server space I would require 3.2 gigs of virtual memory, and there's a limit to the size of a CODA directory (256k) which isn't seen in ordinary filesystems. There's also the full-file-download "feature". CODA is good for serving small filesystems to frequently disconnected clients but it is not good for serving the gigabyte AVIs which I want to share with my family.

InterMezzo is a lot more lightweight than CODA and will scale a lot better, but it's still a mirroring system rather than a network filesystem. I might use that to mirror my remote server where I just want to keep the data replicated and have write access on both the server and the client, but it's again not a solution for my situation.

The best thing about intermezzo is that it sits on top of a regular filesystem, so if you lose intermezzo the data is still safe in the underlying filesystem. CODA creates its own filesystem within files on a regular filesystem, and if you lose CODA then the data is trapped.

Frangipani is based on sharing data blocks, so like NFS it should be suitable for distributing files of arbitrary size. I need to look at it in a lot more detail; this is probably the right way to build a cluster filesystem for the long haul. For the short term, Intermezzo is probably the right way for a lot of people: it copies files from place to place on top of existing filesystems.

I got motivated to look at Frangipani again. No sour

Re:NBD Does this - NBD server for windows by whereiswaldo · 2003-10-29 15:16 · Score: 1

NBD-server for windows

I'd be hesitant to put my stuff on Windows boxes if they were also used for other purposes. Most people set Windows up so they have administrative privileges. That means they could probably see all the files you are distributing - at least the filenames even if the data was only 1/5th the entire file or whatever. What about the issue of files becoming corrupt because someone's computer catches a virus which taints your data? Any checksumming?

OpenAFS by sdirector · 2003-10-29 15:34 · Score: 1

So OpenAFS is the oldest and probably the most robust of the choices. (Ok, so AFS is, but you probably don't want to buy AFS from Transarc, so just use OpenAFS) It is a distributed file system that allows for replication of data across servers and all of that. It is in use at MIT, NCSU, CMU and other good CS places. And you can use it on *nix and W32. It isn't the easiest choice to get running, but if you actually want the thing closest to RAID-5 across machines, this is definitely the choice for you.

All wrong... but... by Psarchasm · 2003-10-29 15:37 · Score: 1

I've only seen one answer thus far that even comes close to solving the problem as the user attempted to describe it. But I think the problem was that the person didn't know exactly what they really wanted, and therefore worded the question poorly.

The correct answer to this question is a mixture of solutions... as it makes no sense to completely mirror a filesystem across multiple workstations. You'll never need to carry that entire filesystem with you at all times unless it carries your booting operating system.

Therefore I present my solution:

For the home user... dedicate two machines (your servers) to the redundant raid of your choice and means. RAID 5 could be the answer, RAID 1 could be the answer... RAID 5+1 could be the answer... not enough information is given to know just how much and what CRITICAL data you could possibly have at home. However this does give you a level of redundancy at the drive level. I would highly suggest making use of LVM in servers with more space to add drives later down the line.

Next step is to mirror the data across the two servers. I suggest CODA. Not terribly difficult to install, RPMs available if that's the way you bend, lots of time under its belt and because of what we are about to do, Windows is not required.

So how do my Linux and Windows clients get to the data? Well. There are a bunch of ways to accomplish this. You could install multiple types of network filesystems to support multiple operating systems. Which to me has always seemed rather crappy. Who wants to match all those user ids one might use. Or, horror of horrors, allow SMB or NFS (or Appletalk) out of the local network? Not me. BUT... what about WebDAV? Still somewhat in its infancy - and it's already had a rather significant remote hole - it is fairly elegant. Linux, Windows 2000+, and MacOS X all support it... it's web based (so you're going to be running a web server too)... and you can run the whole thing under SSL. This makes it available to you from just about anywhere, and using just about anyone's computer (though there are certainly security issues when authenticating if you want to do this). And it will natively pass through just about any firewall (including Application Proxy firewalls).

BUT... and this does suck, you cannot manipulate files directly on the WebDAV share. Files must be copied to local storage, edited, then copied back over.

So... you're looking at Linux, LVM, RAID (hardware preferably), CODA, LVS (if you so desire), Apache, and WebDAV. Reading between the lines this really sounds more like what you are really looking for.

Of course, that's just my opinion. I could be wrong.

--
http://windows.scares.us

Raid across the network.. by technos · 2003-10-29 15:49 · Score: 1

I tried something like this back in 97-98.

Set up nfs servers on all the computers that would store the data (servers), and set up loopback and software raid on the systems that would access it (clients). There was overlap between the two groups.

Created a couple hundred meg file on each of the computers in the exported directory. dd, yadda yadda..

Wrote a short script to mount all the exported trees, slap the files it found on loopback, and copied it around the clients. Made sure it would look for a lockfile, don't want more than one client accessing them at a time. Was a simple touch and exists affair.
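A "touch and exists" lock has a gap between the check and the create, so two clients can both decide the lock is free. A minimal sketch of an atomic alternative, not the original script; the function names are hypothetical. One caveat: on old NFS setups (before NFSv3) exclusive create is not reliable, so on a setup like the one described it may not have helped.

```python
import os


def acquire_lock(path):
    """Return True if we created the lockfile, False if someone else holds it."""
    try:
        # O_CREAT | O_EXCL makes "check and create" one atomic step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lockfile:
        lockfile.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path):
    """Remove the lockfile so another client can take over."""
    os.remove(path)
```

A client would call `acquire_lock()` before attaching the loopback files and `release_lock()` after unmounting the raid.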

Used one machine to make a raid, FS, etc, on the loopbacked devices.

Wrote a second script that would take the loopbacked devices and mount the raid.

Never quite got it to run right tho, just bought a tape drive instead. Guess you could play with it. The significant logical prob seemed to be that until you unmounted the raid and the NFS tree, you couldn't rely on data actually being written. Course, the raid code sucked donkey back then, and the NFS code was just erratic, so.. Mebbe things have improved.

--
.sig: Now legally binding!

Keep it simple, ...... by ericman31 · 2003-10-29 16:03 · Score: 1

While all of these ideas are "really cool", let's operate on the KISS principle here. With the low cost of IDE RAID these days, why not just create a RAID 1 mirror set and NFS export it. You could do 200 GB for probably $300 or so, based on a quick check of the prices on Comp USA. And if you really want distributed redundancy, set up a second system with another RAID 1 array. Then rsync the two with cron.

Using nbd or afs is pretty cool, technically speaking, but way overkill for a home network and way more trouble, both to set up and to maintain, than it's worth. Instead, for the cost of one more PC you can set up a very redundant system. And in the extremely unlikely case that you lost both hard drives supporting your primary nfs simultaneously you could redirect yourself to your secondary and keep right on working. In a more likely scenario, where you lose just one drive, you immediately rsync to the secondary, repoint all clients to the secondary, and keep going. Replace the failed drive in your primary, then you could either fail back, or demote it to be the new secondary.

Why make it so hard?

--
In my universe I'm perfectly normal, it's not my fault you don't live in my universe.

Eight computers? by Laconian · 2003-10-29 16:37 · Score: 1

Eight computers????
(deep breath) NERD!!!!!!!!!!

give me a break. by twitter · 2003-10-29 17:54 · Score: 1

Compiling? Why? Why not just log into the box and do your compiling there?

I'd considered the problem from the perspective of grouping up many small hard drives in various boxes to get more and more secure storage for archiving, not something active like compiling your kernel.

As for the floppy example, you should note how good the performance was. He moved a 3.6MB file to it in 32 seconds, that might sound slow to you and me, but 112KB/s, close to the USB maximum throughput. The RAID software used the interface. If I've decided I want to archive something via my network, I've already decided that the delay is worth it. If a net RAID sucks down my data as fast as I can send it, but also gives me error correction, I've done myself a favor by using it. This might not work so well for kernel compiling, but it would be just fine for tar files of images.

--

Friends don't help friends install M$ junk.

So who's up for the challenge... by jhs2 · 2003-10-29 18:04 · Score: 1

Too many of these threads are focused on Linux/Unix running some kind of experimental File System or some form of file replication tool like rsync. The reality is that most "Common Folk" don't have nor want to run any of this complicated infrastructure. They simply want to install a small little app on their Windows 2000/XP machines on their home network (which is maybe 3 boxes) and have them backup the data between them automatically balancing out free disk space with redundancy. Think of it like RAID 5 over the network.

Now, that all being said, let's think outside of the box a bit. Nearly every one of us that has more than one machine at home can benefit from this type of application. If it's difficult to set up, it simply won't be used by the masses. A good example of a "throw hardware at the problem" type solution is the Mirra (http://www.mirra.com) which should be coming out at the end of the month. If there was some way to set up something similar to the type of thing Mirra provides, but using the distributed resources of the existing computers on the network, then we really have a killer app!

I know this is all dream conjecture at this point, because we all know that something this good just simply doesn't exist, but it certainly sounds like the start of a good open source project. So, here's the challenge: Build something that will run as a service/daemon on Windows and Linux which will share free disk space transparently to the collective for automatic backups of information on other systems. Using things like WMI event triggers, you should be able to update files on other machines as soon as they are altered. The system should be able to broadcast and self configure, be secure, and allow for network interruptions.

Perhaps I'm dreaming here...but this would be the best thing to happen to home networks since cheap ethernet. Mirra+RAID5+AFS=???

Who's game?

--
"Failure is not an option. It comes bundled with any Microsoft Product."

Tape Back up! by teameco · 2003-10-29 18:56 · Score: 1

It's a blast from the past! Tape, ahh it is now cheap and boring and will fall apart but how many of us can say we have a tape backup every Sunday night on our network? I think that is pretty darn cool!

--
TheOne [ECO] http://www.TeamECO.com Team Leader

Old program... by dotwaffle · 2003-10-29 20:06 · Score: 1

I remember there being a program I once tried out, where you ran it on a windows system, and whatever you put on that drive could be seen by everyone else a la mounting a windows share, but the data was not just stored on one machine, it was stored on two. And when you shut down one of those machines, it propagated the survivor to another machine, so it kinda grew and used harddisks over the network. Kinda like a peer-to-peer file server. But I can't remember what it's called... Anybody? I think it was called mango or something, but can't find it! You could set the usage down to 0Mb if you wanted that machine not to contribute to the "collective" and I think there was even a *nix version. Guys? Ideas?

It'll be marked as fake, include real pr0n to fix by aaron_pet · 2003-10-29 20:33 · Score: 1

Do the reverse of Fight Club...

Include a few frames of your family photos!

or, you could put the info into the jpg header...

You'll probably want to at least zip it and embed it into another file...

I wonder how many terrist messages I have on my computer?

(George Bush to Donald Rumpfelt, cc: Homeland Infultration department: Lets get the news media to play more clips of 9/11 so I can grab even more power!)

--
Please use [ informative / summarizing ] SUBJECT LINES
Flame me here

Nah, this is a poor solution by Moderation+abuser · 2003-10-29 22:43 · Score: 1

Sure, it sounds good on paper, in practice it introduces massive complexity which introduces loads of opportunity for failure.

A better solution:

Move as much of the storage as possible onto a system designated as the server, add the backup device to the server and mirror the data on more than one drive. It'll be more secure, much simpler, better availability and faster.

--
Government of the people, by corporate executives, for corporate profits.

GPFS can do by TheToon · 2003-10-29 23:32 · Score: 1

GPFS (General Parallel File System) from IBM can do this, runs on AIX and Linux. No Windows and quite expensive and not suitable for the casual home user.

Click here for more information.

--
//TheToon

I would have used a different solution by einhverfr · 2003-10-30 00:30 · Score: 1

If you want raid emulation, NBD works well, but if you are looking for transparent, secure, distributed file storage, I would prefer to use OpenAFS (http://www.openafs.org). It is cross-platform, powerful, and secure, though it does have a learning curve.

I guess it depends what exactly you are trying to do.

--

LedgerSMB: Open source Accounting/ERP

Distributed Internet Backup System by trawg · 2003-10-30 00:35 · Score: 2, Informative

not really relevant, but may still be of interest to some (just sounds so neat): "Since disk drives are cheap, backup should be cheap too. Of course it does not help to mirror your data by adding more disks to your own computer because a fire, flood, power surge, etc. could still wipe out your local data center. Instead, you should give your files to peers (and in return store their files) so that if a catastrophe strikes your area, you can recover data from surviving peers. The Distributed Internet Backup System (DIBS) is designed to implement this vision. "

http://www.csua.berkeley.edu/~emin/source_code/dibs/

Definitely check out FAUBackup! by Boiner · 2003-10-30 02:30 · Score: 1

Faubackup is a super-slick backup option that will save you from a) media failure, and b) stupid mistakes. It won't fix c) catastrophe, but that's easy anyway.

I have two devices in my soho development machine, one hdd is for backup only to cover media failure of my main drive. Each day, faubackup runs against my home dir and makes a new replica over on drive 2. The really neat part is that it uses links in the filesystem, so the new dir *looks* like it should, but only new files actually take up space on the filesystem.
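The link trick can be sketched roughly like this. This is an illustration of hard-link snapshots in general, not Faubackup's actual code; the `snapshot` function and its arguments are hypothetical, and symlinks and special files are ignored for brevity.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy `source` to `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: share the old copy's disk blocks
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
```

Each snapshot directory looks like a full copy, but an unchanged file is just another name for data already on disk. Deleting an old snapshot only frees the blocks that no newer snapshot still links to.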

Out of the box, it keeps two yearly images, twelve monthlies, four weeklies and seven dailies. I have more choices to recover fresh stuff than older stuff (obviously). Old stuff falls off automatically each day too.

Available for Debian. I can *not* over-emphasize how cool this little utility is. It's a real 'set-it-and-forget-it' backup solution.

EtherDrive Storage by web_guy1000 · 2003-10-30 02:56 · Score: 2, Informative

You might consider EtherDrive storage from www.coraid.com. I use it on Linux with software raid. Works like a champ.

Virtual iSCSI or FC disk. by bored · 2003-10-30 03:20 · Score: 1

Since no one else has said it... You could export virtual iSCSI disks from all of your hosts using software like Intel iSCSI reference and then remount the disks and RAID the result. Depending on your machine config you could just leave it at that. If you're running a bunch of different platforms your best bet might be to then re-export the RAID as a CIFS or NFS file system from one of the machines.

HiveCache r0x0rz by Glass+of+Water · 2003-10-30 03:28 · Score: 1

I can't believe nobody's mentioned HiveCache.

"HiveCache's revolutionary SwarmBackup and SwarmStorage technology give you high-reliability backup/restore and data storage services by storing data in the free disk space on desktop PCs in your enterprise. HiveCache technology uses peer-to-peer technology to build a reliable and fault-tolerant distributed storage mesh for your backup data, eliminating the network bottlenecks usually associated with network backup systems and without forcing you to purchase costly server storage that will quickly become obsolete."

A really well designed system for backups (not a RAID replacement, but that did not seem to be the question).

--
There are no trolls. There are no trees out here.

DFS vs. over-sensitive girlfriends by mnemotronic · 2003-10-30 05:06 · Score: 1

Ain't no DFS on the planet gonna help if (like me) your girlfriend asynchronously decides that "all electrical devices are bad for me, so turn off everything when you're not using it".

--
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.

make SURE one LAN copy is in the garage by pensivemusic · 2003-10-30 05:09 · Score: 1

a fireproof insulated vault, safe and sound from your primary storage copies burning up at the house. separate copies are better than ones in the same physical location

Re: Answered to a different post? by catenos · 2003-10-30 08:00 · Score: 1

We seem to have a communication hole here. I don't see how your answer relates to my rebuttal of your claims. I'll try to clarify...

Compiling? Why? Why not just log into the box and do your compiling there?

Which box, the one with the left or the right part of the mirror of the RAID system? Remember, this was about a network file system configured in a way, that the overlying RAID system would give redundancy by storing the two parts of a mirror configuration on different computers. In other words: no matter on which machine you are logged in, some disk of the RAID is not on the local computer.

But to come back to your question of why not just log into the box and do your compiling there: Because the idea of the network file system was to connect several (8, IIRC) computers in the LAN. If he could do all his work with only 1 computer, he would have no reason to have a LAN to begin with. E.g. different OSes. I don't know if it matters to the original poster, but he explicitly said it was a mixed environment, and I would rather not try to cross-compile Windows to Linux or vice versa, if I don't absolutely have to.

As for the floppy example, you should note how good the performance was. He moved a 3.6MB file to it in 32 seconds, that might sound slow to you and me,

I didn't argue the speed of the floppy RAID at all, but the speed of USB.

but 112KB/s, close to the USB maximum throughput.

This was my point. I argued that saying "if you can do RAID over USB..." is bogus, when USB was only good enough, because it was a floppy RAID and as you just said yourself, even then USB barely managed to keep up.

If I've decided I want to archive something via my network, I've already decided that the delay is worth it.

But this was not only about archiving, but about replacing the complete local data storage by a network storage (in order to have a global redundancy).

If a net RAID sucks down my data as fast as I can send it, but also gives me error correction, I've done myself a favor by using it.

I completely agree. But I never argued about that point. What I argued was your claim, that one would not notice the speed loss [at least on Windows].

This might not work so well for kernel compiling, but it would be just fine for tar files of images.

Nope. I already anticipated that argument in my previous post and answered it there: "Ah, and if compiling does not fall into the "data storage" category: Well, simply copy that 50MB log file around, and some seconds become minutes (regarding the nobody would notice a "10 MBit" link)." (if you do log files or images doesn't make a big difference, as long as the file size is counted in MB).
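The numbers in this exchange are easy to check. A small sketch, assuming decimal units (1 MB = 1000 KB = 8 Mbit) and ignoring protocol overhead, so the times are best-case; the function names are made up for illustration.

```python
def transfer_seconds(size_mb, link_mbit):
    """Best-case seconds to move size_mb megabytes over a link_mbit Mbit/s link."""
    return size_mb * 8 / link_mbit


def throughput_kb_per_s(size_mb, seconds):
    """Average throughput in KB/s for size_mb megabytes moved in `seconds`."""
    return size_mb * 1000 / seconds


floppy_raid = throughput_kb_per_s(3.6, 32)   # the 3.6MB in 32 seconds example
log_on_10mbit = transfer_seconds(50, 10)     # the 50MB log file on a 10 MBit link
log_on_100mbit = transfer_seconds(50, 100)   # same file on 100 MBit
```

The floppy example works out to about 112 KB/s, matching the figure quoted above. The 50MB file needs at least 40 seconds on a saturated 10 MBit link versus 4 seconds on 100 MBit, and real-world overhead only widens that gap.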

--
Keep an eye on which arguments are silently dropped in replies. Not always, but often times it's very telling.

Re:NBD Does this - NBD server for windows by FreakinHippie · 2003-10-30 16:10 · Score: 1

Build your filesystem using LVM. Then you can shutdown your services for two seconds while you take a snapshot of the partition(s). Then restart the services and sync the snapshot. I use this method for my company's backup server(s). It works quite well.
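A rough sketch of that snapshot sequence, assuming the LVM2 command-line tools and rsync. The volume group, volume names, mount point, snapshot size, and destination below are hypothetical, and `stop`/`start` stand in for whatever pauses and resumes your services.

```python
import subprocess

# Hypothetical volume group, volume, and paths; adjust for your system.
ORIGIN = "/dev/vg0/data"
SNAP_NAME = "nightly_snap"
SNAP_DEV = "/dev/vg0/" + SNAP_NAME
MOUNT_POINT = "/mnt/snap"
DEST = "backuphost:/srv/backup/data/"


def snapshot_backup(stop, start, run=subprocess.check_call):
    """Pause services only while the snapshot is created, then copy from it."""
    stop()
    try:
        run(["lvcreate", "--snapshot", "--size", "1G", "--name", SNAP_NAME, ORIGIN])
    finally:
        start()  # services are back up before the slow copy begins
    try:
        run(["mount", "-o", "ro", SNAP_DEV, MOUNT_POINT])
        try:
            run(["rsync", "-a", "--delete", MOUNT_POINT + "/", DEST])
        finally:
            run(["umount", MOUNT_POINT])
    finally:
        run(["lvremove", "-f", SNAP_DEV])  # drop the snapshot when done
```

The snapshot size only has to hold the blocks that change on the origin volume while the copy runs, so it can be much smaller than the volume itself.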

Re:NBD Does this - NBD server for windows by Pieroxy · 2003-10-30 18:44 · Score: 1

The point is that this is supposed to be for a home network, so even a 10-minute downtime around 2AM shouldn't be a problem! But thanks for the info, I'll have a look!

--
Write boring code, not shiny code!

AFS by duffbeer703 · 2003-10-31 01:47 · Score: 1

www.openafs.org

--
Conformity is the jailer of freedom and enemy of growth. -JFK

Pre-Owned RAID by man_ls · 2003-10-31 17:05 · Score: 1

Why don't you consider a pre-owned high-end RAID system?

If you're willing to pay even a couple thousand dollars, you can get a very highly redundant RAID 6 subsystem with high throughput. (or two, if you want to spend more.)

admin@jkoebel.net if you're interested in them. It may be more than you're looking to spend (free software...>=$2000+ RAID cabinets) but if you're interested, I'll work with you on it.

Re:NBD Does this (RAID level by Omega996 · 2003-10-31 17:15 · Score: 1

Even if you lose a disk in each mirror, you can still access all your data too.

if you're not a native english speaker, you can be excused for your ignorance. if you are, you're a fucking idiot that needs to go back to grade school.

note the difference:
loose:
1. Not fastened, restrained, or contained: loose bricks.
2. Not taut, fixed, or rigid: a loose anchor line; a loose chair leg.
3. Free from confinement or imprisonment; unfettered: criminals loose in the neighborhood; dogs that are loose on the streets.
4. Not tight-fitting or tightly fitted: loose shoes.
5. Not bound, bundled, stapled, or gathered together: loose papers.
6. Not compact or dense in arrangement or structure: loose gravel.
7. Lacking a sense of restraint or responsibility; idle: loose talk.
8. Not formal; relaxed: a loose atmosphere at the club.
9. Lacking conventional moral restraint in sexual behavior.
10. Not literal or exact: a loose translation.
11. Characterized by a free movement of fluids in the body: a loose cough; loose bowels.

lose:
1. To be unsuccessful in retaining possession of; mislay: He's always losing his car keys.
2.
    1. To be deprived of (something one has had): lost her art collection in the fire; lost her job.
    2. To be left alone or desolate because of the death of: lost his wife.
    3. To be unable to keep alive: a doctor who has lost very few patients.
3. To be unable to keep control or allegiance of: lost his temper at the meeting; is losing supporters by changing his mind.
4. To fail to win; fail in: lost the game; lost the court case.
5. To fail to use or take advantage of: Don't lose a chance to improve your position.
6. To fail to hear, see, or understand: We lost the plane in the fog. I lost her when she started speaking about thermodynamics.
7.
    1. To let (oneself) become unable to find the way.
    2. To remove (oneself), as from everyday reality into a fantasy world.
8. To rid oneself of: lost five pounds.
9. To consume aimlessly; waste: lost a week in idle occupations.
10. To wander from or become ignorant of: lose one's way.
11.
    1. To elude or outdistance: lost their pursuers.
    2. To be outdistanced by: chased the thieves but lost them.
12. To become slow by (a specified amount of time). Used of a timepiece.
13. To cause or result in the loss of: Failure to reply to the advertisement lost her the job.
14. To cause to be destroyed. Usually used in the passive: Both planes were lost in the crash.
15. To cause to be damned.

Re:MSI OSS by Omega996 · 2003-10-31 17:20 · Score: 1

the fact you're having problems with RPMs demonstrates more than half your problem - you're using the wrong distro. you should be using debian, or at least using the apt tools on RedHat/SuSE/whatever the fuck distro you're running that's using RPMs. Me, I use windows.

Re: ect. by Omega996 · 2003-10-31 17:25 · Score: 1

maybe he was abbreviating ectoplasmic, for some reason?

*shrugs*

Re:What if one of the nodes goes down? by wouterke · 2003-11-02 14:15 · Score: 1

If you reboot the NBD server, the connection is lost. You'll have to restart the nbd-client process to make it work again.

Obviously, if two servers go down, you'll start losing data. Of course, that's a property of RAID5, not of NBD...
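Why one lost server is survivable and two are not comes down to the parity arithmetic. A toy sketch with one stripe and a fixed parity block (real RAID5 rotates parity across the disks, which this ignores); the block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"disk", b"node", b"lan!"]   # one stripe spread over three servers
parity = xor_blocks(data)            # stored on a fourth server

# One server down: XOR of the survivors and the parity rebuilds its block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two servers down: all that can be recovered is the XOR of the two
# missing blocks, which is not enough to separate them again.
leftover = xor_blocks([data[0], parity])
assert leftover == xor_blocks([data[1], data[2]])
```

With a single parity block there is one equation per stripe, so it can solve for one unknown block but not two.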

Re:What if one of the nodes goes down? by cbreaker · 2003-11-02 15:08 · Score: 1

Yes, the connection would be lost, but since they are talking about RAID on this thing, when the node comes back online you'd have to rebuild the entire "virtual disk" which would be fairly time consuming and network intensive I'd imagine. Even if there was some mechanism in place that did a consistency check on the "disk" to see what data needs to be updated to be in sync with the raid volume, it would be slow. And most RAIDs just rebuild the volume, I don't think the Linux software raid is any different. (could be wrong, never used Linux raid before.)

And yes, it's a property of RAID, of course, obviously.

I sure wouldn't trust my data to this type of system. Too many points of failure. Like I said, your average RAID wasn't designed for this type of application. A mirror set might work okay, but I can think of a lot better ways to accomplish redundant data on a network - plus, I don't think duplicating data is exactly what the guy had in mind on his question.

--
- It's not the Macs I hate. It's Digg users. -

Re:NBD Does this - NBD server for windows by Knetzar · 2003-11-03 22:57 · Score: 1

He's reading /. and is therefore probably a computer geek.
What makes you think he goes to sleep before 2am?

What do people who have no set schedule do?
