Making Use of Terabytes of Unused Storage
kernspaltung writes "I manage a network of roughly a hundred Windows boxes, all of them with hard drives of at least 40GB — many have 80GB drives and larger. Other than what's used by the OS, a few applications, and a smattering of small documents, this space is idle. What would be a productive use for these terabytes of wasted space? Does any software exist that would enable pooling this extra space into one or more large virtual networked drives? Something that could offer the fault-tolerance and ease-of-use of ZFS across a network of PCs would be great for small-to-medium organizations."
It's the obvious choice.
install vista on them, that would fill up that space and give you something to manage your time a little better than wondering about what you could manage..
Does any software exist that would enable pooling this extra space into one or more large virtual networked drives?
Absolutely! Just hook them up directly to the internet before you update the machines, wait a few minutes, and voila! They'll be filled up with extra files in no time! Hey, you didn't say anything about wanting to be in control of what gets put on the machines...
If you have a very robust local network with plenty of spare capacity, and can accept a performance hit on the client computers, I am sure some kind of linked filesystem would be possible. In most practical situations, I think this idea would be a non-starter.
If they're in a computer room, then such a scheme might work. But if they're on users' desks, you don't really have control. They're subject to filling up, being shut off, being knocked about, crashing, etc. I don't think in this case you would really get the reliability that the diversity and independence would suggest.
--Marc
You'll get more selling everything you have and investing in a storage solution than you will paying for the electricity to run all those crap drives.
...just in case your connection fails.
Is this a company, college, or just a random collection of boxes in your mom's basement? What function does your organization want to do that it can't because of a lack of a few terabytes? What does the actual owner of these boxes have to say about your little enterprise?
You could try to use something like "Localhost Azureus" for distributed data storage. The only problem will be that it will cost you in terms of processor and network hogging.
Is it cost effective to reclaim that (small) space? Probably not. My suggestion is to realize that no one tries to save clock cycles any more, and disk storage is probably heading the same way.
It's a very interesting question, but from my point of view, hard drive space is so ridiculously cheap nowadays that it is utterly pointless to look for a useful application that will fill it up.
Let's assume that the average computer has 80 GB of storage. Multiply that by 100 and you get 8 TB of space. That's what you can get into one or two computers nowadays without plunging out too much cash.
What's more interesting is how much processing power you have as well as how fast the internet connection is.
Full Tilt
This is the dumbest /. question I've seen.
Decentralized network storage pooled together with no means of practical management? Sign me up! Oh yeah, let's rely on the ditzy end users to help make sure it doesn't crash. I'm sure everyone will leave their computers on 100% of the time so you can make use of it.
Don't tell anyone at work of your idea, they might not ever stop laughing.
Check out GlusterFS. (http://www.gluster.org)
You definitely can't run Windows in order to utilize this, but it should be minimal effort to set up a quick netboot lab to test it with.
Cheers.
Short answer: No.
Long answer: Nope.
I had a drive fail on me last year and I wanted to take my frustration out on it so naturally I did what any good American would do. I shot the shit out of it. Surprisingly it seemed to make for a pretty good piece of bullet proof armor. It stopped multiple rounds of full metal jacket 9mm rounds and managed to get a couple rounds lodged inside the casing. (None appeared to penetrate fully)
If I can not smoke in heaven, then I shall not go. -- Mark Twain
Datacore offers software called Sanmelody to turn servers into a cheap storage network, and there are other vendor solutions as well. http://infiniteadmin.com/
I don't know. Where I work, tons of engineers leave their computers online overnight, so you could do backups or transfers overnight. Or you could do something similar to SETI@home and run only when computers are idle. I think the big hurdle is partitioning off part of each hard drive so that the user can't access it; what they don't know about, they can't be angry about losing. The problem I see, if you use it as a virtual drive or for backups, is security. Servers are nice because they are locked up, monitored, and generally protected more than user workstations. Where I work the workstations aren't locked up or anything, and I would be very wary of allowing secure company documents to be stored on something that is amorphous by its nature. I work in the aerospace industry, and I know that heads would roll if some of the documents we generate were leaked, because lots of the stuff on servers is classified, proprietary or IP.
OpenAFS is a distributed file system. It seems to fit your bill. No personal experience, so don't know how well it actually works.
Doolittle :
Bomb no.20 : To explode of course.
They need secret servers on unknown locations, you know...
There's a project dedicated to this on Linux: http://nbd.sourceforge.net/.
If there's nothing similar for windows, you might be able to run it through cygwin.
Actually, this claims to run on Windows: http://www.vanheusden.com/Loose/nbdsrvr/
I've been thinking of the same thing of late. Our IT department uses this huge SAN at $$$ money. Why couldn't a distributed, fault-tolerant store (something like striping with parity) be implemented across a LAN with 100Mb/GigE? The standard drive size being shipped on new PCs is at a minimum about 200GB. For biz users that is WAY overkill.
Our whole organization is about a 1000 Windoze desktops, but I'd like to try it in our local workgroup first (maybe 20 systems). I looked around but couldn't find anything that would pool unused desktop space.
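To make the "striped with parity" idea concrete, here is a minimal Python sketch of single-parity striping across desktops. It is only an illustration under stated assumptions: the function names are made up for this example, every block in a stripe is assumed to be the same length, and each block is assumed to live on a different PC. It shows the XOR arithmetic only, not a real networked filesystem.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per PC)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_missing(stripe):
    """Rebuild the single block marked None (a PC that is off or dead)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can only recover exactly one lost block")
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])
degraded = list(stripe)
degraded[1] = None  # pretend the second desktop was switched off
assert rebuild_missing(degraded) == stripe
```

Note that single parity only survives one missing machine per stripe, which is probably too little for desktops that get switched off at night.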
You could use extensive versioning on each machine individually to get a benefit out of the unused disk space and computing power. Users who accidentally overwrite or delete files could get them back from their own disk space. Some kind of NFS would use a lot of network traffic, and bandwidth is often a limiting factor.
I tried to tout the merits something like this could have for non-critical regular user backups, but as previous posters mention, it was shot down.
:)
I was suggesting to run DrFTPD as a backend with NetDrive as an access medium. It looks good on paper, but I've never had the chance to apply it on such a wide scale.
With DrFTPD it's easy to set up whatever kind of redundancy you would want, i.e. "at least 3 nodes will mirror all files in /doc" or whatever. NetDrive (and I'm sure there are others) helps take away the learning curve and hassle of "here, use this internal ftp for backups, not a network drive", as it will map the actual FTP to a network drive and appear like normal.
Just my 2c.
It could be that the only purpose of your life is to serve as a warning to others.
The first question to ask is whether what you want to do makes any sense for your employer. Who has to maintain this beast once you build it?
http://www.dcache.org/ You will need a system to act as a master, but otherwise your normal nodes should work great.
I am that much more enlightened and proportionally disillusioned
Try Revstor's Sanware which allows you to designate nodes (servers) that will provide resources to create a storage area network. http://infiniteadmin.com/
This sounds like somebody is asking for wuala. Possible slashvertisement?
We have a few compute nodes around here. Each of them has an HD, and as those are so cheap we gave them 500 GB ones.
They don't really need lots of space (maybe 30 GB for OS and temp files); on the other hand, without redundancy the other 450 GB are worthless.
As the task is embarrassingly parallel, network traffic wouldn't be a problem.
If there was a solution to combine all this storage (it doesn't even have to be transparent) into a distributed, redundant storage network, I could surely make use of those terabytes.
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
Please stop typing words like "utilization" when you mean "use". You sound like a PHB trying to sound smarter than he really is and you make it a pain for people to read what you write, especially non-Anglophones. Read George Orwell's essay on this topic.
... they'll need them.
Tautologies, they are what they are.
If you don't already have a backup mechanism for the data that may be on these systems, one way to use all the available storage is for backup. Vembu StoreGrid is a solution designed specifically for this problem. Get more info @ http://www.vembu.com./
You might want to ask yourself why, after more than a decade of research and countless papers and prototypes that address this problem, your PCs' storage is still underutilized...
It's harder than it looks to get something reliable. Your PCs have extra capacity because it's cheap, but mining that capacity is not cheap. As other posters have pointed out, putting together (or just purchasing) a server with a few TB of storage is simpler and cheaper, less prone to getting wiped out by a virus, and easier to manage and back up.
Am I part of the core demographic for Swedish Fish?
While I was in college, I worked in the IT department. In my experience, your end-users will have a proverbial shit-fit if their computer's HD starts spooling up when they aren't doing anything. While it would be nice to use the spare space for data storage, I'm not sure it would be worth the headache. The volume of user complaints would skyrocket, you'd have to train them to leave the things on all the time, and you'd have a distributed data pool to manage. Changing user behavior is like teaching a two-year-old to say "thank you" (It's possible, but not fun) and your electrical and manpower expenses would probably outstrip the savings.
this is all hypothetical, but you could create disk images and use each client as an iSCSI host, mount each of the servers in your favorite RAIDZ configuration on a network server, and then reshare everything through Samba or even back as pools of iSCSI volumes.
That actually sounds like a pretty cool project, and with enough redundancy, it could be fairly robust.
Storage@desk is a project at the University of Virginia that tries to do exactly what you describe: take unused storage on a bunch of machines and turn it into a file system. http://vcgr.cs.virginia.edu/storage_at_desk/index.html
I've often thought a Napster-like P2P network could be the basis for a fault-tolerant distributed storage system. By "Napster-like" I mean a P2P system with a central index. Add access control and versioning software that can push files from peer to peer. Once a document is on, say 5, peers there is no need to back it up.
Imagine a system like this:
1. A couple of redundant index servers
2. An integrated versioning system with push capability
3. A large chunk of desktop disk space hidden from the user
4. Appropriate access control at the index level
5. ???
6. Profit! (this is /. after all)
Unfortunately, some powerful corporations are so terrified of P2P that they're doing all they can to kill it in its infancy.
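For what it's worth, the central-index part of that idea can be sketched in a few lines of Python. This is a toy model, not a working P2P system: the class name `CentralIndex`, its methods and the peer names are all hypothetical, the "push" only records which peers should hold a document, and no data actually moves over a network.

```python
class CentralIndex:
    """Toy central index: remembers which peers hold each document version."""

    def __init__(self, peers, copies=5):
        if copies > len(peers):
            raise ValueError("need at least as many peers as copies")
        self.copies = copies
        self.load = {peer: 0 for peer in peers}  # documents held per peer
        self.holders = {}                        # name -> (version, set of peers)

    def push(self, name, version):
        """Record a new version and pick the least-loaded peers to hold it."""
        if name in self.holders:
            for peer in self.holders[name][1]:
                self.load[peer] -= 1             # old copies get replaced
        chosen = sorted(self.load, key=lambda p: (self.load[p], p))[: self.copies]
        for peer in chosen:
            self.load[peer] += 1
        self.holders[name] = (version, set(chosen))
        return chosen

    def under_replicated(self, online):
        """Names of documents with fewer than `copies` holders currently online."""
        online = set(online)
        return sorted(
            name
            for name, (_, peers) in self.holders.items()
            if len(peers & online) < self.copies
        )


index = CentralIndex([f"pc{i:02d}" for i in range(8)], copies=5)
holders = index.push("report.doc", version=1)
assert index.under_replicated(online=holders) == []
assert index.under_replicated(online=holders[:3]) == ["report.doc"]
```

The `under_replicated` check is where the hard part hides: deciding whether a missing peer is dead or merely switched off for the night.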
Well the average Windows install doesn't recognize an EXT3 filesystem (as a for instance, most Linux filesystems aren't "seen" from windows) so partitioning the drive with a windows and linux partition should be fine, then use these drives for multiple backup mirrors via a small linux apache server...
You could secure them with passwords and so on.
Oh go ahead and poke flaming holes in my suggestion *buries face in hands and sobs*
Seven Days with Ubuntu Unity
What if the PCs are switched off?
Even using something akin to RAID, so you store the same data across several machines, you've still got the risk that switching off PCs will cause data to be temporarily unavailable.
Leaving a hundred PCs switched on just to get some extra disk space isn't going to be eco-friendly or cost effective. You can build a several terabyte file server very cheaply these days.
Dan
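A rough way to put numbers on the switched-off-PC problem, as a Python sketch. It assumes each PC is on independently with the same probability, which real offices will not satisfy (everyone goes home at the same time), so treat the result as optimistic. The 40% overnight figure is a made-up example, not a measurement.

```python
def availability(p_on, replicas):
    """Chance that at least one of `replicas` independent PCs is switched on."""
    return 1 - (1 - p_on) ** replicas


def replicas_needed(p_on, target):
    """Smallest number of copies that reaches the target availability."""
    if not 0 < p_on <= 1 or not 0 <= target < 1:
        raise ValueError("need 0 < p_on <= 1 and 0 <= target < 1")
    copies = 1
    while availability(p_on, copies) < target:
        copies += 1
    return copies


# Hypothetical: 40% of desktops stay on overnight, and we want 99.9% availability.
assert replicas_needed(0.4, 0.999) == 14
```

Fourteen copies of everything eats most of the "free" space, which is another way of arriving at the cheap file server conclusion.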
Wait a minute; if Windows can't see the data, how will it serve the data up to your remote machines? Or are you saying that he should remotely (or on a schedule) reboot the machines into Linux overnight to do this? Because there is no way an OS is going to serve up files from a partition it can't even read.
You know, make fun of Microsoft all you want, but they actually have something for this - DFS - Distributed File System. Just create a share with each of these and POOL IT with a DFS system. Then use and manage it to your hearts content with all the midget-donkey-goatse crap you want.
When I wrote "versioning system" I didn't mean a CVS. I mean software with enough brains to know that a document was edited so it can push the new version to all the peers storing the document.
So if an AC replies to his own post is that an act of brazen cowardice?
The user boxen are for the users, not for you.
The diskspace/CPU cycles/whatever is not idle, it's being kept available for the users' needs.
Don't be such a prick. Pee in your own sandbox.
i'm sure some p2p botnet could use the space
Hrmm... Funny, he didn't come across that way to me at all. You, however, come across as a pompous linguistic Nazi, much like Orwell. If you compose sentences for people who don't have command of the language, then you are really quite delusional.
As is my understanding, resources are utilised, while tools are used. He was correct in its usage.
One idea would be to use a software driver to export the spare part of the disk as an iSCSI (or iATA, if you prefer) target. For performance and integrity, you'd probably be better off having a dedicated partition the OS couldn't easily fiddle with, but it shouldn't be too hard to create an array of ~50GB iSCSI targets that you could then collate into larger volumes. Performance wouldn't be stellar, unless you could use a dedicated NIC/VLAN on the hosts, but should be reasonable enough for use as nearline storage of non-critical data that was already archived to tape. But so much for the pros, what about the cons?
The big problems with this idea though are going to be MTBF, storage redundancy and power consumption. You're going to be building your storage array using desktop PC rated HDDs, so let's say an MTBF of 50,000 hours, *but* you have about 100 of them so you should be anticipating a fairly frequent drive failure rate. That means both striping and repeatedly mirroring the data across workstations to ensure that it's always available should a drive or two die - or just be powered off overnight, unless you want all your workstations powered up 24/7 ($$$). You'd also need to be able to dynamically rebuild the data set in the event of a drive failure; but how do you detect a drive failure vs someone simply tripping over the power/network cable - that software's not looking so simple now, is it?
I think it's an interesting idea, but the overheads of maintaining enough copies of each element of data online to survive drives becoming unavailable, intelligently managing the replication of data when a drive is deemed to have failed and not just gone temporarily offline, plus network congestion issues make it non-viable. It'd almost certainly be cheaper and faster to write off the spare HDD capacity in your workstations and buy cheap 1U servers with a couple of GB NICs onboard and cram them full of high capacity SATA drives for storage.
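To put a number on "fairly frequent", here is a back-of-the-envelope Python sketch using the figures above (100 drives, 50,000-hour MTBF). It assumes a constant failure rate per powered-on hour, which is a simplification, and the `duty_cycle` parameter is an assumption added for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def expected_failures_per_year(drive_count, mtbf_hours, duty_cycle=1.0):
    """Expected drive failures per year across the whole pool.

    duty_cycle is the fraction of the year each drive is powered on.
    """
    powered_hours = HOURS_PER_YEAR * duty_cycle
    return drive_count * powered_hours / mtbf_hours


def mean_hours_between_failures(drive_count, mtbf_hours):
    """Average gap between failures somewhere in the pool, drives running 24/7."""
    return mtbf_hours / drive_count


failures = expected_failures_per_year(100, 50_000)
gap_days = mean_hours_between_failures(100, 50_000) / 24
print(f"{failures:.1f} failures/year, one every {gap_days:.0f} days or so")
```

With everything powered 24/7 that works out to roughly 17.5 failures a year, or one about every three weeks, so the rebuild logic would be running more or less constantly.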
UNIX? They're not even circumcised! Savages!
Microsoft makes an easy to use utility for this EXACT situation called DFS - Distributed File System.
1) Simply make a share on all those machines and POOL them with a DFS server and you are good to go.
2) ????
3) PROFIT!!1!
Great, let's all dumb down to the lowest common denominator. English is a rich language and all the better for it. If you're too lazy to learn it, your choice. I'm a non-native speaker but prefer a vibrant, expressive language to some "for-dummies" international pidgin.
I'm sorry if I haven't offended anyone
Well, you sound like a troll. I seriously doubt anybody misunderstood what he meant because he used the word "utilization". Or, should I say he utilized it? UTILIZE UTILIZE UTILIZE UTILIZE UTILIZE UTILIZE UTILIZE UTILIZE Does it hurt yet?
The solution is obvious. We need to think outside the box and raise the bar when it comes to language... someone needs to step up to the plate and bring something new to the table. I'm thinking of someone I have synergy with, not just the type that goes for the low-hanging fruit.
Ooh.... he's spinning nicely. Another couple of Orwells and we'll have enough electricity to power the world
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Isn't this something Google either has already done, or *should* do? Google Distributed File System... GDFS. It has the added benefit of also being a curse if it goes wrong. Seriously, isn't this an ideal project for Google? And if they've already done it, is it available for implementation by everyone else?
I'd like to see some sort of distributed filesystem as a standard installation option in a Linux distribution... The question would be something to the effect of "would you like your computer to find unused disk storage space on your network, and use it for managed redundant storage available across your network?"
It likely wouldn't be very fast (imagine RAID 1 or 5 with each disk connected only by ethernet, and the controller on yet another computer also connected only via ethernet), but for a lot of people, absolute speed isn't really required and having all that free space managed in a usable form would make up for the lack of speed.
Well in this case, "utilization"/"utilisation" does mean use. Utilization is actually clearer for non-English-speakers, IMO, because it is always a noun, whereas "use" can also be a verb (perhaps also an adjective). By the way, what is a PHB?
Whoops, should have "googled" this first. Here it is, google file system.
http://labs.google.com/papers/gfs.html
The big questions of course are is it usable by regular people, and is anyone actually working on implementing and including this in any of the major operating systems?
The trick is how, for my machines at home (all 3 of them). I have the first backup to the second and second to the third, and the third to the first. I have thought for some time, that there should be some method of automating that procedure. But keeping track of where things are and which machine has what space would not be easy.
What you're talking about is not a new concept; it just turns out to be really hard to build in a useful way. The most comprehensive discussion of the problems involved can be found at the Microsoft Research project Farsite.
The short version of the problem is that the level of service you can expect from each system is incredibly variable, so it's hard to offer a meaningful QoS for the system as a whole. It's not quite as bad as the distributed-hash-table problem (a.k.a. P2P file storage), but it's still bad. (Zooko once told me that MojoNation saw an average 50% turnover in nodes in a 24 hour period.) But it's also not as easy as having all your distributed nodes dedicated to just storage, and even that's a really hard problem to solve. (I should know; my company is one of the few vendors doing it.)
Someone else suggested OpenAFS. OpenAFS is fantastic, but not for unreliable server environments. I really don't think there's a complete solution out there, but not for lack of asking.
I'm a troll, and he's modded +5 insightful? Must be a lot of non-English speakers here.
I know little about hardware, so forgive a stupid question: would it make any sense to pull out these computers' drives, replace them with smaller ones, and either sell the lot or assemble them in one place (a RAID?) for easier maintenance? Having your storage spread out through a company becomes a problem if one computer goes down (or is turned off by its user).
I know the cheapness of drives may make this silly.
This is why SAN manufacturers have come up with "thin provisioning". NetApp is quite good at it; read more here.
Parent is not troll!
GP is 100% troll.. not insightful.
Modding Trolls +1 inciteful since 1999
What would be a productive use for these terabytes of wasted space?
Well, I had this idea when I read about some Open Source software that allowed distributed storage (sorry, forgot what that was, but by now I am sure it has already been mentioned in this discussion). The idea was this - suppose we have such software for unlimited distributed storage, so that people can download it and volunteer some unused space on their HD for a storage pool. Then suppose we have some software for distributed computing like we have for the SETI program. Now we have ziggabytes of storage and googleplexflops of processing power, what can we do with that? How about, for one thing, storing the entire internet (using compression, of course) on that endless distributed storage, and then running a decentralized, independent internet via P2P software? The distributed database could be constantly updated from the original sources, and the distributed storage then becomes in effect a giant cache that contains the entire internet. Now we could employ the distributed computing software to datamine that cache and we could have searching independent of Google or Yahoo or M$FT. Beyond that we could develop some AI that uses all that computing power and all that data to do... what? - I'm not sure yet. Just thought I would throw this out there to perhaps maybe get stepped on, or who knows, inspire further thought.
I think a better approach is to define your problem with some additional details. Do you want a separate drive letter to appear to the customers for them to keep their stuff on? Or do you want something that only you can get to, to store backups on? What kind of network is it? 100Mb/s? 1 Gb/s?
You'd asked two questions: "What would be a productive use for these terabytes of wasted space? " I don't know if I'd ask the slashdot crowd this.
"Does any software exist that would enable pooling this extra space into one or more large virtual networked drives?" A few. Localhost Azureus http://p2p.cs.mu.oz.au/software/Localhost/faq.html but it hasn't been maintained since 2006. Lustre http://en.wikipedia.org/wiki/Lustre_(file_system)#Networking is a neat read but I don't think is applicable in your situation. It'll give you an idea as to what's out there.
In theory, you could use MRTG to measure your fileserver's switch port to see how much traffic the desktops pull from the server. Divide it by the number of desktops and that tells you on average how much each requests. Now consider that this traffic would be distributed across the network, with each desktop seeing an increase. A gigabit LAN may be able to take this with no sweat.
How much disk space you would practically gain is up for debate. Let's say a 20 GB quota from each drive. Doing the math, that's about 1.95 TB. If you ever have to reload a number of those workstations, a good chunk of that is going to be unavailable. You may be better served with a NAS storage device.
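The arithmetic, as a small Python sketch. The 20 GB quota and 100 machines come from the thread; the `offline_fraction` parameter and the 400 Mb/s server-port reading are hypothetical numbers added only to show the shape of the calculation. Terabytes here are counted as 1024 GB, which is where the 1.95 figure comes from.

```python
def pooled_quota(machines, quota_gb, offline_fraction=0.0):
    """Return (gigabytes, terabytes) of pooled space, counting 1 TB as 1024 GB."""
    total_gb = machines * quota_gb * (1 - offline_fraction)
    return total_gb, total_gb / 1024


def per_desktop_mbps(server_port_mbps, desktops):
    """Average traffic each desktop pulls, from one reading of the server port."""
    return server_port_mbps / desktops


gb, tb = pooled_quota(100, 20)
print(f"{gb:.0f} GB pooled, about {tb:.2f} TB")        # 2000 GB, about 1.95 TB
print(f"{per_desktop_mbps(400, 100):.1f} Mb/s each")   # hypothetical 400 Mb/s reading
```

Passing something like `offline_fraction=0.25` shows how quickly the usable pool shrinks once machines are being reloaded or are simply switched off.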
"It's one thing to talk about the poetry of machines. Quite another to listen to it for yourself."
The difference is that I wasn't nasty about it, I explained a problem and gave him a link to an essay about it. You, on the other hand, called Orwell and me names, attacked a straw-man, and said something incorrect about the words that is trivially debunked by glancing at a dictionary.
Linux can read FAT32 and NTFS partitions just fine. So yes, perhaps have a vm boot the image at night, mount the windows partition and backup the drive.. shutting down after. Or some custom app that just writes to the ext2 partition. As Bostonsoxfan alluded to, security might be an issue. Encrypting the partition the backups were stored on would probably be sufficient for most places.
Of course the risk of backing up your data on the same physical drive remains. I suppose a VM booting, a secure copy to a peer as well as accepting a copy of the peers backup would address that well enough. Now you'd just need a secure way of choosing the peer (unless you're going to hardcode all the pairs).
I'll believe in corporations having personhood when Texas executes one... - advocate_one
http://www.seanodes.com/
http://www.revstor.com/
Both claim to be able to pool unused storage on desktops and application servers and make it available to hosts on the network.
Yeah, but what happens if the local user needs the space? Does DFS give priority to local storage and move the files? If it has to do that quickly, it could be a pain since the throughput would be poor, right?
Friends help you move. Real friends help you move bodies.
Never forget: 2 + 2 = 5 for extremely large values of 2.
What I would do is set up a large file on each machine, and export it using nbd - I think they do a Windows version.
Then, gather all these NBDs together at the server, using RAID to add massive redundancy to cope with users switching off their machines/crashing/whatever.
Finally, apply strong cryptography (e.g. Truecrypt or LUKS) to the RAID volume, so that all the data sent across the network and stored on the machines is unintelligible to anybody except you.
I find it hypocritical and mildly ironic that you use the hyphenate "non-Anglophone" in criticizing someone else for using unnecessarily complex speech that may not be easily understood by non-native speakers. And, for the record, the performance monitors on my Windows systems tell me the "percentage utilization" of a given resource, not the "percentage use".
If you have a little extra processor time, you could help SETI. I believe they have more data than they can search through. The client that loads SETI also can do a number of other projects, such as folding. The client can be throttled, and set to only run while the machines are not being used, akin to the time you might be running screensavers. http://setiathome.berkeley.edu/ With the extra space, you could always use Clonezilla to back up one machine on another.
https://www.youtube.com/c/BrendaEM
Google hasn't released anything other than papers on GFS and their implementation of MapReduce. At this point, though, I'm not sure it matters since we have Hadoop, which (being mainly Java, C, and a little bash) runs perfectly fine on all of the major operating systems, including Windows.
"Utilization" is a perfectly good word, and perfectly clear in usage and meaning to any educated person. I can't believe that on Slashdot a comment complaining a word was 'too big' would get modded up.
www.3tera.com This makes use of hard drive space similar to the old Novell NSS volumes. It grabs unused sections from machines and turns it into addressable volumes...too bad it only runs on Linux :o)
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus
Be Aware: I work in this company http://www.maidsafe.net/ which has spent a lot of time and money creating such a system for global use. It is getting close to beta testing now. It is basically a DHT with a self-authentication mechanism and much more. It is a totally distributed network (although a commercial version is in the works). There are patents (11) to protect us (product and system patents, but please, it's a whole other argument) and it's not yet open source. The reasons are complex but nevertheless well meant (however arguable). We have over 60 investors (mostly local people) and are pretty happy so far with development, but we do need to make some profits to pay investors back. I own most shares, and a foundation is being set up to promote innovation and fund inventors to bring good products to the market for the common good. The system will be FREE and eventually open source when we get some traction; we need as many eyes as possible on the code :-). This is merely stage 1, and I hope others will enhance it to become the network of the future.
There's too much to explain but a visit to the site may help. Public launch should be March / April.
I think what you're saying is we need to leverage a new paradigm in order to take things to the next level. Am I right?
DFS is truly only distributed in the sense you are talking about on Windows 2003R2; in Windows 2003/SP1/SP2, DFS only publishes a link to a single storage and replicates the link instead of using the disk in aggregate. Instead you could do a couple of things if you want to help your organization.
1. You could examine the use of VMWare Server (GSX) or VMWare ESX to consolidate the number of physical boxes, essentially freeing up hardware. That hardware can then be reused to provide Volume Shadow Copy services. Be aware that going the VM route will require you to carefully plan the amount of memory each server contains, and you should not exceed 50% of the total physical memory for any hosted machines.
2. Volume Shadow Copy services will provide for users making bonehead mistakes, and with a simple document you can train them how to enable Volume Shadow Copy on their machines, giving them the ability to retrieve past revisions without having to dip into slow backups.
3. You should talk over your concern with your management and discuss any plans so that you have their buy-in.
4. DFS is a definite option if you have the ability to essentially free up a ton of space that will be dedicated strictly to storage and needs to be replicated to other sites (i.e. network installed applications, etc.)
To all of you reading this and suggesting Linux solutions: I love the ideas, however, the reality is that not everyone has the freedom to introduce OSS into their environment. I tried and was successful for a short period of time, however, it was deemed to be a non-starter since all of the applications are designed for and run on WinBlows. AH to dream......
Why do desktops in a work environment need local hard drives anyway? My Windows folder (created Sunday Nov 10, 2002) is about 4GB. A 4GB SD card is about $30, and a lot of RAM would eliminate the need for a swap file. Basically the only thing that is a bottleneck is the \temp folder and there may be a way to do that with a ramdrive as well. My company requires all user storage to be on a network server, although not really enforced.
The answer, of course, is that there are a lot of business applications that only install themselves on the C: drive and don't play nice without a \temp folder. The standard model PC is a motherboard, RAM, hard disk drive, graphics card and KB/mouse. Add to that Microsoft licensing agreements that discourage virtual machines and other lightweight desktops, remote offices with less than ideal network connections, and "power users" who have real/perceived needs for local storage, laptops, etc. and we can't seem to shake the hard disk.
"Well, good luck finding a judge that doesn't run a bestiality site."
Obviously, this is a great deal of work, so the decision to go forward really lies with how much value is placed in the size, speed and manageability of 8TB of storage. With the cost of drives as they are these days, it would probably be more effective to buy 16 500GB or 8 1TB drives and achieve the same that way.
Build yourself a huge set of rainbow tables, and show to your users how weak their passwords are :)
Actually after reading the exact meaning of the word utilization it fits the context perfectly.
But what he might have meant by such a choice of words, could be restated as "Not without increase in utilization".
As we all know the queuing theory by heart, everything else becomes redundant in his post.
Please bear with my lousy English. English is my second language and it's been nearly a decade since anyone last taught me it.
©God
I completely fail to see how the OP is flamebaiting. I'd rather mod it as interesting if I had any points.
"Waste" the space. It's not worth it. Once you start doing this, the increased load on cheap desktop drives is going to lead to a several percent per year failure rate increase. It's probably not worth your time. If you want to store a few terabytes of data at much higher performance than this, spend a few hundred bucks on two or three modern drives and a SATA multiplexer.
Unless you like re-building machines with dead disks it's just not worth it.
AFAIK, DFS is for consolidating shares on servers, not clients. Which is what this article is asking.
"By the way, what is a PHB?"
It's a Dilbert reference. http://en.wikipedia.org/wiki/Dilbert
"PHB" is the short form of "Pointy Haired Boss".
This sig kills fascists.
Now, I could go and read through Microsoft's webpage about DFS, and spend a few minutes paraphrasing it into a post for your edification; or maybe you could, I don't know, go do it yourself...?
What's purple and commutes? An Abelian grape.
Utilization is actually clearer for non-English-speakers
Perhaps if you have an extensive vocabulary, but for most people this isn't the case. "Use" is a ubiquitous word you learn in the first few months of learning English and you need it in everyday conversation. There is no question anybody even remotely competent with the English language is very familiar with it. "Utilize", on the other hand, is hardly ever used and there's no value in using it when "use" would express the meaning of the sentence just as well.
By the way, what is a PHB?
"Pointy-Haired Boss", it's a term originally from Dilbert that refers to the type of manager that is completely clueless yet says and does stupid things to give the impression that he's smart and knows what he's talking about. If you have an alternative term that expresses that concept, I'd appreciate it. It used to be the case that the term was in wide use on Slashdot and everybody knew what it meant, but now I think perhaps it's fallen out of favour.
And that's bad? I'm sorry we're not all white Anglo-Saxons.
DFS distributes the files, meaning that it copies them to the servers that are in the link. This doesn't sound like it is going to be exactly what he is looking for unless he wants to be limited in every share.
If you think of it this way:
Server 1: 40 GB Space
Server 2: 30 GB Space
Server 3: 50 GB Space
Well your DFS Root could combine this space theoretically into one root, but each share would technically only have the space that is available on that server and not one big pool. So each share would be able to look like this:
Share 1: 40 GB Space
Share 2: 30 GB Space
Share 3: 50 GB Space
Accessing it would be easy though:
\\dfsroot\Share 1
\\dfsroot\Share 2
\\dfsroot\Share 3
Also DFS is used a lot of the time for replication. So find your lowest common denominator and you can replicate all of that data across all of your servers!
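To put numbers on the example above, here is a rough Python sketch. It models nothing about DFS itself; the share names and sizes are just the figures from this post, and the function names are made up for illustration.

```python
# Free space per share, in GB, taken from the example in this comment.
free_gb = {"Share 1": 40, "Share 2": 30, "Share 3": 50}


def largest_single_share_gb(shares):
    """Most space any one share under the DFS root can offer."""
    return max(shares.values())


def pooled_total_gb(shares):
    """What one big pooled volume would offer (a namespace does not give you this)."""
    return sum(shares.values())


def replicated_capacity_gb(shares):
    """Usable space if the same data is replicated to every share:
    the lowest common denominator."""
    return min(shares.values())


print(largest_single_share_gb(free_gb))  # 50
print(pooled_total_gb(free_gb))          # 120
print(replicated_capacity_gb(free_gb))   # 30
```

So the namespace looks like 120 GB from the root, but no single folder can grow past 50 GB, and full replication leaves you with 30 GB.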
I was always under the impression that use and utilize were synonymous. Maybe you are right that utilize is unnecessary; maybe the Orwell article explains, but I'm not going to read it. For one, that's the great thing about English (and I guess languages in general), you have choice. I'll use whichever one I damn please. Also, language evolves so deal with it. Finally, you want us to dumb down our speech for non-native speakers? That's absurd. I understand acquiring and maintaining multiple languages is difficult. I'm pretty much a failure in that respect, but this isn't a language learning site. People should be free to write however they want. I wouldn't care if people write in non-English (not sure if that is allowed by the rules or not, but it should be).
Utilization is a well-defined technical term. http://en.wikipedia.org/wiki/Utilization
As someone who designs storage systems for a fair-sized business, I can say this is an impractical use of resources for a number of reasons. First, as has been pointed out, you are competing for other resources. Storage on clients is highly variable. Just because somebody has 20GB of free space today doesn't mean he's not going to go out and download a couple of DVDs' worth of data. Your system would have to take this possibility into account and be prepared to cope with a space issue on the disk in real time. This would usually mean self-deleting when free space is too low, since you can't count on there being enough time to move the data over the network before the disk runs out of space and your clever scheme has impacted the productivity of someone else.
This is a less likely scenario in a server farm, but still a consideration. Further, in a server environment, you've seriously got to consider the function of each device before you go farming out spare resources. If you've got a web server with highly static data and overall low utilization, then this would actually be a pretty good candidate to participate in a distributed file system. However, a DB server, not so much. Some applications also desire a certain amount of free space on their volumes to ensure optimal efficiency. You certainly don't want to do this on any system where disk I/O is a major factor in the performance of an application.
What this really comes down to is a simple cost/benefit analysis. Before you decide if this is a good idea or not, you need to establish that there's actually a value to the business. Answering these questions will help:
Is there a business need to provide additional storage at this time? What is it?
How much storage do you need?
What is your current overall efficiency on storage today?
Given the technologies available to create a distributed file system, what is a practical amount of usable space you can gain by doing this?
Does this amount meet the stated need?
What is the approximate cost in time and cash spent to implement?
What is the management overhead of such a system in hours over a period of time?
What are the risks of doing this? Do the risks outweigh the benefits?
What is the cost (upfront and ongoing in cash and labor) of alternative solutions that meet the need (like a new file server)?
Once you answer those questions, you'll have a pretty good idea of whether or not this is something you should even be considering.
As a trend, the problem you describe is exactly why people are so enamored with virtualization. In a server environment, virtualization makes absolute sense. The overall efficiency of virtual servers is an order of magnitude (or more) higher than most physical server farms. Desktops are a different picture. The same principles apply, and vendors are pushing virtual desktops. The problem is that since the advent of the PC, end users have never preferred thin clients. With the majority of PC purchases now being laptops, it's even more so than it was in the past. Users want to have their data close, and to be able to take it with them.
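The "self-deleting when free space is too low" rule mentioned above might look something like the following Python sketch. The reserve threshold and the idea of tracking how many bytes of pooled data sit on the client are assumptions for illustration; `shutil.disk_usage` is the only real library call, and nothing here is taken from an actual product.

```python
import shutil


def bytes_to_evict(free_bytes, reserve_bytes, pooled_bytes):
    """How much pooled data to delete so the local user keeps `reserve_bytes` free.

    Returns 0 while the disk has enough headroom, and never more than the
    amount of pooled data actually stored on this client.
    """
    shortfall = reserve_bytes - free_bytes
    if shortfall <= 0:
        return 0
    return min(shortfall, pooled_bytes)


def check_volume(path, reserve_bytes, pooled_bytes):
    """Apply the rule to a real volume (hypothetical helper, polled periodically)."""
    usage = shutil.disk_usage(path)
    return bytes_to_evict(usage.free, reserve_bytes, pooled_bytes)


GB = 1024 ** 3
# User had 20 GB free, then downloaded most of it: only 2 GB left,
# while we promised to always leave 5 GB and hold 10 GB of pooled data.
print(bytes_to_evict(2 * GB, 5 * GB, 10 * GB) // GB)  # 3
```

The point of the sketch is that eviction has to be a local, immediate decision, which is why the pool needs enough redundancy elsewhere to survive it.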
Of course he's hypocritical - the guy's a skilled troll and everyone is biting. Please ignore him.
There is a fine line between being a cultivated citizen and being someone else's crop. - A. J. Patrick Liszkie
And then there's "full" as in "Hey, I can cram more crap into that box!"
I need some of that space to defrag the HDDs on my windows box.
Now, if only there was some filesystem whose performance didn't degrade over time due to fragmentation...
"Reality is that which, when you stop believing in it, it doesn't go away." - Philip K. Dick
No, language is a tool.
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
People (especially PHBs) think a resource that is not 100% utilized is going to waste.
However, consider which is more efficient: extra unused disk space, that is spare capacity with no strings attached ... or, an elaborate pooling and sharing mechanism with much more management overhead and electricity use?
It's so true, disks are incredibly cheap these days. The unused space costs nothing (except if you could have gotten away with an even cheaper, smaller disk.)
Look at the amount of redundancy and resources that would be required if you were going to try to pool those PCs' storage together into some sort of hive:
- How much redundancy is needed so that random computers can be powered down, or crashed, and the storage pool doesn't go offline? Even if most are left on at night, would you need a redundancy factor of 3 or more?
- Electricity use goes up / the bill goes up, if you have to require most PCs to stay on to utilize this "free/spare" capacity
- Do things like Service Pack updates or other patches that get pushed from a central server become much more complicated now because you cannot reboot more than a couple computers at once without the hive store going offline?
If you need a large pool of storage and a way to manage it centrally, use SAN or even NAS and consider VMs. OTOH, if you want to keep it simple, bask in the cheapness of disks and high powered PCs these days and don't fret about "unused" capacity.
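For the redundancy question above, a back-of-the-envelope Python sketch. It assumes each PC is powered on independently with the same probability, which is a simplification (machines in one office tend to go off together), and the probabilities used are made-up examples.

```python
def availability(p_online, replicas):
    """Chance that at least one of `replicas` copies is on a powered-on PC,
    assuming each PC is online independently with probability `p_online`."""
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online, target):
    """Smallest number of copies whose availability reaches `target`."""
    if not 0.0 < p_online <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p_online must be in (0, 1] and target in (0, 1)")
    k = 1
    while availability(p_online, k) < target:
        k += 1
    return k


# If a desktop is only on half the time, three copies are not much:
print(availability(0.5, 3))        # 0.875
print(replicas_needed(0.5, 0.99))  # 7
```

Under these assumptions a redundancy factor of 3 only works if nearly every machine stays on, which is exactly the electricity problem in the second bullet.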
"Hey Albert, Good luck exploring the infinite abyss."
I have not worked with either product, but an eWeek article about new storage technologies mentioned Seanodes' Exanodes and RevStor's SANware.
The double irony of making the central joke of your post the tired old "Orwell spinning in his grave" is delicious.
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Users trying to be productive at their workstation really don't need additional slowdowns happening because someone elsewhere is accessing their hard-drive.
Not so long ago, an MS update brought my system to a near halt because it was using all the free drive space to cache my system, on my system.
When it stopped, even the cache didn't show this usage, but I was able to determine that it was all in the cache, and I had to learn about clearing the cache: not by delete, but by ctrl-del to reinitialize the cache and, of course, rebooting.
I was not as unfortunate as many others in the amount of drive space this took up, as I only have around 34 gigs total on that partition. Others, I have learned, have lost over 100 gigs.
The point is that on windows, disk access can slow a system down quite a bit.
If you really need additional drive space to figure out what to productively do with it, terabyte drives are rather inexpensive today. But trying to make use of available drive space on user systems on a network is very anti-productive, not to mention how it will slow the network down too.
Just because it has become a well-defined technical term does not stop it being a bastardisation of the English language. I actually read the essay that the AC linked to to find that Orwell makes some good points. In particular the term utilization is one of a number of phrases used to dress up a meaning and attempt to make it sound more impressive and scientific. Just because the bad authors who do this have succeeded in making the term utilization standard does not change the AC's point in any way.
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Take a look at F5 Acopia. Create shares on all your servers with extra space, ensure you set the permissions and quota correctly, so as not to affect the core function of the server, and then use Acopia's product to merge all the servers into a single name space and present it to clients. Much like DFS, but it will also do NFS as well as CIFS, and has a very flexible policy engine to allow you to live migrate files and load balance between many servers. You'll be able to claw back all the unused space and any standard NAS client can make use of it for whatever application needs it, with no special client software required.
You have completely misunderstood the AC's point. Nothing that you have written adds to this discussion in any way. I can believe that your comment would get posted on slashdot.
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Running DFS (to serve files) on Windows XP clients? What are you smoking?
From Microsoft TechNet:
The servers that will participate in DFS Replication must run Windows Server 2003 R2.
It is possible to use DFS Namespaces when domain controllers and namespace servers run a mix of Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 without SP1, and Windows 2000 Server, but some functionality is disabled or available inconsistently, depending on the operating systems on the servers.
From: http://technet2.microsoft.com/WindowsServer/en/library/1aa249c0-40f3-4974-b67f-e650b602415e1033.mspx?mfr=true
Once you've got that set up, use a tool like this: http://projectdistributor.net/Releases/Release.aspx?releaseId=404 And fill it with random dummy data ;)
:(){
Actually, for a french speaker, "utilisation" is very easy to understand, as it is a french-word...
Couldn't you use something like a localized version of freenet+samba to do this?
It would allow the local drive connections to not necessarily see what anybody else is storing on the nodes AND it can be locally throttled to keep from interfering with local apps.
(I'm sure this post will now garner a bunch of "Only if he wanted store warez and kiddie porn on it!" replies.)
"Bah!" - Dogbert
You might want to switch the work place pcs off occasionally and then you don't have any access to the data. If you want to have people work over the weekend you must switch all the machines on. This is a more energy inefficient solution than having a large file server and Flash drives in your work PC for instance.
Je me souviens.
It's just storing random data.
ccalam - acoustic versions of new songs.
gun people are creepy... technophile gun people are super creepy.
In order for all of the interesting ideas in here to work, the computers need to be on. Now that's fine as long as you only need access to these distributed storage bits during office hours. For 24/7 availability and reasonable performance, all the networked computers would need to be powered and online always. The power consumption would more than double, from say 1.5 kWh for 10 hours (assuming a power consumption of 150 W per computer) to roughly 3.6 kWh per day. Assuming a power cost of 15 cents US per kWh, that's an extra 31.5 cents a day per computer: about $9.50 per month. Probably cheaper to lease an additional storage server or an offsite backup in the long run, not to mention that the extra power is mostly wasted and unnecessarily contributes to pollution...
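A quick check of the arithmetic, using this comment's own assumptions (150 W per PC, 10 office hours versus 24 hours, 15 cents US per kWh). The 30-day month is an added assumption.

```python
WATTS_PER_PC = 150       # assumed draw per computer
OFFICE_HOURS = 10
PRICE_PER_KWH = 0.15     # US dollars
DAYS_PER_MONTH = 30      # assumption for the monthly figure


def daily_kwh(watts, hours):
    """Energy used per day in kWh."""
    return watts * hours / 1000


office_kwh = daily_kwh(WATTS_PER_PC, OFFICE_HOURS)  # 1.5 kWh
always_on_kwh = daily_kwh(WATTS_PER_PC, 24)         # 3.6 kWh
extra_kwh = always_on_kwh - office_kwh              # about 2.1 kWh

extra_cost_per_day = extra_kwh * PRICE_PER_KWH
extra_cost_per_month = extra_cost_per_day * DAYS_PER_MONTH

print(f"extra per PC per day:   ${extra_cost_per_day:.3f}")
print(f"extra per PC per month: ${extra_cost_per_month:.2f}")
print(f"extra for 100 PCs per month: ${100 * extra_cost_per_month:.2f}")
```

For the hundred machines in the original question, that is on the order of a thousand dollars a month just to keep the pool reachable overnight.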
Now the world has gone to bed, Darkness won't engulf my head, I can see by infra-red, How I hate the night.
I run a large network with thousands of computers, and I recently noticed that there are thousands of keyboards sitting there unused most of the time. And even when they are being used, typically only one or two keys are being depressed, leaving over 100 keys unpressed. Can anyone smart think of a way to put those hundreds of thousands of underutilized keys to work? It just seems like such a waste.
.sig withheld by request
hadoop's hdfs is the only thing which comes to my mind but i don't know if this could be of any use at all.
http://wiki.apache.org/hadoop/DFS
another project is wuala ( http://wua.la/ ), but that's not for internal use...
WoW... it appears you Penguins are just 'reinventing the Microsoft Wheel' (same w/ ZFS fans really) - Microsoft's already been there, & DONE that, & it works.
Imagine SQLServer 2005 blazing away on a Distributed Namespace, spreading its db-devices across 100's/1000's (whatever) of systems, using their idle time for it, & diskdrive read-write heads + RAM & CPU, etc. et al + using a high-speed interconnect, & maybe toss in a few dozen Solid State Drives (placing critical devices onto them, for the clients that use those tables/files/devices the most, you place them locally onto THEIR machine node, etc.), well...
YOU GET THE PICTURE!
So... Hey Penguins, new NEWS:
"It's been DONE (& works + is called DFS NameSpaces)"
By Microsoft, already.
Sure, it would take a while to encrypt some data, but that's a side matter :)
It's all encrypted, and since it's a LAN, it'll be fast. It also has the redundancy (and your end-users won't even know what's on their computer)
Plus, it'll be a great example of a "legitimate use" for it (unless you're just looking for a place to store your child porn). Kinda overkill tho, I'm sure there are other distribute filesystems that make better use of the resources.
--
Stay tuned for some shock and awe coming right up after this messages!
Your plan still has the problem that the computers would have to be online. If it's not ON, then you can't do all this stuff. I guess you could use some sort of Wake-on-LAN system, and wake up all the PC's in the place at once, however, there still remains the problem of workstations that are out of service - say a computer has to go in for repairs, or its hard drive fails or something like that, then that data wouldn't be available. With regular servers, it's not that big of a deal, since it's easy to backup and monitor servers; when you start using regular workstations for data storage, you run into problems.
The video itself is fine, but the advertising is not!
Look at the tomato! Isn't it sad? He can't dance! Poor tomato!
Indeed, he should have used "Orwell being in the state of having a large angular momentum in his ultimate disposal place" instead.
The Tao of math: The numbers you can count are not the real numbers.
Not believing it would indeed be hard, given that it can be easily seen (and you obviously did see it, or you wouldn't have replied to it).
The Tao of math: The numbers you can count are not the real numbers.
AFAIK DFS will not suit this. But MS is active on advanced distributed file systems like Farsite http://research.microsoft.com/Farsite/faq.aspx Unfortunately it does not seem to be publicly available.
Anssi Porttikivi / app@iki.fi
http://www.cleversafe.org/ It's open source, dispersed storage, encrypted, redundant... seems like it's worth giving a try. I haven't used it personally but it has been around a while. The more machines using it, the better a solution it is from what I can tell. The Windows support may be the big question... but the project seems worth keeping an eye on.
DFS doesn't actually allow you to pool your disparate storage. It acts as a generic namespace that allows you to have multiple replicas of the same data, and keep your users from actually knowing where the stuff is kept.
Democrats and Republicans are like AIDS and Cancer, I want neither!
Chirp, a distributed file system which gives you unix-esque access to files and doesn't require administrative privileges to set up, would be pretty much perfect for pooling your free hard drive space.
They get significantly less creepy once you've had somebody point a gun at you and pull the trigger. If you survive the experience and you've half a brain, you'll start learning a few things about guns. Those creepy technophiles are the best place to start.
The higher the technology, the sharper that two-edged sword.
How exactly is an in-depth knowledge about the inner workings of a gun going to help me if someone points one at me? If I myself have a gun, the only necessary gun knowledge is how to use it. Otherwise, I don't see how gun knowledge will help me. Or did you maybe think about talking with him about his gun, a la "Oh, nice gun you have, did you know that ...", hoping that he finds that conversation interesting and therefore decides not to shoot you?
The Tao of math: The numbers you can count are not the real numbers.
At this time I am able to answer your query in the affirmative.
-1 not first post
Am I supposed to take your word for that? :)
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Doesn't this show that a potentially better organisation is to have diskless machines with remote storage, which would allow a massively redundant storage arrangement?
I do wish people would use Google or Wikipedia for such simple questions...
See the list of filesystems listed under "Distributed fault tolerant file systems", "Distributed parallel file systems" and "Distributed parallel fault tolerant file systems" here.
ah, the artificial barriers of windows strikes again...
comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
Oh come on, that was a joke people.
You forget that Orwell later rebutted this opinion.
Since the Romans invaded Britain, English speakers have used latinate phrasing to appear scholarly. Anglo Saxon words were short and pithy, like "home", "pig", "horse", "cat". But scholars learn latin, so it's "domicile", "porcine", "equine", "feline". In modern English, the choice gives you a palette of moods - like colors on a web page.
Hardly a Microsoft invention. Of course, one of the two I mentioned might actually run on XP -- or you could switch all the client OSes to virtual machines and run Coda on Linux.
Don't thank God, thank a doctor!
today's "businessspeak" (mindless repetition of words and phrases that have long since been driven into the ground by thoughtless, banal, stupid repetition)
Kids! That word, meaning "trite" or "unoriginal", is pronounced "ba-NAHL". If you say it the wrong way like I did in an interview, it sounds naughty and you sound stupid.
Detailed knowledge of any technological artifact will make you better at using it, maintaining it, knowing when to use it, whether it's an automobile or an AK-47. Yes, some people find guns interesting to a greater degree than others (I don't, personally, nor do I own one) but whatever floats one's boat. Let me ask: do you find someone that has an advanced knowledge of computers creepy? Probably not, if you're on Slashdot ... but there are many that do, until they need him.
When the time comes that I need a brain to pick, it's those "creepy" nerd types that I seek out. They're the ones most likely to be able to help. Maybe you're anti-gun, and the fact that some people are not is offensive to you, I don't know. Regardless, you should look at people who know much more than you about a given subject as a potentially valuable resource, not an object of scorn.
The higher the technology, the sharper that two-edged sword.
Utilization is a standard term in computer and technical environments. It means something specific to most of us on Slashdot.
Possibly you mistook this forum for a different one?
just install freenet and make your entire organization trusted with itself. Expand the local store by several GB.
---- Booth was a patriot ----
Just post the addresses to your servers here, leave a few ports open and some industrious person is certain to put your system to good use.
Have gnu, will travel.
If you have lots of bandwidth as well, set up a file hosting site similar to TubeShare. At least you shouldn't run out of space :)
The best (and most entertaining) essay/book I've read on this topic is Less than Words Can Say ... it makes a compelling argument for clear, direct and precise language usage. I wholly recommend it (it's free online). Most importantly, as some of the responders to your post have failed to realise, there is a very big difference between "dumbing down" your language use, and making it clearer. Frivolous excess 'business' or bureaucratic verbiage usually *is* actually dumbing down the language in a different way as it makes meaning more opaque, even while giving the superficial appearance of intelligence and insight. Learning to recognize the difference is so critical our future actually depends on it.
Are tools not also resources?
The problem is that people often use big words without knowing what they really mean. In this case, the GP was correct in his use of "utilization", but the AC reply made the erroneous conclusion that the use was spurious (there's another great word). I agree with the spirit of the AC's reply (if not the particulars of this instance).
-ben
myselfmusic
Indeed, that's why correct language utilization is important :)
Well, actually, this may be useful as backup space. I'm working in a small company with a small server. Right now, we're regularly doing backups on tapes. But actually it would be quite easy to mirror the few GB we have on this server on one or several desktop computers. Wouldn't it? This should be possible with rsync. Is there any similar tool for Windows? Simon
Excellent. Have your people call my people and we'll discuss the potential ramifications of these developments over sushi.
Yet the "necessary gun knowledge" in using it involves many things which do require you to understand the rudimentary "inner workings" of a gun.
What happens if a gun jams? You will need to know how to clean it. What happens if you know the firing pin struck the cap but the round hasn't gone off? Open the chamber to clear the round and it might explode in your face. Sure, these things probably only happen 0.5% of the time but just *one* occasion of bad luck is enough to snuff your life out.
You really do not have the time to skip this and pick up knowledge like this only when a problem comes up. It takes more than just pulling the trigger to really use a gun.
A horse can't be sick, you know, even if he wants to.
Reed Solomon error correcting codes are a good example.
Reed Solomon codes are of the form (N,M), where there are N code words of which M are data and (N-M) are parity or error correction words. Reed Solomon codes can correct (N-M)/2 errors if the position of the errors is unknown, e.g. out of N words it is not known which are in error, but Reed Solomon codes can correct all (N-M) errors if the position of the error words are known, as is the case in a distributed storage system (assuming each device stores data perfectly, which can be enforced by another level of redundancy on each device). If there are N storage devices out of which a maximum of E can be lost at any one time, an (N,N-E) Reed Solomon code will provide perfect data integrity. All data is encoded as an N word code, one word of each is stored on each of the N devices. If a storage device is lost, a new one can be added and all the words that were stored on the failed device can be rebuilt from existing data on the new device. RAID5 is just an implementation of a (N,N-1) Reed Solomon code, and many RAID6 implementations are (N,N-2) Reed Solomon codes.
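To make the known-position recovery concrete, here is a minimal Python sketch of the simplest case mentioned above, the (N,N-1) code that RAID5 uses: one XOR parity block, and any single lost device can be rebuilt from the survivors. This is not a general Reed Solomon implementation (that needs finite-field arithmetic to handle more than one loss), and the function names are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """(N, N-1) code: N-1 data blocks plus one parity block, one per device."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block on a failed device whose position is known."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"abcd", b"efgh", b"ijkl"]   # three data devices
stripe = encode(data)                # plus one parity device, N = 4

# Lose device 1 and rebuild it from the other three.
print(rebuild(stripe, 1))  # b'efgh'
```

The same call rebuilds the parity block itself if that is the device that died, since the XOR of all N blocks is zero.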
The speed of performing Reed Solomon encoding and error recovery in general is not as fast as RAID5 and RAID6, because the implementations operate on polynomials over GF(2) which don't have direct hardware support in most processors like XOR does for RAID5. From what I've found, the time to calculate parity can be made to scale roughly linearly with the number of parity words being calculated, but that relies on using 8 bit words and pre-calculated multiplication tables for each data word which scales quadratically in space. With 16 parity words the coding and error correction can take about 16 cycles per byte of data, and with 32 parity words the speed drops to 30 cycles per byte. That means a theoretical maximum of several dozen MB/s on modern processors. If each storage device can calculate its own parity information, the speed could probably be increased significantly for very large storage arrays.
Aside from the original story's idea of using free hard disk space of workstations, I think it would be cool for people to be able to group their machines together on the Internet using the above scheme so that each user could donate X bytes of storage to the common pool, and once an acceptable level of redundancy (an (N,M) Reed Solomon code) was decided upon each user would have X*(M/N) bytes of highly redundant storage available. I hope to write either a FUSE driver or network block device driver that implements this for Linux. If anyone's interested, I'm planning on making everything available under either BSD or LGPL licensing when it's finished.
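The bookkeeping for such a pool is simple enough to sketch in Python. The formulas are the ones stated in these posts (N-M known-position losses tolerated, (N-M)/2 unknown-position errors, X*(M/N) usable bytes); the function name and the example numbers are invented for illustration.

```python
def pool_properties(n, m, donated_bytes):
    """Properties of an (N, M) code where every user donates `donated_bytes`.

    n: total words per codeword (one per device)
    m: data words per codeword
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < M <= N")
    return {
        "devices_that_can_be_lost": n - m,            # positions known (erasures)
        "silent_errors_correctable": (n - m) // 2,    # positions unknown
        "usable_bytes_per_user": donated_bytes * m // n,
    }


# Ten users each donate 100 GB under a (10, 8) code.
GB = 10 ** 9
props = pool_properties(10, 8, 100 * GB)
print(props["devices_that_can_be_lost"])       # 2
print(props["silent_errors_correctable"])      # 1
print(props["usable_bytes_per_user"] // GB)    # 80
```

Each user gives up 20% of the donated space and in return the pool survives any two machines disappearing.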
Agreed, that would be another hurdle. The solution that immediately comes to mind is a freenet style clustering. That would present more issues with versioning and redundancy. Maybe those have already been addressed elsewhere?
I'll believe in corporations having personhood when Texas executes one... - advocate_one
Not a replication DFS, but a namespace DFS.
Create a "stand alone root", NOT a domain root, on a server.
Then add links to it, where they point to shares on workstations.
This acts more like a bunch of symbolic links to the various boxes, with one entry point to the share. Not the same data everywhere, like a domain root would be.
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx
Or maybe its utilization?
ZFS ftw http://www.cuddletech.com/blog/pivot/entry.php?id=729
Way to jump to conclusions about me and how I manage a network. I honestly didn't ask the question as a "control freak", I don't spy on the employees, and I don't play Internet cop. I try to get them the tools they need to do their jobs, help them when things don't work, and otherwise stay out of their way. I also didn't imply the pool would be for me to do with as I please; I can see several ways in which that storage would benefit our business were it not spread out in small chunks. The users have all that space, and they simply DO NOT use it. In our business, they don't have much call for large files like photos, movies, etc. It's mostly spreadsheets and OpenOffice Writer documents. But thanks for being an ass.
BOINC, Tor, Freenet and/or I2P are good examples of things you can put your extra resources to use on. Here are the BOINC projects I would run if I had hundreds of systems at my disposal.
Artificial Intelligence System, NanoHive@Home, Predictor@Home, Project TANPAKU, Spinhenge@Home, The Lattice Project, World Community Grid, SIMAP, Malaria Control, Proteins@Home and Rosetta@Home.
Having tried this in college, I can tell you a couple things.
1. You will noticeably reduce the lifespan of the discs. (Which can anger cost conscious supervisors)
2. Doing ongoing hardware maintenance, because of this reduced lifespan, on closed, used by others, boxes is a *serious* pain.
Storage setups make hot swapping discs easy, trying to do this with full blown systems just gets tiresome. The solution I eventually came up with was the following.
Implement a two tiered hardware replacement cycle where you reduce the time a user is allowed to keep any hard drive in their box before replacement. Then using the still reasonably good drives, create a centralized storage solution in which the drives can live out the rest of their useful spans. Data security, user happiness, and redundancy are all good selling points of this system. You still have to deal with monkeying around in user boxes but if it's on a schedule and it nets you more drives, it's not so bad.
-Ian
+3 insightful? For this shite? Jesus wept.
Those terabytes are for porn!
Those terabytes are for porn!
Why'd you think hard drives were born?
Porn, porn, porn.
It's probably not there yet for you, but you might want to keep an eye on AllMyData's Tahoe project.
You make it sound like it's a bad policy keeping all business data somewhere properly managed. It won't mitigate any damage done to your company or your career because you told them to be careful. People will store data in the most convenient location; that's not stupidity - just human nature.
Mango pooling is the biggest idea we've seen since network computers
The following is an excerpt from an InfoWorld article from January 12, 1998.
Mango, in Westborough, Mass., is not your average software start-up. In 30 months the company has raised $30 million. Its first product, Medley97, has shipped, transparently "pooling" workgroup storage.
I digress.
Across a bunch of user machines? Sounds really unreliable. I mean you wouldn't give users a reset button for your servers. It sounds like the guy just wants to use this space for the hell of it. In a business context either you need more storage space so you buy the hardware, or you don't. I think this proposal stems from geek mentality to find it abhorrent when a potential resource doesn't have a use, rather than a sound business reason.
Rebutted? Do you mean recanted? I never heard that. Do you have a reference?
who said it was windows xp? obviously we're assuming its windows server, so what are you smoking?
and it doesn't matter what type of a client you have to access DFS either, that's just applicable to the clients. but i'm not sure DFS even makes sense here, the intended use is to distribute files, not make one large drive.
-mr silver
I won't bother going through comments to see if this has been posted.
This story has some interesting comments from a guy who claims he's CEO of digitalbazaar.com, a company that created a distributed filesystem named Starfish.
Open source and cross platform.
AC because I can't remember my freaking password.
You know, I'm no tree-hugging environmentalist and I'm as guilty as the next guy of buying all kinds of stuff I don't really need, but even I realize there comes a point when "buying more" isn't the best answer. If we could use all this space, maybe I wouldn't need to buy a fancy storage array. And the power to run it. And the drives to stick in it. And the place to put it. And all the garbage generated when they built it. And the room it's gonna take in the landfill when its time is done.
What if I said, "I have found a way to magically extract all the gasoline sitting in gas tanks in junked cars, and now I can give everyone with a car a free tank of gas." You'd surely raise your hand to get your tank filled. Sure, doing so is nearly impossible and completely impractical. But in the case of pooling unused desktop PC storage to use on the network, I know it's far from impossible, and with the right software, could even be practical. Otherwise I wouldn't have asked the question.
Yes, but Lotus Notes will also make baby Jesus and 300,000 IBM employees cry.
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
Gary Larson is turning in his grave ...
Caesar si viveret, ad remum dareris.
You've toolized the language!
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
Unfortunately you misunderstand DFS. You cannot pool multiple targets in the way you intend. DFS is to provide a unified namespace for disparate locations. DFS-R can be used to keep multiple targets in sync with the same data.
First, create a uniquely named empty directory on each drive. Next, set up a Linux
file server running Samba to be used as a proxy to access the distributed storage
on the Windows machines. Finally, create a union of all the empty directories using UnionFS:
http://www.fsl.cs.sunysb.edu/project-unionfs.html/
Problem solved.
jdb2
I didn't read every single reply, but I have a similar situation. About 60 PCs, each with 300-500GB of hard drive capacity, but only 5-20GB used. All connected by a local CAT6/gbit network. I'm using Veritas BackupExec for our disaster recovery solution. I've set up shares on each box, with restricted permissions, and set up "Backup-to-Disk-Folders" (a feature in Veritas) on the server that point to each remote host. You can set a max size (how much to use) and a cushion (how much to leave free). Create a pool containing all the BTDFs, and run your backups to them. The great part is that if a box is offline, or full, the job will just dump over to another one. You should use tapes regularly as well. If you lose one box you could lose everything. But it sure gives the tapes a break, and it's nice to know that space is being used.
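To make the max size / cushion / failover behaviour concrete, here is a minimal sketch of how a pool of disk folders might pick a destination. It is not BackupExec's API or its actual algorithm; the `DiskFolder` class, its fields, and `choose_folder` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskFolder:
    """Hypothetical model of one backup-to-disk folder on a desktop."""
    host: str
    online: bool
    used_gb: float       # space already consumed by backups in this folder
    max_gb: float        # cap on how much this folder may use
    disk_free_gb: float  # free space currently left on the host's drive
    cushion_gb: float    # free space that must stay available to the user

    def can_hold(self, job_gb: float) -> bool:
        if not self.online:
            return False
        within_cap = self.used_gb + job_gb <= self.max_gb
        leaves_cushion = self.disk_free_gb - job_gb >= self.cushion_gb
        return within_cap and leaves_cushion


def choose_folder(pool: list[DiskFolder], job_gb: float) -> Optional[DiskFolder]:
    """Return the first folder that can take the job, or None if none can."""
    for folder in pool:
        if folder.can_hold(job_gb):
            return folder
    return None
```

If the first box is switched off or too full, the job simply lands on the next folder in the pool, which is the "dump over to another one" behaviour described above.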
-Mark
Well the name is obviously a tribute, microsoftphobia aside.
Anssi Porttikivi / app@iki.fi
Project Celeste is basically what the OP is talking about. It's a distributed filesystem with automatic replication, handles rogue nodes via voting and also exports the "filesystem" as CIFS. It's essentially a distributed object store, which can be used to implement a filesystem on top of it. I saw a demo of it last year and I was pretty surprised, it seems to work quite well for a research project.
Jean-Francois Im's blog
The above-average Windows user's first question would be: how the heck do you install Windows (XP Pro) on something that is NOT a local disk? Excluding the odd case of adding a "vendor floppy with a driver" that emulates a block device (think NBD in Linux), the installer does not seem to know about any networked filesystems. And the Windows boot process does not seem like it could be running as a diskless environment either. (I welcome pointers to the contrary, and ideally, free solutions.)
DFS was not even remotely created by MS or Vista. It's similar to AFS except there is more than one, more robust than NFS, and Unix-based. But sure, go ahead and try to give Microsoft credit for it... This is the most uninformative thing I have read all year. Windows XP can also mount NFS, AFS, and DFS drives... as can Unix and Linux and OS X.
"Jazz isn't dead, it just smells funny" ~Frank Zappa
EdelFactor
Don't you worry about BLANK, let ME worry about blank!
I drink to make other people interesting!
PC hard drives don't have long lives under heavy load; if you started using them more often your failure rate would go up considerably.
And considering how cheap these cheap drives are, it's really not worth the effort.
This sig is the express property of someone.
I used BitTorrent to distribute virtual machines across a LAN.
Not only does this make use of extra space on the hard drives,
it also uses the network and spare CPU cycles of every PC available to distribute the files.
VM's can be huge and not easily restored or cleaned after use.
Point-to-point copies of a VM can take a very long time and are not guaranteed to be 100% free of CRC errors.
A "restored VM", or a torrent that has been restarted on a file that has been used
or changed, can be reverted very quickly back to its original crc/hash and is 100% perfect.
Essentially I was able to take a job that would take 2-3 hours to complete down to about 35-40 minutes.
The job required copying multiple 8-12GB VMs across 20 machines with 100% accuracy.
Point to point copies are not 100% perfect.
A Torrent copy is.
I can't tell you how important this is for a student who needs their VM to work 100%
and not crash in the middle of a week because there was an unknown error resulting from a straight point-to-point download.
In addition a "restoration" of the VM would be a simple restart of the Torrent download.
This would only take about 10 minutes.
So if a student made a mistake, restoring that VM would only take minutes and not require replacing the whole VM with a new one.
This is because all it does first is a crc/hash check
and then it only downloads the bits needed to restore the file to its original state.
It's an instantly restored VM.
No FTP, No Media, No open shares.
Just a simple Torrent file server with a HTML page for all the torrents.
The clients are their own trackers.
Just an example.
I do not know exactly how this could be applied to other uses.
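The "hash check, then fetch only what changed" step can be sketched roughly as below. This is only an illustration of per-piece hashing in the style BitTorrent uses (SHA-1 over fixed-size pieces), not the code of any torrent client. The tiny piece size, the function names, and the `pristine` copy standing in for data fetched from peers are all assumptions, and the sketch assumes the local file has kept its original length.

```python
import hashlib

PIECE_SIZE = 4  # deliberately tiny for the example; real torrents use far larger pieces


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece of the data."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def damaged_pieces(local: bytes, reference: list[bytes],
                   piece_size: int = PIECE_SIZE) -> list[int]:
    """Indexes of pieces whose hash no longer matches the reference list."""
    local_hashes = piece_hashes(local, piece_size)
    return [i for i, expected in enumerate(reference)
            if i >= len(local_hashes) or local_hashes[i] != expected]


def repair(local: bytes, pristine: bytes, reference: list[bytes],
           piece_size: int = PIECE_SIZE) -> bytes:
    """Overwrite only the damaged pieces; `pristine` stands in for the peers."""
    fixed = bytearray(local)
    for i in damaged_pieces(local, reference, piece_size):
        start = i * piece_size
        fixed[start:start + piece_size] = pristine[start:start + piece_size]
    return bytes(fixed)
```

Because only mismatching pieces are transferred again, a VM image with a few changed blocks is restored far faster than a full copy.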
Don't forget that at those sizes, a .45 is nearly 30% larger in diameter, and has far more mass. A 9mm will normally have a 124 grain bullet with a velocity of 1150 ft/s, 364 foot-pounds of energy. A .45 can be shooting 230 grain rounds at 900ft/s for 414 ft-lbs of energy.
.45 was designed for FMJ ammunition from the outset. The larger and slower .45 round will use more of its energy in a body, causing more damage. A 9mm HP will out stop a .45FMJ - but US soldiers are forbidden expanding ammunition. A .45HP will stop more often than a .45FMJ, but the difference is nowhere near as large as the difference between a 9mm HP & FMJ.
.223/5.56 round our military uses in most of its rifles. 1300 ft-lbs of energy in a 60-70 grain bullet traveling at over 3k ft/s. Sufficient velocity that the round will often fragment when it strikes a target.
Despite all this, I think that when it comes down to the army, it's mostly because of ammunition selection. Troops are issued non-expanding FMJ ammunition, which leads to 9mm over penetrating and under performing. The 1911, chambered in
As for the rifle comment, I have to agree. Consider the 'poodle-shooter', the
I don't read AC A human right
Seriously, no confidential data or even users' data can be stored in such a system. It would make stealing data very easy, since every computer on, for example, a campus would contain a portion of the whole. Now if it was a secure corporate building, it could be feasible if you trusted every employee, but who does?
Nah, rip out all the disks, and put them in a datacenter. Then run thin clients on the computers...
Anonymous Coward
I manage a network for a small medical office in my spare time. They use a LinkSys NFS box for their two networked databases, and have about ten overpowered workstations around the clinic for office work and instrument applications. Lots of extra disk space.
I made them buy the "pro" edition of xxcopy (for the network support) and wrote scheduled backup scripts for the NFS. Xxcopy has reasonably fine-grained file comparison, so I can back up changed files without taxing the drives too much. I made a separate admin account on each machine just to run the backups, and scheduled the scripts under the new accounts. It ain't cron, but it does the trick.
The backups run once at noon, once at night, and there's also a snapshot of the NFS taken every Friday. As a result there are about 24 backups of the working databases at any given time.
This way they didn't have to buy a dedicated backup system, and they get much more redundancy than a single backup solution would provide.
The poster asked how to use the wasted space on all the Desktops in his business by pooling them as one big hard drive. So yes, we are in fact looking for ways to make 1 big hard drive, not just share files, and yes, we're pretty sure he's not running a Windows Server Family Operating System (tm).
So you can count DFS as a big NOGO.
"Not to mention all the idiots who use words like boxen."
Anonymous Coward on Monday August 04, @06:49PM
Please do not use the space for anything else. Do not try to actively use the space.
The reason is the obscenely large amount of power required: using even a few gigabytes of that space requires the whole machine to be running, including its CPU, which can't draw less than 21 watts by itself.
It's actually cheaper to get a 1TB drive and use it elsewhere than use the power on so many desktops (or worse, servers). Even with the desktops in use by active users.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
limitations?
And, if you're claiming some kind of market race, you might want to check for relevant dates concerning ZFS
Of course, if you're just trolling, ignore me.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
The question is, what kind of data would you want to store in a file system where huge chunks of it are likely to be unavailable or just gone at any time?
I'm thinking, backups.
Let your desktops provide redundant backups of all your other desktops. Each night, the computers that are still up, would each make multiple copies of themselves on several of their neighbors... copies of everything but their backup directories and those system files that Microsoft makes it unreasonably hard to backup. A copy of the registry, a copy of the profiles, copies of installed programs, copies of all those files that make this system different from that one. Each would select systems that didn't have recent backup copies of themselves, and then at the end of the night they would prune the least useful backups... say, three redundant week-old backups of one of them, and a month-old backup of another... and report to the master control program what they had done.
Now if you lose your system, you bring up a standard install during the day, log in, and it will become your computer, find its most recent backup, copy itself back from its helpful neighbor and decrypt itself with your password, and by the time you get back from the morning staff meeting all will be well.
I donate this idea to the public domain.
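The nightly selection and pruning steps could look something like the sketch below. The function names, the data shapes, and the rule "keep the newest N copies per host" are assumptions made for illustration; the comment above does not define "least useful" precisely.

```python
from collections import defaultdict
from typing import Optional


def pick_neighbours_to_back_up(newest_copy_age: dict[str, Optional[int]],
                               me: str, how_many: int) -> list[str]:
    """Choose the hosts whose freshest backup anywhere is the stalest.

    newest_copy_age maps host -> age in days of its most recent backup,
    or None if the host has never been backed up.
    """
    candidates = [host for host in newest_copy_age if host != me]
    # Never-backed-up hosts sort first, then the oldest backups, then by name.
    candidates.sort(key=lambda h: (newest_copy_age[h] is not None,
                                   -(newest_copy_age[h] or 0), h))
    return candidates[:how_many]


def prune(local_backups: list[tuple[str, int]],
          keep_per_host: int) -> list[tuple[str, int]]:
    """Keep only the newest `keep_per_host` copies of each host held locally.

    local_backups is a list of (host, age_in_days) pairs.
    """
    by_host = defaultdict(list)
    for host, age in local_backups:
        by_host[host].append(age)
    kept = []
    for host in sorted(by_host):
        for age in sorted(by_host[host])[:keep_per_host]:
            kept.append((host, age))
    return kept
```

Each desktop would run both steps at night and then report the result to the master control program; encryption and the restore path are left out of the sketch.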
Tools are resources. We've come full circle here, folks!
Wow, fitting username.
What's wrong with just leaving the disks as they are for heaven's sake?
Use Torrents to distribute large files across the corporate network:
- administrator sets up a torrent tracker server, and a torrent client on each PC;
- administrator seeds the file; each client that needs the file downloads it, getting faster as more peers come online;
- it needs some admin tools to keep the clients going: cleanup of old files if disk gets too full. (Feature request? Tracker could tell client which of its hosted files are least in demand?)
(this is not a
Has anyone ever tried or come across GFarmFS? I literally stumbled across their page by accident, but I've read all the documentation I can get on it, and I'm interested in implementing it myself. It seems to offer most if not all of the features you want, so maybe it's worth a look.
that is cryptographically secure, secure, and GPL'd?
I'd go on a Vegan diet but the delivery time from Vega is too long. --brownkitty
Mem: 1027696k total, 919316k used, 108380k free, 31248k buffers
Swap: 2048248k total, 0k used, 2048248k free, 554324k cached
A typical system with Opensuse 10.2 and KDE might have the memory usage as above. At the moment there is no need for swap and hence no need for a hard drive once the OS and apps are loaded from the network. All applications will fit within the main memory, if you pick applications that are not bloated. For the workplace, diskless workstations are the way to go. Store all data centrally for simpler management. Set up a gigabit ethernet network.
Take all the hard drives out and run everything from a server using two or three of them. You'll save a few kilowatts.
If you really need local storage put some solid-state drives in.
Seagate makes a storage product that works on top of DFS called "Storage X" that lets you do this... you can take a whole bunch of servers and combine them.
Well, that's what their literature says... I have no idea about the real product.
Yes Francis, the world has gone crazy.
If you have rsync running on each one at different intervals and use hard links to create efficient differential backups (yes - it is possible in windows), then you could have masses of copies of your data stored between them all. That way you could spend less on backing up your file server (you will still need some off-site backup), and probably have better recovery options for lost files.
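For anyone unfamiliar with the hard-link trick, here is a rough Python sketch of the idea behind rsync's `--link-dest` option: files unchanged since the previous snapshot are hard-linked rather than copied, so every snapshot looks complete but only changed files take new space. This is an illustration, not rsync itself; the `snapshot` function and its arguments are made-up names, and it assumes the backup volume supports hard links (NTFS does).

```python
import filecmp
import os
import shutil
from typing import Optional


def snapshot(source: str, previous: Optional[str], target: str) -> None:
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)        # unchanged: share the existing data
            else:
                shutil.copy2(src, new)   # new or changed: store a fresh copy
```

Deleting an old snapshot directory is safe, because the file data stays on disk until the last hard link to it is removed.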
I guess FreeLoader answers your call. FreeLoader is a distributed storage system designed to harness the unused storage space of LAN-connected commodity desktops. A dedicated space manager maintains metadata such as node status, chunk distribution, and file attributes. The FreeLoader project is under active development and can be found through the following link: http://www.ece.ubc.ca/~samera/projects/freeloader/
Yahoo is developing something called Hadoop on Demand which might work. Hadoop is the open-source Apache clone of Google's GFS (Google File System). Hadoop on Demand is supposed to allow you to use unused volumes on any machine to create an ad hoc Hadoop cluster. I'm not sure if it's been released yet, or if it works with Windows. But it would be cool if it did.
Artificial barriers? See Hanlon's razor.
The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
I recently saw that fink is looking for mirror space. Donate to them?
Space and Computers.
All 19 hijackers were known terrorists 09-10-2001. Lack of FBI intelligence does not justify warrantless wiretaps..
You could put a Hadoop Distributed File System (HDFS) on them. HDFS allows you to use the storage as a single file system that is stable and reliable. We have multiple 2000 node clusters with petabytes of user data on them. Because the blocks are each replicated to 3 hosts, if a node goes down, your data on that node is not lost.
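Why three replicas survive a single node failure can be shown with a toy placement model. To be clear, this is not HDFS's real placement policy (which is rack-aware); the round-robin scheme and the names `place_blocks` and `lost_blocks` are assumptions used only to illustrate the idea.

```python
def place_blocks(num_blocks: int, nodes: list[str],
                 replicas: int = 3) -> dict[int, list[str]]:
    """Toy round-robin placement: block i lives on `replicas` consecutive nodes."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {block: [nodes[(block + r) % len(nodes)] for r in range(replicas)]
            for block in range(num_blocks)}


def lost_blocks(placement: dict[int, list[str]], down: set[str]) -> list[int]:
    """Blocks for which every node holding a replica is currently down."""
    return [block for block, holders in placement.items()
            if all(holder in down for holder in holders)]
```

With any one node down, every block still has two live replicas; data only becomes unavailable when all three holders of some block are off at once, which is the real risk on a network of desktops that users switch off.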
Haven't heard of the existence of it yet, but I've been looking for just the same thing... IIRC though, Google's FS sounds almost on the money for what we'd want, and it's a shame we can't get it. There are alternatives like AFS, Coda, etc. (although I find some of the suggestions of DFS people have been making quite strange). But at the end of the day what you really need is something like:
1) the ability to spawn off say 10gb on users computers all over the place (maybe 100's to 1000's of machines)
2) machines can be on or off at any time
3) data must not be lost because a machine went off (or at least, data must be kept intact if a machine is switched off)
This is not even close to what DFS is capable of, nor would managing 100's to 1000's of machines with it (even if it were capable on XP/Vista) be even remotely manageable. Likewise AFS and Coda don't really lend themselves well to the job. I started working on something like this where a program running on a machine would grab 10% of the free space, create a "disk" file, then talk to some central computer telling it that it was available for use. I never really got far though, because there's a lot of exception handling to do (and metadata to track). At the end of the day you can't guarantee the availability of any one "block" unless you mirror it to every computer; only then might you be able to guarantee that at least one machine holding it is on at any one time. Unfortunately, that approach is almost pointless because the data might as well just sit on 2 physical volumes on a storage array rather than across all your machines.
Which is not to say it's a bad idea, but it'd be good to find a "desktop storage aggregator" or some such!
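The availability worry can be put into numbers with a back-of-the-envelope model. Assuming (a big assumption) that each desktop is on with the same probability and independently of the others, a block mirrored on k machines is reachable with probability 1 - (1 - p)^k. The function names below are invented for the sketch.

```python
import math


def availability(p_online: float, copies: int) -> float:
    """Chance that at least one of `copies` independent machines is on."""
    return 1 - (1 - p_online) ** copies


def copies_needed(p_online: float, target: float) -> int:
    """Smallest number of copies that reaches the target availability."""
    if not (0 < p_online < 1 and 0 < target < 1):
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p_online))
```

For example, if a desktop is on only half the time, `copies_needed(0.5, 0.999)` comes out at 10 mirrors per block, so you do not need a copy on every computer, but the overhead is still far higher than the 2 physical volumes a storage array would need.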
Assuming a recent CPU with hardware virtualization, could you have one partition (say, 20GB) with Windows for the user, and another partition running something else to serve up the remaining hard drive space, with a hypervisor running them both at the same time, invisible to one another?
I thought this was the kind of scenario that virtualization was intended for.
Stasis is death. Embrace change.
Devote some disk space from each machine by creating empty files of the desired size (on W98 workstations, create many 1 GiB files). Write a simple Python (or whatever) program that allows the devoted space to be treated as a remote device (elementary control requests and read, write operations in block sizes, whatever that block size may be).
Consolidate on a Linux machine, write a little more code using LUFS to connect to these remote devices and build soft raid devices (md-raid, evms, whatever). Make sure that you distribute your subvolumes as evenly as possible among the available remote devices. Create a filesystem on the mega-device. Share (NFS, SMB).
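The per-workstation half of this could start from something like the following sketch: a preallocated file exposed through block-sized read and write operations. The class name, the default block size, and the method names are assumptions, and the network protocol that would carry these calls to the Linux box is deliberately left out.

```python
import os


class FileBlockStore:
    """A preallocated file treated as an array of fixed-size blocks."""

    def __init__(self, path: str, num_blocks: int, block_size: int = 4096):
        self.path = path
        self.num_blocks = num_blocks
        self.block_size = block_size
        size = num_blocks * block_size
        if not os.path.exists(path) or os.path.getsize(path) != size:
            with open(path, "wb") as f:
                f.truncate(size)  # reserve the devoted space up front

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def read_block(self, index: int) -> bytes:
        self._check(index)
        with open(self.path, "rb") as f:
            f.seek(index * self.block_size)
            return f.read(self.block_size)

    def write_block(self, index: int, data: bytes) -> None:
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        with open(self.path, "r+b") as f:
            f.seek(index * self.block_size)
            f.write(data)
```

A small server loop wrapping `read_block` and `write_block` would then give the consolidating machine the remote devices it needs to build its soft RAID on top of.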
I speak England very best
My advice?
Don't.
40 gigs free, 100 desktops == 4 terabytes. That's roughly a grand now for a homebrew system, adding in the drive controller or surrounding boxen. DIY, on your own hardware, write it up and we'll all link and blog and rave about the cool hack. Well, we'll be less impressed now that drivespace is at $200 a T, but we'll at least nod approvingly when we see it on your resume.
Outside data introduces risks. Inside data has risks like HR or payroll or company secrets disclosure. Network and power utilization go up expensively. Someone will demand data recovery beyond your ability to provide it. Someone else will complain that your data got corrupted when some end user turned off a system midwrite.
Put another way, imagine trying to make a business case for this to the CEO of your company. If that doesn't turn off your urge to do this, start your way up the food chain until you get every superior's approval or adequately shot down to drop this fool's errand.
Did I mention you can get fired for this?
I remember about 5-8 years ago a product that I thought had "orange" in its name (or maybe an orange logo) that used the concept of spreading out one's documents across PC hard drives on the LAN for the purpose of backups/fault tolerance. Does anyone remember what that was called or what happened to it? Basically it would break up the file into chunks and store them on different clients, with encryption, etc. I'm not sure if it ever made it to v1.0.
It might have been a "joke", but it's not funny. Therefore, mod grandparent down.
Once again, the lowly system admins must come crawling to the developers for a solution to their problems. Learn to write code and roll your own.
DFS - Distributed File System. Just create a share with each of these and POOL IT with a DFS system
It sounds useful, but really it's more trouble than it is worth to treat separate computer drives as one volume, in a large office. DFS would be useful for a network of computers in one place isolated from other meddling people, like a rack of servers.
Notwithstanding the low price of just buying a terabyte disk should you need the space, trying to make a hundred computers serve up a lot of bitty disk space is really silly in terms of the cost-benefit. The first problem is someone power cycling their computer or disconnecting it. Then there's the slow access over a long cable. Already that's cause for lots of running around making sure the computers are up. Then there's the problem of someone just moving a computer somewhere else, and you could spend days trying to find out what happened, especially if that somewhere else happened to be out-of-town, or just into storage when an upgrade is obtained. Further keep in mind that the small drives are getting to be old drives and liable to zonk out.
So, if your environment is lots of small files, just share the drives to use for redundant backups of files that are small yet important, and don't lose sleep looking for data in a suddenly unavailable drive. Also encrypt because you may well lose track of where the file went or what file exists. If you need to handle larger files and need more space surely the company will be able to afford larger capacity drives, as they are already using so many computers.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
A big problem is that so many people write things and say things because somebody else did, and they think it's cool. Unfortunately, these phrases or expressions quickly become overused, and that makes them annoying and ultimately meaningless.
For example, how many times have you read people misusing the phrase "per se"? Often it is written as "per say". But what does it actually mean? Most people don't really know. Then there is the New English version, "in and of itself". What, is "by itself" not good enough? Why does it have to be "in and of" itself?
"Period" is not a sentence. Don't spell out the punctuation in your sentence. It is redundant and silly. English has plenty of words that can be used to emphasise something. Why not try using one? There is no need to spell out your punctuation, comma exclamation mark! See? It's silly.
Similarly, when speaking English, it is silly to waggle your fingers around to indicate quote marks. There are words available which can express the desired intention with no need for hand movements. And saying "quote unquote" before quoting somebody is absurd. Surely it'd be better to simply say "quote" before the quoted material, and possibly "unquote" afterwards, if necessary.
Google could open source GFS
This discussion is nothing new and was discussed back in the 1980's and the conclusion was "if you don't like wasted disk space then get a centralised server and thin clients" and it equally applies today. If the business requires some of its workforce to have laptops then connecting them into a distributed file-system is not worth the trouble because of their portability.
There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
The idea of PORN, or something akin to it.
Best distributed file system? None! Use BitTorrent, and fill the drives with backups. Do a fit match up... start from large to small. When something gets knocked off, you just have to get a torrent running, and in a few minutes, voila! (But it will cause a bit of a slowdown, with all the traffic suddenly.)
Just be sure to close ALL torrent traffic to the outside, or someone else will 'share' your files... try looking for win.ini on the internet, you will get a huge surprise...
Also be sure to AUDIT YOUR ENTIRE NETWORK FOR ACCESS RIGHTS, and all the most common mistakes... sharing 'C' and the Administrator account...
Dude, relax.
that when a disk starts to reach its capacity the performance degradation impact on the operating system can be considerable. This is because the seek times are increasing and the fragmentation is also increasing.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
input -> process -> output
everything got a virtual platform
and
move things dynamically
got a big mainframe? got a big storage?
then get ur workstations 'image'-wise and diskless
just give dumbs dumb terms
An application that springs to mind is backing up 'the bits that don't get backed up' possibly as an image. (Ok, lots of issues here, but I'm only thinking aloud.) You might set up a couple of PCs that fetched the whole of the company web site if their copy was more than two weeks old, or another three that stole files from the database server hourly, daily and weekly.
... that was unintentionally hilarious... 'porn!' was a default expected response but then you got all defensive about it. It really makes it look like you've got something to hide :D
plz torrent kthxbai.
Amen, brotha!
:p
Now, why the heck did you post as AC? You know, the uh, cool kids judge you by your cojones revealed by posting under a username.
-b
myselfmusic
... and let Freenet run on it: www.freenetproject.org
"In computer science, Freenet is a decentralized, censorship-resistant distributed data store originally designed by Ian Clarke. Freenet aims to provide freedom of speech through a peer-to-peer network with strong protection of anonymity. Freenet works by pooling the contributed bandwidth and storage space of member computers to allow users to anonymously publish or retrieve various kinds of information."
Theft of a machine could pose a security problem, if the data is not fragmented enough. What if only some computers are switched on when the data is needed? It would need to be massively redundant to work properly.
Try something else: sharing the network's RAM; that would make a significant performance difference.
Let's get this clear - are you implying that there's no difference between stupidity and human nature (on average).
How cynical.
How realistic.
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Backup daily PC1 to PC2, PC2 to PC3, and so on...
Use something like rsync.
Encrypt backups.
For additional redundancy, you can use a scheme like backing up PC1 to PC3 to PC4, PC2 to PC4 to PC5 and so on.
This does not give you the virtual file system that you were hoping for, but at least it puts to partial use the unused space.
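The neighbour assignment is easy to generate for any number of PCs. The sketch below is one reading of the scheme, in which each PC's data ends up on the machines a fixed number of positions further along a ring; `ring_targets` and its `offsets` parameter are names made up for the example.

```python
def ring_targets(hosts: list[str],
                 offsets: tuple[int, ...] = (1,)) -> dict[str, list[str]]:
    """Map each host to the hosts that should hold its backups.

    offsets=(1,) gives PC1 -> PC2, PC2 -> PC3, and so on, wrapping around;
    offsets=(2, 3) gives PC1 -> PC3 and PC4, PC2 -> PC4 and PC5, and so on.
    """
    n = len(hosts)
    if any(offset % n == 0 for offset in offsets):
        raise ValueError("a host must not be its own backup target")
    return {host: [hosts[(i + offset) % n] for offset in offsets]
            for i, host in enumerate(hosts)}
```

The resulting map can drive a nightly job that runs rsync (or a similar tool) from each host to its targets, with the backups encrypted as suggested above.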
Dude, chill.
He was joking about what you can use the disk space for, not what it is currently used for.
SANmelody at www.datacore.com
And, how much power do these boxes use over a year, and how does this relate to the cost of a new hard drive?
It seems wasteful in terms of time as well as infrastructure. I do understand the desire to utilize unused resources...
Some other people have mentioned it, but I'd just like to make it clear. MSDFS sucks at life. First, I believe only Windows Server can participate as a server, so there goes the desktop idea. Even with Windows servers it's quite inflexible, quirky, and unstable.
I'm a systems analyst for a major retail firm. Currently, I'm managing around 500 machines that support all the various applications we implement (Mainly AIX, Linux, Solaris and about 80-90 Windows 2003 Server boxes). Short answer- look into EMC storage using NAS.
Also- migrating your physical servers to VMware instances will enable you to effectively allocate storage in the form of virtual disks. The idea is to keep your storage in disk-arrays, adding space where you need it as you need it. This eliminates the inefficiencies of multiple physical disks that are allocated to a specific machine.
Side note: one advantage of "utilize" is that it inflects decently. It's still deficient, but not as bad as "use".
Wow, thanks for the lecture; it's a shame that you don't know enough about the language to pull it off properly. I guess the fact that, unlike you, I write for a living has taught me a thing or two about the language that we use.
There is no distinct shade of meaning: utilization is a *synonym* of usage. Do you know what a synonym is? Just so that you are aware - not one person has suggested blindly replacing "utilize" by "use". But the usage of the word utilization is completely unnecessary, it does not have a distinct meaning that is missing from the word usage.
It is used by pompous asshats who think it makes them sound more knowledgeable, when in fact it merely singles them out to the crowd they wish they were in. Do you know where you stand now? Or has my utilization of the language been overly taxing for your tiny brain?
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
"DFS doesn't actually allow you to pool your disparate storage. It acts as a generic namespace that allows you to have multiple replicas of the same data, and keep your users from actually knowing where the stuff is kept."
Oh! so instead of a new revolutionary concept coming from Redmond's factories is just a reinvention of AFS which has had a long running open source variant named OpenAFS, isn't it? (so good for the troll few messages above).
"trying to make a hundred computers serve up a lot of bitty disk space is really silly in terms of the cost-benefit."
Only if there's no "prepacked solution", since the disk space is *already* paid for. The problem is not "I'm going to buy a lot of desktops with 80GB disks while local clients will only use 10GB so I can use the spare space" but "I can't get desktops with less than 80GB despite the fact they'll only use about 10GB; is there any way I can leverage my already paid for infrastructure?". There might be no proper solution, but the question seems quite legitimate nevertheless.
"The first problem is someone power cycling their computer or disconnecting it."
Problem solved some decades ago: the solution is named "RAID".
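For readers who have not looked inside RAID, the core trick of the parity levels is plain XOR: store one extra parity chunk, and any single missing chunk can be recomputed from the rest. The sketch below shows only that idea, with invented function names and equal-length chunks assumed; it is not a full RAID implementation.

```python
def xor_parity(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    if len({len(chunk) for chunk in chunks}) != 1:
        raise ValueError("all chunks must have the same length")
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing chunk from the survivors plus the parity chunk."""
    return xor_parity(surviving + [parity])
```

Spread across desktops, this tolerates one machine per stripe being switched off; tolerating more simultaneous outages needs more redundancy than a single parity chunk.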
"Then there's the slow access over a long cable."
Unless his LAN uses some kind of ADSL for local transit, it can't be a problem, since the bandwidth usage for "using a lot of resources from my desktop that are on a central server" is just the same as for "offer a lot of resources from my desktop to a central server".
"Already that's cause for lots of running around making sure the computers are up"
Only if there's no RAID over the solution; only if you try to use that spare space as "live speedy mass storage". What if that space is used for "near-line" backup storage or old data rarely needed? The service can even be coupled with some WoL environment so if the data is on a turned off system it will be automatically powered-on.
"Then there's the problem of someone just moving a computer somewhere else, and you could spend days trying to find out what happened"
Of course nobody says such a solution will be "as easy as turning on your recently bought computer with Windows Vista Home Edition". Like any other "Enterprisy Solution" it will need to be properly engineered but, again, the problems don't seem insurmountable at first glance. Let's see a simple operational flow:
1) Space-locators receive a unique ID, so as long as the computers are reachable through the network, you can move them all you want.
2) The box cannot be reached; it might be turned off; WoL will try to start it up.
3) The box won't turn on even with the aid of WoL; well, no problem: data is RAIDed, so it will be served from the redundant data (mirror copies or parity) and the "lost" node will be replicated somewhere else.
See? All of this would be absolutely transparent to the sysadmin as long as the solution is properly engineered.
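Step 3 is the part that leans on redundancy, so here is a minimal sketch of the XOR parity idea used by RAID-style schemes: keep one parity block per stripe, and any single missing block can be rebuilt from the survivors. The function names are hypothetical, and a real system would add striping, checksums, and re-replication on top of this.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    blocks = list(blocks)
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks):
    """Parity block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])
```

If three desktops hold b"node", b"data", and b"here" and a fourth holds their parity, losing any one of the four still leaves enough to recompute the missing block.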
"Further keep in mind that the small drives are getting to be old drives and liable to zonk out."
That's again an already assumed problem. For one, "server" drives will be old, small, and flaky in the near future too, so you'd better already have a plan for that situation. For two, ask Google or any other corporation with massive volumes of data in a "grid": even if the disks were "server quality", once you have a lot of them you have to plan, and engineer, not for the case where a drive dies but for an environment where a significant percentage of disks are continuously offline (due to breakage, power outages, or whatever).
"Assuming a recent CPU with hardware virtualization, could you have one partition (say, 20GB) with Windows for the user, and another partition running something else to serve up the remaining hard drive space, with a hypervisor running them both at the same time, invisible to one another?"
So in order to take advantage of some gigs of disk that cost peanuts, you are going to sacrifice expensive RAM and CPU cycles? That indeed would be "Not So Intelligent (TM)".
That's exactly what our software does. Check it out at http://www.revstor.com./
RevStor, LLC - The revolution in storage http://www.revstor.com
Does your average office drone really need the whole power of a modern processor to bang out documents in Word? The most basic computer you can get from Dell or Lenovo or some other OEM has lots of RAM and CPU cycles to spare.
Stasis is death. Embrace change.
"Does your average office drone really need the whole power of a modern processor to bang out documents in Word?"
Yes. Of course they won't need the CPU at 100% all the time, but they certainly need it from time to time, like when they open said Word documents.
" The most basic computer you can get from Dell or Lenovo or some other OEM has lots of RAM and CPU cycles to spare."
I don't think so. Especially RAM; office computers don't usually have any to spare.
I do some work for Allmydata, which is an online storage provider. Their next-gen storage technology is open source and nearly perfect for this application. It's a bit green at this point, but coming along nicely. http://www.allmydata.org/
You have violated Robot's Rules of Order and will be asked to leave the future immediately.
This company caps email at 120 Meg, and whenever I get a warning that I'm reaching my quota I end up searching email by attachment size. People have bad habits of putting spreadsheets inside word docs, and all sorts of neat ways to clog things up. I just like to keep old email. I've only been there 6 months and have had to clean out the inbox a few times. Other places I've worked have automatically deleted old mail after 60 days and left it up to the users to "archive" it in a place that requires extra effort to get at. This is really frustrating given all the free space I have locally (as you pointed out).
Sorry for being an ass, but venting on rare occasions is good for the soul. Better to direct it into slashdot than people in the real world
The proper tool, IMHO, for quickly stopping a charging HDD is a .458 Winchester Magnum, with 450-grain soft point bullets. Great for putting 3/4 - 1" holes through all platters, with the side benefit of sending the drive a further 10 yards downrange, just in case a follow-up shot is required. Of course, I've never had to fire twice. One round seems to do the trick.
I totally understand. I realize that the IT-people-are-power-drunk-jerks stereotype didn't appear out of thin air. I try my hardest to surprise my fellow employees by actually being nice, helpful, and non-autocratic. As for attaching spreadsheets and presentations to emails and then forwarding them to fifty people instead of copying them to a public directory, I'm completely with you.
I'd have to disagree with you on the RAM bit. A friend of mine recently got 10 systems from Lenovo for the office, and each one had 2x512MB RAM, which I'd consider heavy overkill for the kind of workload they used. Perhaps that's an unusually large amount of RAM for a system to come with.
Besides, even a pretty close system would probably be able to spare the 32MB or so the virtualized SAN-node OS would need. It would probably eat up less than the onboard graphics will on those systems. We're not talking about running two full desktops at once.
Stasis is death. Embrace change.
What about a data grid?
XML is like violence. If it doesn't solve the problem, use more. Junta
I've merged free disk space on different windows PCs into one big samba share.
...
Steps:
On each windows PC:
1) Create an ad-hoc account on each PC
2) Set up a shared folder on each PC with full control by the previous account
3) Create a 2GB file (or more) in each folder, with a serial number in the filename (I've written a small executable for this)
On a linux box:
4) Mount the different windows shares
5) Set up a loopback device for each big file in the shares
6) Set up one device using all this space with device mapper. You can have the equivalent of RAID0, RAID1,
7) Make a FS on the device
8) Mount it
9) Set up a share
Et voilà!
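For step 3, the small executable could be as simple as the following sketch. This is not the poster's tool; the "pool-NNNN.img" naming and the function name are made up for illustration. Note that extending a file this way may produce a sparse file on some filesystems, in which case the space is not actually reserved until it is written.

```python
from pathlib import Path


def create_backing_file(folder, serial, size_bytes=2 * 1024**3):
    """Create a fixed-size backing file with a serial number in its name."""
    path = Path(folder) / f"pool-{serial:04d}.img"  # hypothetical naming scheme
    # Mode "xb" refuses to overwrite an existing file of the same name.
    with open(path, "xb") as f:
        f.truncate(size_bytes)  # extend the empty file to the requested size
    return path
```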
I have two scripts to "mount" and "umount" such a device. The "mount" script does the 4-5-6-8 steps and the "umount" the opposite in reverse order.
If a windows PC needs to quit the group, put the big file on another windows member, that's easy.
Warning: it was so slow to make an ext2/ext3 file system (I don't remember which one I chose) that I made a FAT32 file system instead; it was much faster to create (I still wonder why).
Of course I get very low performance with such a setup, but at least I have a *lot* of space in one device.
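To make step 6 a bit more concrete, here is a rough sketch that builds a device-mapper table for the simplest case: a plain linear concatenation of the loop devices (not striping or mirroring, which use other targets). Each table line has the form "start_sector sector_count linear device offset". The loop device paths and sizes are assumed inputs, and this only generates the table text; it runs nothing.

```python
SECTOR_BYTES = 512  # device-mapper tables are expressed in 512-byte sectors


def linear_table(devices):
    """Build a device-mapper 'linear' table that concatenates the devices.

    devices: list of (device_path, size_in_bytes) tuples, in pool order.
    """
    lines = []
    start = 0
    for path, size_bytes in devices:
        sectors = size_bytes // SECTOR_BYTES
        # Map logical sectors [start, start + sectors) onto this device from offset 0.
        lines.append(f"{start} {sectors} linear {path} 0")
        start += sectors
    return "\n".join(lines)
```

The resulting text would be fed to something like "dmsetup create" on the Linux box, after which steps 7 to 9 apply to the new mapped device.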
"The distributed database could be constantly updated from the original sources, and the distributed storage then becomes in effect a giant cache that contains the entire internet."
As most people here know, possibly 80% of the internet is secured by passwords or some other mechanism. Google, Yahoosoft, or whoever only provide a small insight to what is out there. Archiving what mcdonalds.com chooses to present isn't that valuable.
"Now we could employ the distributed computing software to datamine that cache and we could have searching independent of Google or Yahoo or M$FT."
Effective datamining is more complicated than grep. Google, Yahoo, and even M$FT have a lot of PhDs and tech gurus who have been working on optimizing search for 10+ years, using servers dedicated to that effort.
Perhaps a more realistic and practical use of that space would be to redundantly back up his company's servers. He could make encrypted, bite-sized backups and park them on unused blocks of the available desktops. This still wouldn't be easy to implement, but it is theoretically doable.
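A rough sketch of what the "bite-sized backups" planning could look like: split the data into chunks, fingerprint each one, and assign every chunk to more than one desktop. Everything here (function name, 4 MB chunk size, round-robin placement, two copies) is an assumption for illustration, and encryption is deliberately left out; in practice each chunk would be encrypted before it leaves the server.

```python
import hashlib


def plan_backup(data: bytes, hosts, chunk_size=4 * 1024 * 1024, copies=2):
    """Return a list of (sha256_hex, [target hosts]) entries, one per chunk."""
    if not 1 <= copies <= len(hosts):
        raise ValueError("copies must be between 1 and the number of hosts")
    plan = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()  # lets a restore verify the chunk
        # Round-robin placement: consecutive hosts, so copies land on distinct machines.
        targets = [hosts[(index + k) % len(hosts)] for k in range(copies)]
        plan.append((digest, targets))
    return plan
```

With copies=2, any single desktop can be switched off or reimaged without losing a chunk, at the cost of storing everything twice.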
Ninjas don't carry tic tacs
Next we'll be talking about harnessing unused RAM on all those workstations.
Unless for some reason there are extra disks on these hosts, it's not worth the effort of trying to access/lock/manage security/etc for storage that you'd have to access across a network, especially when users could change/reboot hosts and the increase in nodes overexposes you to failures.
Just buy disks of a size/speed/cost that's appropriate to give them room for anything that needs to be installed locally and set up a SAN or NAS for saving their files, where you can manage bulk storage most effectively. If you're feeling that you can invest a lot of time in the name of storage efficiency, give users individual iSCSI virtual drives on a NAS.
Only his tendency toward a dazed stupor prevented him from screaming aloud.
Gary Larson is still alive, although turning in his grave is a distinct possibility.
Only his tendency toward a dazed stupor prevented him from screaming aloud.
Take a look at Allmydata Tahoe. I think it will do what you're looking to do. It sounds robust as well. Hope this helps.
Scientia et Potentia
Distributed Internet Backup System (DIBS) is a free, cross-platform python-based solution.
http://web.mit.edu/~emin/www/source_code/dibs/index.html
What about using freenet, possibly in a closed miniature of the full network? Install a node on these machines with all this idle space and assign the lion's share of it as that node's datastore. It's got the advantage of effectively combining all available storage and keeping stored content encrypted.