Ask Slashdot: Best *nix Distro For a Dynamic File Server?
An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing a particular sort of 'dynamic' file server for handling scientific data at my workplace. We have all the hardware in place, but can't figure out what *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: We have sensor units scattered across a couple square miles of undeveloped land, which each collect ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server has to boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out-of-the-box without messing with manual recompiling? Why is documentation for *nix always so bad?"
Why do you need a unified filesystem? Can't you just share /myShareOnTheServer and then mount each disk to a subfolder in /myShareOnTheServer (such as /myShareOnTheServer/disk1)?
I know I’m not going to be the first person to ask this, but if I understand it the plan here was:
1 - buy lots of hardware and install
2 - think about what kind of software it will run and how it will be used
I think you got your methodology swapped around man!
Why is documentation for *nix always so bad?
You are looking for information that your average user won’t care about. Things like boot time don’t get documented because your average user isn’t going to have some arbitrary requirement to have their _file server_ boot in 30 seconds. That’s a very weird use case. Normally you reboot a file server infrequently (unless you want to be swapping disks out constantly...). I’m assuming this requirement is because you plan on doing a full shutdown to insert your drives... in which case you really should be looking into hot-swap.
Also mandatory: you sound horribly underqualified for the job you are doing. Fess up before you waste even more (I assume grant) money and bring in someone that knows what the hell they are doing.
Use OpenAFS with Samba's modules. Distribution doesn't matter.
Why does it have to be a mechanical hard drive? Why not use an SSD for the boot drive?
Really, single hard drives are notoriously bad at keeping data around for long. I would make sure you have a copy of everything. So make a file server with RAIDZ2 or RAID6, and script the copying of these hard drives onto a system that has redundancy and is backed up as well.
I have seen scientists come in with their 500GB portable hard drives only to find them unreadable far too many times. If you fill 500GB in 24 hours, there is no way a portable hard drive will survive longer than about a year. Most of our drives (500GB 2.5" portable drives) last a few months; once they have processed about 6TB of data full-time, they are pretty much guaranteed to fail.
CentOS may be your best bet. It's Red Hat Enterprise Linux rebuilt from the Red Hat source code, minus the Red Hat trademark.
unRAID FTW. I use it to handle TBs of data and it works fine.
Another "I don't know how to do my job, but will slag off OSS knowing someone will tell me what to do. Then I can claim to be l337 at work by pretending to know how to do my job".
It's called reverse psychology; don't fall for it! Maybe shitdot will go back to its roots if no one comments on junk like this and the slashvertisements?
Not sure why the 30s boot up requirement is there, so it depends on what you define as "booted" . Spinning up 12 hard drives and making them available through Samba within 30s guarantees your costs will be 10x more than they need to be.
This isn't another example of my tax dollars at work is it?
Why would you want a file server to boot in 30 seconds or less? OK, let's skip the fs check, the controller checks, the driver checks; hell, let's skip everything and boot to a recovery bash shell. Why would you not network these collection devices if they are all within a couple of miles and dump to an always-on server?
I really fail to see the advantage of a file server booting in under 30 seconds. Shouldn't you be able to hot swap drives?
This really sounds like a bunch of kids trying to play server admin. My apologies if this is not the case, but given the parameters provided this IS what it sounds like.
There's no reason you need a union filesystem. Just mount the data at an appropriate point in a directory tree. Union file systems are designed to solve a different problem.
What you boot from has nothing to do w/ what you read the data from.
Samba is a really strange choice. Given the data volume I'd expect you to be using a large Linux cluster to process the data for which NFS would be more appropriate. It certainly sounds like microseismic data in which case the processing will benefit from making duplicate copies of the data and mounting read only via NFS so the first available server provides the data. Multiple ethernets are needed to get full benefit from doing that though.
*nix documentation is actually very good. But there is a lot of it, so you tend to have grey hair by the time you've read all of it.
BTW Does the CEO play guitar? I play harmonica.
Check out NAS4Free. It's basically FreeNAS based on the newer FreeBSD 9, which has ZFS v28. I have been running it in a heavily used production system for 2 months with zero issues. I have 3 RAIDZ2 setups that are shared out via NFS, CIFS, and AFP. This setup is snapshotted every hour and also replicated via zfs send to an offsite DR location.
If you go this route, invest in an SSD for the ZIL, and one for L2ARC if you decide to dedup.
You know, I was in the "it doesn't matter" camp until I read your post. Now I've changed my mind.
Yes, any distro will do it. You'll have the same (lack of) trouble configuring the service on any distro. So, choose a distro that is easy to get into bare bones and to upgrade, because those are the two main differentiators here.
I suggest Slackware. Probably somebody else knows about something simpler, but not so simple that it will end up giving you more work.
Any Linux distribution will boot in less than 30 seconds if [..]
Linux does. Too bad the BIOS and RAID array of a server can take minutes to do their checks...
Booting in under 30 seconds is going to be a bit of a trick for anything servery. Even just putzing around in the BIOS can eat up most of that time (potentially some minutes if there is a lot of memory being self-tested, or if the system has a bunch of hairy option ROMs, as the SCSI/SAS/RAID disk controllers commonly found in servers generally do...). If you really want fast, you just need to suck it up and get hot-swappable storage: even SATA supports that (well, some chipsets do; your mileage may vary, talk to your vendor and kidnap the vendor's children to ensure you get a straight answer, no warranty express or implied, etc.), and SAS damn well better, and it supports SATA drives. That way, it doesn't matter how long the server takes to boot: you can just swap the disks in and either leave it running or set the BIOS wakeup schedule to have it start booting ten minutes before you expect to need it.
Slightly classier would be using /dev/disk/by-label or /dev/disk/by-uuid to assign a unique mountpoint for every drive sled that might come in from the field (i.e., allowing you to easily tell which field unit the drive came from).
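A minimal sketch of the by-uuid idea, assuming you keep a hand-maintained table from filesystem UUID to field unit (the UUIDs, unit names, and `/srv/fielddata` root are all hypothetical placeholders). It only plans which device goes where; the actual mounting is left to mount(8) or fstab.

```python
import os
from pathlib import Path

# Hypothetical table: filesystem UUID -> field unit the drive belongs to.
# The UUIDs below are placeholders; record the real ones once per drive.
SENSOR_BY_UUID = {
    "1b2c3d4e-0000-0000-0000-000000000001": "unit01",
    "1b2c3d4e-0000-0000-0000-000000000002": "unit02",
}


def plan_mounts(by_uuid_dir: Path, mount_root: Path,
                table: dict[str, str]) -> dict[str, Path]:
    """Return {device node: mountpoint} for every known UUID currently present.

    /dev/disk/by-uuid holds one symlink per filesystem, named after its UUID
    and pointing at the device node (e.g. ../../sdc1). Unknown UUIDs are
    skipped so a stray disk never lands in the share.
    """
    plan = {}
    if not by_uuid_dir.is_dir():
        return plan
    for link in sorted(by_uuid_dir.iterdir()):
        unit = table.get(link.name)
        if unit is None:
            continue
        device = os.path.realpath(link)  # resolve ../../sdc1 -> /dev/sdc1
        plan[device] = mount_root / unit
    return plan
```

Calling `plan_mounts(Path("/dev/disk/by-uuid"), Path("/srv/fielddata"), SENSOR_BY_UUID)` on the server would list which present sled maps to which unit directory.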
If the files from each site are guaranteed to have unique names, you could present them in a single mount directory with UnionFS; but you probably don't want to find out what happens if every site spits out identically named FOO.log files, and (unless there is a painfully crippled tool somewhere else in the chain) having a directory per mountpoint shouldn't be terribly serious business.
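Before trusting a union mount, a quick sanity check along these lines could confirm the names really are unique. This is a sketch, assuming each drive is mounted as its own directory; it reports every relative path that exists on more than one branch, i.e. every file a union view would hide behind another copy.

```python
from collections import defaultdict
from pathlib import Path


def find_collisions(branches: list[Path]) -> dict[str, list[Path]]:
    """Map each relative file path found on more than one branch to the
    branches that contain it. Empty result means the names are unique."""
    seen = defaultdict(list)
    for branch in branches:
        for f in branch.rglob("*"):
            if f.is_file():
                seen[f.relative_to(branch).as_posix()].append(branch)
    return {rel: owners for rel, owners in seen.items() if len(owners) > 1}
```

Run it over all mounted sleds after each day's ingest; anything it returns is exactly the FOO.log case above.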
But he either likes you or is setting you up. Build one of these instead: http://hardware.slashdot.org/story/11/07/21/143236/build-your-own-135tb-raid6-storage-pod-for-7384 It's already been talked about.
I know you already stated the hardware is already in place. This is about exercising your newfound authority. Go big or go home.
500G in a 24h period sounds like it will be highly compressible data. I would recommend FreeBSD or Ubuntu with ZFS Native Stable installed. ZFS will allow you to create a very nice tree with each folder set to a custom compression level if necessary. (Don't use dedup) You can put one SSD in as a cache drive to accelerate the shared folders speed. I imagine there would be an issue with restoring the data to magnetic while people are trying to read off the SMB share. An SSD cache or SSD ZIL drive for ZFS can help a lot with that.
Some nagging questions though.
How long are you intending to store this data? How many sensors are collecting data? Even with 12 drive bays, assuming cheap 3TB SATA drives (36TB total storage with no redundancy) and, say, 5 sensors, that's 2.5TB of data collected a day; assuming good 3x compression, that's 833GB a day. You will fill up that storage in just 43 days.
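The same back-of-the-envelope arithmetic as a small function, so the assumptions (bay count, drive size, sensor count, compression ratio) can be swapped out. It assumes decimal units (1 TB = 1000 GB) and no redundancy, as above.

```python
def days_until_full(bays: int, tb_per_drive: float, sensors: int,
                    gb_per_sensor_per_day: float, compression_ratio: float) -> float:
    """Days until the array is full, with no redundancy and 1 TB = 1000 GB."""
    capacity_gb = bays * tb_per_drive * 1000
    stored_per_day_gb = sensors * gb_per_sensor_per_day / compression_ratio
    return capacity_gb / stored_per_day_gb


print(round(days_until_full(12, 3, 5, 500, 3), 1))  # 43.2
```

With RAIDZ2 across all 12 bays, roughly two drives' worth goes to parity, so passing 10 instead of 12 gives about 36 days.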
I think this project needs to be re-thought. Either you need a much bigger storage array, or data needs to be discarded very quickly. If the data will be discarded quickly, then you really need to think about more disk arrays so you can use ZFS to partition the data in such a way that each SMB share can be on its own set of drives so as to not head thrash and interfere with someone else who is "discarding" or reading data.
You are in over your head. Buy a Pogoplug and some USB 2 hubs. Connect your drives to the hubs and they appear as a unified file system on your clients. Or, if you need better performance accessing the data, call an expert.
Unless you're talking about millions of individual files on each drive, it should be relatively quick to mount each hard drive and set up symbolic links in one shared directory to the files on each of the mounted drives. Just make sure Samba has "follow symlinks" set to yes and the Windows clients will just see normal files in the shared directory.
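A minimal sketch of that symlink approach, assuming the drives are mounted at absolute paths and file names are globally unique, as the submitter says they are. Directories are recreated for real in the share; only files become links, and an existing link is left alone. Because these links point outside the share, Samba will probably also need `wide links = yes`, which has security implications; check the smb.conf man page for your Samba version.

```python
from pathlib import Path


def build_symlink_farm(mounts: list[Path], share_root: Path) -> int:
    """Mirror every file from each mounted drive into share_root as a symlink.

    Mount paths should be absolute so the links resolve from anywhere.
    Returns the number of links created on this run.
    """
    created = 0
    for mount in mounts:
        for src in mount.rglob("*"):
            if not src.is_file():
                continue  # directories are recreated below, not linked
            link = share_root / src.relative_to(mount)
            link.parent.mkdir(parents=True, exist_ok=True)
            if link.is_symlink() or link.exists():
                continue  # already linked on an earlier run
            link.symlink_to(src)
            created += 1
    return created
```

Re-running it after each new sled is mounted only adds the new links, so it can be called from whatever hook mounts the drives.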
This whole project is a joke. Either:
A) you don't know what you're looking for (reasonable but silly),
or B) you don't know how to collect data on what you're looking for (pointless).
Process your sensor data at the sensor. There is no reason that anyone needs to take more than 10-50 MB of data per sensor device per session.
If it's a signal detection, use a smart filter to capture the event, or an FFT to capture the frequencies.
If it's a measurement, use a buffered slope detection and only capture the change.
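One simple way to "only capture the change" is a deadband filter. This is a swapped-in illustration, not necessarily the buffered slope detection the commenter had in mind, and the threshold is an assumption you would tune per sensor. A sample is kept only when it moves more than the threshold away from the last kept value, so the signal can be rebuilt as a step function from the kept points.

```python
def deadband(samples: list[float], threshold: float) -> list[tuple[int, float]]:
    """Keep (index, value) pairs only when the value moves more than
    `threshold` away from the last kept value. The first sample is always kept."""
    kept: list[tuple[int, float]] = []
    last = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > threshold:
            kept.append((i, value))
            last = value
    return kept


print(deadband([20.0, 20.1, 20.05, 21.0, 21.02, 19.5], threshold=0.5))
# [(0, 20.0), (3, 21.0), (5, 19.5)]
```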
If you do need to move 500GB per day per sensor, just install fiber to the sensors and stream it back to a localized collection server.
This saves countless sneakernet headaches, compression issues, and the like. For the collection server, Buy It! There are great products out there that can take your 500GB of poorly compressed sensor data and make it 500MB of indexed intelligence (totally avoiding the obvious buzzword use here, sorry /.).
AKA: I do not want the insecurity of losing my workplace if my boss happens to learn on Slashdot how clueless I am.
Seriously... could you send us the resumé that you sent to get that job?
Why can't
Why is documentation for *nix always so bad?"
For starters, I'm really tired of this /. *NIX is-too-hard ranting all the time on 'Ask Slashdot' posts. Don't be a n00b douche; if you don't get it, then spend some time and get it. Don't blame the documentation; dig in and figure out something for yourself for once. Sometimes you Nintendo-and-Mt-Dew generation make me want to throw up.
As for your solution, do not go with some installable appliance-type distro like FreeNAS; yes, it's *BSD under the hood, but you're at the mercy of what that 'focused' distro is going to provide for you. Case in point: since you're undecided, go with a full-blown distro so you have some flexibility to grow and augment the mission and purpose of this server you're hosting data on.
Since you're clearly a n00b when it comes to picking out a *NIX solution, go with anything Linux at this point, and set up the NAS services yourself (e.g. Samba/SMB, NFS, etc.). In turn, you'll get better community support, you'll have more flexible OS configuration and room to grow, and you'll probably learn something to boot.
Also, you don't need a union filesystem. Simple udev rules that automount the drives under the top-level structure you're sharing out with your NAS services will do you just fine.
What is the point of 30 second boot on a file server? If this is on the list of 'requirements', then the 'plan' is 1/4 baked. 1/2 baked for buying hardware without a plan, then 1/2 again for not having a clue.
Unioning filesystem? What is the use scenario? How about automounting the drives on hot-plug and sharing the /mnt directory?
Now, 500GB/day in 12 drive sleds... so 6TB a day? Do the workers get a fresh drive each day, or is the data only available for a few hours before it gets sent back out, or are they rotated? I suspect that mounting these drives for sharing really isn't what is necessary; it's more like pulling the contents to 'local' storage. Then why talk about unioning at all? Just put the contents of each drive in a separate folder.
Is the data 100% new each day? Are you really storing 6TB a day from a sensor network? 120TB+ a month?
Are you really transporting 500GB of data by hand to local storage and expecting the disks to last? Reading or writing 500GB isn't a problem, but constant power cycling and then physically moving/shaking the drives around each day in transport is going to put the MTBF of these drives in months, not years.
dumb
Saying "only good mp3 player" makes no sense unless you specify your criteria. Amarok, Banshee, VLC, Rhythmbox, or smplayer are all capable mp3 players by various criteria and easily found by googling for "linux mp3 player". If you use Ubuntu, searching for mp3 player in Software Center finds a plethora of good players. Googling "list of linux audio software" easily finds other things besides just mp3 players: maybe something like Audacity satisfies your requirements better. Searching for "mp3" on xmms2.org finds the answer in the first link: your xmms2 install needs to have the MAD library, and maybe your distro does not install that.
Does not seem like the problem is with bad docs.
"When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
Use systems of symbolic links.
Also, why "30 seconds boot time"? This strikes me as a bizarre and unnecessary requirement. Are you sure you have done careful requirements analysis and engineering?
As to the "bad" documentation: Many things are similar or the same on many *nix systems. This is not Windows where MS feels the need to change everything every few years. On *nix you are reasonably expected to know.
"When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds."
"There's also requirement that the server has to boot in 30 seconds or less off a mechanical hard drive."
So... it takes minutes to hours to get your drives to the server, but then suddenly it's an emergency to get the server booted. That makes no sense to me. Please explain.
Because I suck at it and I'm too lazy to learn how to do it myself.
Ok, lots of folks asking similar questions. In order to keep the submission word count down I left out a lot of info. I *thought* most of it would be obvious, but I guess not.
Notes, in no particular order:
- The server was sourced from a now-defunct project with similar setup. It's a custom box with non-normal design. We don't have authorization to buy more hardware. That's not a big deal because what we have already *should* be perfectly fine.
- People keep harping on the 30 seconds thing.
The system is already configured to spin up all the drives simultaneously (yes, the PSU can handle that) and get through the BIOS, all in a few seconds. I *know* you can configure almost any distro to be fast; the question is how much fuss it takes to get it that way. Honestly, I threw that in there as an aside, not thinking this would blow up into some huge debate. All I'm looking for are pointers along the lines of "yeah, distro FOO is bloated by default, but it's not as bad as it looks because you can just use the BAR utility to turn most of that off". We have a handful of systems running WinXP and Linux already that boot in under 30 seconds; this isn't a big deal.
- The drives in question have a nearly identical directory structure but with globally unique file names. We want to merge the trees because it's easier for people to deal with than dozens of identical trees. There are plenty of packages that can do this; I'm looking for a distro where I can set it up with minimal fuss (i.e., apt-get or equivalent, as opposed to manual code editing and recompiling).
- The share doesn't have to be Samba; it just needs to be easily accessible from Windows/Macs without installing extra software on them.
- No, I'm not an idiot or derpy student. I'm a sysadmin with 20 years experience (I'm aware that doesn't necessarily prove anything). I'm leaving out a lot of detail because most of it is stupid office bureaucracy and politics I can't do anything about. I'm not one of those people who intentionally makes things more complicated than they need to be as some form of job security. I believe in doing things the "right" way so those who come after me have a chance at keeping the system running. I'm trying to stick to standards when possible, as opposed to creating a monster involving homegrown shell scripts.
OP rather clearly stated criteria for "good mp3 player". Here it is, since you missed it the first time: "sorts music like XMMS and since I'm used to XMMS does most everything in a similar fashion as well."
The first thing I thought of was loss of one of the drives during all this moving around. Seems the protection of the data would be of the utmost priority here. Keeping this in mind, I'd go with a RAID 5 or 10 setup. This will eliminate having the data distributed on different "drives", so to speak, and it would appear to the system as one single drive. This would increase the drive count, but losing a drive, either physically (oops, dropped that one in the puddle) or electronically (oops, this drive crashed because we keep swapping it every day), would be a non-issue, or at least not a tragic one. I'm sure you have a swappable tray system now for the number of drives you need; you may need to add a tray or two for this setup. Just make sure you keep the drives in the correct order, or swap out the whole drive unit.
As for the original question, I don't think there's really a "best" distro for this; they'll all pretty much do the above out of the box and almost automagically. What you need to look for is the easiest distro to use in this case. What will the users be able to use with the least support from me? If you're the one who will be swapping out the drives on a daily basis, then use what you're most comfortable with.
The nice thing about CentOS is that if/when you wind up on RHEL (comes with hardware, what your hosting provider is using, etc.) the migration will be pretty simple.
First of all, xmms seems to be really outdated, and if his distro does not include it and he cannot compile it himself for whatever reason (despite copious info on how to install dev packages and compile for any distro), I fail to see how this is a failure of the un*x docs specifically. Current documentation for WinPlay3 is also rather scarce.
It's also hard to believe that there is not a single mp3 player out there that sorts music like xmms, whatever this is. He stated he tried Rhythmbox and was not too happy, but the allegedly so deficient un*x docs readily list many more, while the Ubuntu software center lets you try them out with one click.
Anyway, the logical conclusion seems to be to use xmms2, docs for which appear in the third google hit (for me) when searching for xmms. Which in turn, as I wrote, would take him to the solution in the first hit when searching for mp3 at xmms2.org. Again, how is this evidence for generally bad docs?
You have to be able to identify the disks being mounted. Since these are hot swappable, they will not be automatically identifiable.
Also note, not all disks spin up at the same speed. Disks made for desktops are not reliable either - though they tend to spin up faster. Server disks might take 5 seconds before they are failed. You also seem to have forgotten that even with all disks spun up, each must be read (one at a time) for them to be mounted.
Hot swap disks are not something automatically mounted unless they are known ahead of time - which means they have to have suitable identification.
UnionFS is not what you want. That isn't what it was designed for. Unionfs only has one drive that can be written to - the top one in the list. Operations on the other disks force it to copy it to the top disk for any modifications. Deletes don't happen to any but the top disk.
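To make those semantics concrete, here is a toy in-memory model, not how unionfs or aufs are actually implemented: branches are dicts listed top first, reads take the first match, a modification first copies the file up into the top branch, and a delete only removes the top copy and records a "whiteout" that hides the lower copy, which stays on its disk.

```python
class ToyUnion:
    """Toy model of union-mount semantics. Branches are dicts of
    path -> bytes, top branch first; only the top branch is ever modified."""

    def __init__(self, top: dict, *lower: dict):
        self.top = top
        self.lower = list(lower)
        self.whiteouts: set[str] = set()  # paths hidden from the merged view

    def read(self, path: str) -> bytes:
        if path in self.whiteouts:
            raise FileNotFoundError(path)
        for branch in [self.top, *self.lower]:  # first match wins
            if path in branch:
                return branch[path]
        raise FileNotFoundError(path)

    def exists(self, path: str) -> bool:
        try:
            self.read(path)
            return True
        except FileNotFoundError:
            return False

    def append(self, path: str, data: bytes) -> None:
        # Copy-up: pull the current contents into the top branch,
        # then modify only that top copy.
        current = self.read(path) if self.exists(path) else b""
        self.top[path] = current + data
        self.whiteouts.discard(path)

    def delete(self, path: str) -> None:
        # The lower copy is untouched; the whiteout just hides it.
        self.top.pop(path, None)
        self.whiteouts.add(path)
```

With a field drive as a lower branch and an empty top branch, appending to a file leaves the drive's copy unchanged and writes the modified copy to the top branch, which is exactly the copy-up cost described above.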
Some of what you describe is called an HSM (hierarchical storage management), and requires a multi-level archive where some volumes may be online, others offline, yet others in between. Boots are NOT fast, mostly due to the need to validate the archive first.
Back to the unreliability of things - if even one disk has a problem, your union filesystem will freeze - and not nicely either. The first access to a file that is inaccessible will cause a lock on the directory. That lock will lock all users out of that directory (they go into an infinite wait). Eventually, the locks accumulate to include the parent directory... which then locks all leaf directories under it. This propagates to the top level until the entire system freezes - along with all the clients. This freezing behavior is one of the things that an HSM handles MUCH better. A detected media error causes the access to abort, and that releases the associated locks. If the union filesystem detects the error, then the entire filesystem goes down the tubes, not just one file on one disk.
Another problem is going to be processing the data - I/O rates are not good going through a union filesystem yet. Even though UnionFS is pretty good at it, expect the I/O rate to be 10% to 20% less than maximum. Now client I/O has to go through a network connection, so that may make it bearable. But trying to process multiple 300 GB data sets in one day is not likely to happen.
Another issue you have ignored is the original format of the data. You imply that the filesystem on the server will just "mount the disk" and use the filesystem as created/used by the sensor. This is not likely to happen - trying to do so invites multiple failures; it also means no users of the filesystem while it is getting mounted. You would do better to have a server disk farm that you copy the data to before processing. That way you get to handle the failures without affecting anyone that may be processing data, AND you don't have to stop everyone working just to reboot. You will also find that local copy rates will be more than double what the server's client systems can read anyway.
As others have mentioned, using the GlusterFS file system to accumulate the data allows multiple systems to contribute to the global, uniform filesystem - but it does not allow for plugging in/out disks with predefined formats. It has a very high data throughput though (due to the distributed nature of the filesystem), and would allow many systems to be copying data into the filesystem without interference.
As for experience - I've managed filesystems with up to about 400TB in the past. Errors are NOT fun as they can take several days to recover from.
Also, "my criteria is for x to work like ancient app y" is not so workable. Sounds like Microsoft's convoluted standards document for Office Open XML regarding backward compatibility. "You have to emulate bug x of Word 2, but we can't tell you exactly how that worked". Someone might have helped him if he had given specific requirements.
One of the things I have learned in my career is that I do not know everything. I do the things I can do well and understand the business and when I need something more, I will call in those that have the skills to help. If you get the right consultant involved with the project, they will bring the knowledge necessary to do the job right.
One of the biggest mistakes you can make is to think that you have to know everything and that it's some kind of fault to say, “I do not know right now but let me look into it and come back with an answer”. Do not limit yourself to only the knowledge and skills of the in-house staff; tap into other sources to bring in new knowledge and skills that can help you solve problems.
My biggest resource was a list of people I could call in to help me with an issue or to help in locating those who could come up with an answer. It's not a fault to say I need to bring in some help on this project.
then get OpenBSD.
Well OP, you opened a big can of worms with this one. You may as well dump everything on the table. I see you started to in another post already. Why not add on answers to the other questions like, what kind of "sensors" are producing 500GB per day? What happens on undeveloped land that could produce so much data?
My first guess (considering undeveloped land) is weather plus solar radiation and soil temperature/moisture and perhaps seismic data. But that couldn't take more than 5MB per day. So what produces 500GB per day?
If you want different removeable disks to be mounted in different places, it's even easier. Just list each disk (identified by UUID) in /etc/fstab, with the proper mountpoint and include auto in the options. That way, when you plug it in, the system knows exactly where it goes.
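A small generator for those fstab entries, as a sketch. The UUIDs, the `/srv/fielddata` root, and `ext4` are placeholders; use whatever filesystem the sensors actually write. Besides `auto`, it adds `nofail`, a standard mount option, so a sled that isn't plugged in doesn't fail or hold up the boot, which matters given the 30-second target. fsck pass 0 (no boot-time check) is also an assumption.

```python
def fstab_line(uuid: str, mountpoint: str, fstype: str = "ext4") -> str:
    """Build one tab-separated /etc/fstab entry:
    device spec, mountpoint, filesystem type, options, dump flag, fsck pass."""
    # auto:   mounted by `mount -a` (and at boot if the disk is present)
    # nofail: a missing sled does not fail or hold up the boot
    options = "auto,nofail"
    # dump 0, fsck pass 0: no boot-time fsck (an assumption; check the
    # disks explicitly after ingest if you prefer)
    return f"UUID={uuid}\t{mountpoint}\t{fstype}\t{options}\t0\t0"


def fstab_block(table: dict[str, str], mount_root: str = "/srv/fielddata") -> str:
    """One line per known drive, sorted by UUID so the output is stable."""
    return "\n".join(
        fstab_line(uuid, f"{mount_root}/{unit}") for uuid, unit in sorted(table.items())
    )
```

Paste the output of `fstab_block({...})` into /etc/fstab once the mountpoint directories exist.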
What you're describing sounds like a fairly typical Sensor Net (or Sensor Web) to me, maybe with a little more data logged than is normal per platform. (I believe they call it a 'mote' in that community).
Some of the newer sensor nets use a forwarding mesh wireless system, so that you relay the data to a highly reduced number of collection points -- which might keep you from having to deal with the collection of the hard drives each night (maybe swap out a multi-TB RAID at each collection point each night instead).
I'm not 100% sure of what the correct forum is for discussion of sensor/platform design. I know they have presentations in the ESSI (Earth and Space Science Informatics) focus group of the AGU (American Geophysical Union). Many of the members of ESIPfed (Federation of Earth Science Information Partners) probably have experience in these issues, but it's more about discussing managing the data after it comes out of the field.
On the off chance that someone's already written software to do 90% of what you're looking for, I'd try contacting the folks from the Software Reuse Working Group of the Earth Science Data System community.
You might also try looking through past projects funded through NASA AISR (Advanced Information Systems Research)... they funded better sensor design & data distribution systems (unfortunately, they haven't been funded for a few years... and I'm having problems accessing their website right now). Or I might be confusing it with the similar AIST (Advanced Information Systems Technology), which tends more towards hardware vs. software. So, my point is: don't roll your own. Talk to other people who have done similar stuff, and build on their work; otherwise you're liable to make all of the same mistakes and waste a whole lot of time. And in general (at least ESSI/ESIP-wide), we're a pretty sharing community... we don't want anyone out there wasting their time doing the stupid little piddly stuff when they could actually be collecting data or doing science.
(and if you haven't guessed already ... I'm an AGU/ESSI member, and I think I'm an honorary ESIP member (as I'm in the space sciences, not earth science) ... at least they put up with me on their mailing lists)
Actually, AUFS requires kernel patches; it has never been mainlined because the kernel maintainers like their own union mounts better... even though they are far less useful (not write-through like AUFS, which is really nice for something like a file server) and forever undelivered. That said, I think Ubuntu and SUSE still come with AUFS patched in; for Debian you have to compile your own patched kernel or use something like the Liquorix kernel.
Since you already bought the hardware, odds are you're going to run into driver issues. Since you're not already a *nix guy, my suggestion is to just run Windows on your server. Next, buy big fat USB enclosures, the kind that can hold DVD drives, and put the drive sleds in there. Now you don't have to reboot when adding the drives.
Assuming "500 gigs per 24 hours" is "500GiB per 24 hours" then we are talking roughly 6MiB per second. Are we talking video of some kind here? If not, I'll bet the data can compress down really well...
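The rate conversion spelled out, assuming binary units (1 GiB = 1024 MiB) and a full 24-hour day of recording:

```python
def mib_per_second(gib_per_day: float) -> float:
    """Average rate needed to produce gib_per_day of data over 24 hours."""
    return gib_per_day * 1024 / (24 * 60 * 60)


print(f"{mib_per_second(500):.2f} MiB/s")  # 5.93 MiB/s
```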
Maybe you should think about streaming data directly to tape, and then swap tapes out. Assuming for a second that your 30 second boot time has something to do with cloning a drive or some such nonsense, using tape will allow you to bring the data to the server instead of the other way around. At this point you will get rid of the silly 30 second boot time requirement while also drastically increasing the life of your fileserver drives.
As for the rest of your requirements, learn Linux or hire someone who knows it. Any distro can be modified to boot faster or slower since it's all pretty much the same anyway. Red Hat compiles bash/perl/... from the same upstream sources that Debian does. Buying hardware that will allow you to boot faster is the real trick here. You might try to hire someone who knows what coreboot is and what it does. That would be a good indication that they are aware of the factors responsible for getting a system up quickly. I would shamelessly offer my skills for hire, but given the description of the problem, I'll stay out of the hiring pool.
Pretty much any unix like OS will do fine. Personally I use NetBSD and if linux is a requirement, Debian.
simple http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)
Talk to your local University.
Upper-division- and possibly graduate-level students in the following disciplines may be very interested in how this project turns out:
* Computer science / information systems / related
* The particular science related to the data being collected (e.g. meteorology if this is weather data)
* If this is a government project, disciplines related to urban planning, politics, and the like
* If it's a business project, students taking courses related to doing tasks using available resources / tight budgets or in organizations that impose hard, non-technical, not-necessarily-wise requirements.
No matter how you ACTUALLY proceed, letting professors and key students from these disciplines watch you and document what you do then turn their results into "what might you do differently if you were in this same situation, and why?" type coursework will be invaluable to future generations.
And, who knows, you may get some valuable ideas you can actually implement.
--
By the way, I assume you have already considered, and for the time being rejected, the default option: not going forward with this project, on the grounds that doing nothing is better than attempting it under the constraints you are being forced to work under.
Please keep this "default option of canceling the project" in the back of your head, and don't be afraid to recommend it should it really become the best option.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The best explanation I've heard so far for why technical documentation in general (and *nix documentation in particular) is often so poor comes from a sci-fi TV series called Eureka. In one episode, the characters search in vain for a manual to help shut down an antiquated launch system. When they figure out there never was a manual, one character asks another why the builders didn't bother to write one. The reply he receives is "Well, what do you want: progress or poetry?"
s/t
fak3r.com
They are, actually (at least in Vista and later, and for directories only as far back as Windows 2000), but that's irrelevant.
If you set Samba to follow symlinks, it will present each link to client applications as though it were the actual file, so even old DOS-based Windows systems can handle it.
Why is documentation for *nix always so bad?
Try reading Slashdot comment threads regarding the liberal arts, writing/English as a major, or even just people whose primary talent isn't in technology -- would you want to devote a significant chunk of time & energy volunteering in a community that felt that way about your field and everyone within it, even if the community is focused on an interest/hobby you have?
Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
There's also requirement that the server has to boot in 30 seconds or less...
Uh, disable PulseAudio??
Why say *nix? We're gonna use Linux anyway, right?
Which kind of worked fine (though it couldn't sort my MP3s in a sane order, which is why I wanted XMMS in the first place) until I got tired of music and wanted to stop. So I clicked the X in the upper-right corner... the window disappeared, but the music continued.
If you close the window, Rhythmbox stays in the tray. You have to explicitly quit it from the menu.
Great. Had to ps and kill it, and now my beer supply is out of sync with my enthusiasm for music.
Heh. I know the feeling!
The boot process is greatly sped up with systemd. Even though there is a chain of dependencies in the boot process, systemd still manages to start most of it in parallel. It is compatible with SysV init scripts, so no sweat here ;-)
Fedora is a nice distro, but not stable. However, if you boot directly to a prompt, even Ubuntu Server boots in a few seconds off a mechanical drive.
Now on to the data sharing: take a look at FUSE (Filesystem in Userspace). FUSE-based filesystems such as mhddfs or unionfs-fuse can unify multiple filesystems into one virtual filesystem.
You can PM me with further questions, or to ask for my e-mail address if you need more direct contact :-)
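FUSE itself is only the framework; the merging is done by the filesystem built on top of it. To illustrate the idea (this is not code from mhddfs or unionfs-fuse), here is a Python sketch of the rule a union view typically applies: directory listings from every branch are merged, and when the same path exists on more than one drive, the first branch in the list wins. That priority policy and the function names are my own assumptions for the example.

```python
import os
from typing import List, Optional


def union_lookup(branches: List[str], relpath: str) -> Optional[str]:
    """Return the real path backing relpath, checking branches in priority order."""
    for branch in branches:
        candidate = os.path.join(branch, relpath)
        if os.path.lexists(candidate):
            return candidate  # first branch that has the path wins
    return None


def union_listdir(branches: List[str], relpath: str = "") -> List[str]:
    """Merge one directory's listing across all branches; duplicates appear once."""
    names = set()
    for branch in branches:
        directory = os.path.join(branch, relpath)
        if os.path.isdir(directory):
            names.update(os.listdir(directory))
    return sorted(names)
```

A real union filesystem does this inside the kernel or a FUSE daemon on every lookup, which is why clients see one tree without any copying.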
Here be signatures
The Unix docs used to be excellent; then the '90s came, and they fired the technical writers and told the engineers to do it. Engineers, as expected, have piles of domain knowledge, and that shows in their documentation. Couple that with a general disdain for the mess that most languages are, and you get something with a steep learning curve that tends to be out of date.
No sir, I don't like it.
Nobody pays for it, and nobody will take the time to explain the system to people who might write it.
To be blunt:
This should be obvious to anyone who gives it any thought at all. You might try thinking about and solving your own problems instead of just posting them to a website somewhere.
What format are these drives in? Are they flash drives formatted as FAT32? Great, plug them all into a powered USB hub and share the files... no? Well... bummer.
Are they standalone ZFS pools? Great, drop them into your ZFS SAN, import the pools, and share away... no? Well... bummer.
What file system are they formatted with? Could be anything... if it's Plan 9's 9P, then maybe we can say sure, what the heck... anything else and you're going to have to be a bit more specific.
For point of reference, I have two SAN systems at work. One is very fast: 4TB, running OpenIndiana with ZFS, serving our database and email servers. It takes several minutes to bring all disks online and be fully functioning. These are flash disks, and it has 128GB of RAM. It's screaming fast but has lots and lots of small disks. It cost $24K and is made for crazy speeds (it saturates two 10GbE links and handles 120k IOs of simultaneous reads and writes, no problem).
The big SAN is 40TB, boots to a useful state in about one minute, and starts bringing disks online within 10 minutes. It cost $2.5 million and is about the size of a minivan. It's made for gigantic simultaneous IO and five nines of availability, has dozens of easily removable drives, and is extremely tolerant of hot swapping.
Be more specific OP.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
- loop recursively through all the files in the hard drives
- symlink them to another folder with the same structure
- share that folder
Lather, rinse, and repeat every time you add or remove a drive. Not the most efficient or fancy solution in the world, but if you know bash you can write that in 10 lines of code.
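For anyone who'd rather not write the bash version, here is a rough Python equivalent of the three steps above. It's a sketch under my own assumptions: the drives are already mounted somewhere (the paths in the usage note are made up), directories are recreated as real directories, only files become symlinks, and when two drives hold the same relative path the first drive listed wins. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable


def build_symlink_farm(drive_roots: Iterable[str], farm_root: str) -> int:
    """Mirror every file under each drive root into farm_root as a symlink.

    Returns the number of symlinks created. If two drives contain the same
    relative path, the first drive listed wins and later ones are skipped.
    """
    farm = Path(farm_root)
    created = 0
    for root in drive_roots:
        root_path = Path(root).resolve()
        for dirpath, _dirnames, filenames in os.walk(root_path):
            # Recreate the drive's directory structure under the farm.
            target_dir = farm / Path(dirpath).relative_to(root_path)
            target_dir.mkdir(parents=True, exist_ok=True)
            for name in filenames:
                link = target_dir / name
                if link.is_symlink() or link.exists():
                    continue  # keep the earlier drive's copy
                link.symlink_to(Path(dirpath) / name)
                created += 1
    return created


def rebuild_symlink_farm(drive_roots: Iterable[str], farm_root: str) -> int:
    """'Lather, rinse, repeat': wipe the farm and rebuild it from scratch.

    WARNING: deletes farm_root, so point it at a dedicated directory that
    only ever contains the generated links.
    """
    shutil.rmtree(farm_root, ignore_errors=True)
    return build_symlink_farm(drive_roots, farm_root)
```

Something like `rebuild_symlink_farm(["/mnt/sled01", "/mnt/sled02"], "/srv/fieldshare")` would then be rerun after each drive swap. Since the links point outside the shared folder, Samba has to be allowed to follow them (see the `follow symlinks` and `wide links` options in smb.conf).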
dCache is probably what you want.
A lot of my biggest concerns have been addressed by others. A few things that I haven't seen covered:
The "30 second boot time" limit makes me assume that there is something time-sensitive about this data collection. (Otherwise, why would you be wasting time on it?) So you need a fast boot, but then you're mucking around with Samba and union mounts, which are both relatively slow. This doesn't make any sense, and it's why people are asking questions or making up odd scenarios in their answers.
The odd scenario that I'm assuming is that you have more drives than sleds, so you need to go through a few load-boot-read-shutdown-unload cycles to get all of your data, OR the machine's being "borrowed" to read the data, so you need to bring it up with an alternate OS quickly so that you can work through the night before returning it to normal use in the morning.
If that's the case, it really sounds like (as someone else suggested) you need to separate the collector from the persistent storage. Set up something that can read the data from all of your "dynamic" drives as fast as possible. Depending on the data, something like rsync or even netcat might be the fastest way to get it off the machine.
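To make the "separate the collector from the storage" idea concrete, here is a Python sketch that drains several mounted field drives into persistent storage in parallel. It is a stand-in for illustration, not a replacement for rsync: there's no checksumming or resume. The function names are hypothetical, and I'm assuming each drive is mounted under a distinct directory name, since that name becomes its folder on the storage side.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def drain_drive(src: str, dest_root: str) -> str:
    """Copy one field drive's contents into its own folder under dest_root."""
    dest = Path(dest_root) / Path(src).name  # assumes unique mount-point names
    shutil.copytree(src, dest, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    return str(dest)


def drain_all(drive_roots: List[str], dest_root: str, workers: int = 4) -> Dict[str, str]:
    """Copy several drives concurrently; returns {source: destination}."""
    Path(dest_root).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda src: drain_drive(src, dest_root), drive_roots)
        return dict(zip(drive_roots, results))  # re-raises any copy error
```

Copying in parallel only helps if the drives sit on separate buses or controllers; if they share one, a worker count of 1 or 2 may well be faster.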
--