Slashdot Mirror


Ask Slashdot: Best *nix Distro For a Dynamic File Server?

An anonymous reader (citing "silly workplace security policies") writes "I'm in charge of developing for my workplace a particular sort of 'dynamic' file server for handling scientific data. We have all the hardware in place, but can't figure out what *nix distro would work best. Can the great minds at Slashdot pool their resources and divine an answer? Some background: We have sensor units scattered across a couple square miles of undeveloped land, which each collect ~500 gigs of data per 24h. When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds. We need to present the contents of these drives as one unified tree (shared out via Samba), and the best way to go about that appears to be a unioning file system. There's also a requirement that the server has to boot in 30 seconds or less off a mechanical hard drive. We've been looking around, but are having trouble finding info for this seemingly simple situation. Can we get FreeNAS to do this? Do we try Greyhole? Is there a distro that can run unionfs/aufs/mhddfs out-of-the-box without messing with manual recompiling? Why is documentation for *nix always so bad?"

234 comments

  1. Do you need a unified filesystem at all? by TheSunborn · · Score: 2

    Why do you need a unified filesystem? Can't you just share /myShareOnTheServer and then mount each disk to a subfolder in /myShareOnTheServer (such as /myShareOnTheServer/disk1)?

    1. Re:Do you need a unified filesystem at all? by TheSunborn · · Score: 1

      Seems someone ate the rest of my text. But if you do it this way, then the Windows computers will see a single system with a folder for each hard disk. The only reason this might cause problems is if you really need files from different hard disks to appear as if they are in the same folder.
       

    2. Re:Do you need a unified filesystem at all? by Anrego · · Score: 4, Insightful

      I have to assume they are using some clunky Windows analysis program that lacks the ability to accept multiple directories, or something similar.

      Either way, the aufs (or whatever they use) bit seems to be the least of their worries. They bought and installed a bunch of gear and are just now looking into what to do with it, and they've decided they want it to boot in 30 seconds (protip: high-end gear can take this long just doing its self-checks, which is a good thing! Fast booting and file servers don't go well together).

      Probably a summer student or the office "tech guy" running things. They'd be better off bringing in someone qualified.

    3. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 4, Informative

      OP here:

      I left out a lot of information from the summary in order to keep the word count down. Each disk has an almost identical directory structure, and so we want to merge all the drives in such a way that when someone looks at "foo/bar/baz/" they see all the 'baz' files from all the disks in the same place. While the folders will have identical names the files will be globally unique, so there's no concern about namespace collisions at the bottom levels.
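      For illustration only: a union filesystem such as aufs or mhddfs performs this merge inside the kernel or through FUSE, but the semantics described here can be sketched in a few lines of Python. The sketch assumes each drive is already mounted at its own directory; the function name merged_listdir and the demo file names are hypothetical. It lists one relative path across all drives and records which drive holds each entry, so a name collision (which should not occur if file names are globally unique) would show up immediately.

```python
import os
import tempfile
from typing import Dict, List


def merged_listdir(branches: List[str], relpath: str) -> Dict[str, List[str]]:
    """Union-style listing: map each entry under relpath to the drive roots holding it."""
    merged: Dict[str, List[str]] = {}
    for root in branches:
        target = os.path.join(root, relpath)
        if not os.path.isdir(target):
            continue  # this drive lacks the directory; nothing to contribute
        for name in sorted(os.listdir(target)):
            merged.setdefault(name, []).append(root)
    return merged


# Demo: two throwaway "drives" with the same foo/bar/baz layout (hypothetical names).
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    for root, fname in ((d1, "sensor01.dat"), (d2, "sensor02.dat")):
        os.makedirs(os.path.join(root, "foo", "bar", "baz"))
        open(os.path.join(root, "foo", "bar", "baz", fname), "w").close()
    view = merged_listdir([d1, d2], os.path.join("foo", "bar", "baz"))

names = sorted(view)  # what a user would see in the merged foo/bar/baz
owners = {name: len(roots) for name, roots in view.items()}
collisions = [name for name, roots in view.items() if len(roots) > 1]
```

      A real union mount also has to decide which branch receives new writes; this read-only sketch deliberately ignores that.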

    4. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 1

      OP here:

      No, it's mainly for user convenience. People will be looking at the share manually and it's easy to lose track of what you're doing when you have a dozen folder views open with the same names.

      As for the gear, it was sourced from a defunct project with similar goals. The board was specifically bought for booting fast and is configured to get through the BIOS in under 5 seconds (you can disable a lot when you don't need RAID and there's only one persistent drive). We have many WinXP and Linux systems that go from cold to desktop in under 30 seconds, so this isn't really a big concern; I just threw it in there to weed out suggestions involving massive Ubuntu installs that take 2+ minutes to load a bunch of default shit we don't need.

    5. Re:Do you need a unified filesystem at all? by Knuckles · · Score: 2

      Massive Ubuntu installs taking 2 minutes to boot? Whatever its faults, Ubuntu was the one distro most focused on boot time for a long while, and even a standard desktop install goes from BIOS hand-off to login screen in 10 - 12 secs with a standard HD.

      --
      "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
    6. Re:Do you need a unified filesystem at all? by spire3661 · · Score: 3, Interesting

      After looking through your proposal, I'd say you need 2 pieces. You need a WORKSTATION to accept the drives, cleanse them (you are going to verify the data as non-malicious, right?), catalog the data, and be able to shut down and boot up on command. Then you need a SERVER that hosts the data to be served. Thinking you are going to serve directly from the hot-swap bays is a bad idea.

      --
      Good-bye
    7. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 4, Informative

      FreeNAS is based on FreeBSD, and boot speed (no matter what the OS) is based entirely on the hard drive speed + CPU speed + 'automagic' configuration.

      FreeBSD boots pretty fast, but you need to turn off things like the bootloader menu delay, and set fixed IP addresses. Same on Linux, but Linux tends to be sloppy about starting up services.

      In either case you can usually just turn anything you don't need off, and just turn on what you do need.

      FreeBSD's ZFS is better than anything you can setup on Linux, but unless the box has a lot of RAM you're not going to get the expected performance.

      Most of the NAS devices you see for sale run FreeNAS if they're based on x86-64 CPUs, or Linux if they're not (PPC/MIPS/ARM), but they're not particularly great pieces of hardware. You pretty much end up with something silly like:
      OS -> UFS/EXT2/EXT3 -> Samba share
      for Windows clients, but you can also do this on FreeBSD/FreeNAS (ZFS is terrible under Linux-FUSE):
      FREEBSD->ZFS (using all drives, even remote drives) -> iSCSI
      iSCSI is something that you must have GigE/10Gb fiber for, and decent processing power. Most of the systems you see (including Dell's) that do iSCSI are woefully underpowered for a small server, or extremely overkill (enterprise).

      Windows however supports iSCSI out of the box. So you can do something theoretically stupid like this:
      FreeBSD -> ZFS -> iSCSI -> Windows box accesses iSCSI and shares it with other Windows machines.

      So it depends what you really want to do. From your description, it sounds like what you really want to do is hotplug a bunch of drives into a system, have that system "union" them through filesystem mounts (nobody says you have to mount everything to root), and then share them under Samba.

      But another possibility, not clearly indicated, is that the drives have overlapping file systems that you want to see as one (e.g. same directory structure, different file names). This is more complicated to deal with, and I'd probably go with not trying to share off the hot-swapped drives, and instead rsync all the drives to another filesystem and share that.
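      rsync would do the actual copying in practice; the Python sketch below only illustrates the merge policy suggested above, under the assumption that a file with the same relative path on two drives should either be byte-identical (and skipped) or be flagged as a conflict rather than silently overwritten. merge_copy and the demo paths are hypothetical.

```python
import filecmp
import os
import shutil
import tempfile
from typing import List


def merge_copy(drive_roots: List[str], dest: str) -> List[str]:
    """Copy every file from each drive into dest, preserving relative paths.

    Identical files already present are skipped; differing files with the same
    relative path are reported as conflicts instead of being overwritten.
    """
    conflicts: List[str] = []
    for root in drive_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            out_dir = os.path.join(dest, rel_dir)
            os.makedirs(out_dir, exist_ok=True)
            for fname in filenames:
                src = os.path.join(dirpath, fname)
                dst = os.path.join(out_dir, fname)
                if os.path.exists(dst):
                    # Compare contents byte by byte, not just size/mtime.
                    if not filecmp.cmp(src, dst, shallow=False):
                        conflicts.append(os.path.normpath(os.path.join(rel_dir, fname)))
                    continue
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return conflicts


# Demo: drive b holds a different x.dat at the same relative path as drive a.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b, \
        tempfile.TemporaryDirectory() as out:
    for root, fname, body in ((a, "x.dat", "1"), (b, "y.dat", "2"), (b, "x.dat", "3")):
        os.makedirs(os.path.join(root, "foo", "bar"), exist_ok=True)
        with open(os.path.join(root, "foo", "bar", fname), "w") as fh:
            fh.write(body)
    conflicts = merge_copy([a, b], out)
    merged = sorted(os.listdir(os.path.join(out, "foo", "bar")))
    with open(os.path.join(out, "foo", "bar", "x.dat")) as fh:
        kept = fh.read()  # the first drive's copy wins; the second is only reported
```

      Flagging rather than overwriting is a deliberate choice: if file names are supposed to be globally unique, any conflict points at a problem with a drive or with the naming scheme.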

    8. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 1

      Only morons run ZFS under FUSE on Linux now (unless they specifically want a FUSE filesystem), since it has a native implementation available to patch in.
      That implementation is good enough for supercomputers.

    9. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 0

      Thanks for clarifying your lack of knowledge. It's highly doubtful that any solution presented here will be something you can manage yourself. IOW, you should farm this project out.

    10. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 0

      You must have been using some other Ubuntu than the ones I've tried. They've booted decently, but even on a high end machine I've never gotten close to 15 seconds from cold.

    11. Re:Do you need a unified filesystem at all? by Knuckles · · Score: 1

      You must have been using some other Ubuntu than the ones I've tried. They've booted decently, but even on a high end machine I've never gotten close to 15 seconds from cold.

      All of my machines did, 2 not-so-hot ThinkPads and a MacBook Pro from 2009. 12.04 seems to have become a bit slower. Lots of people have posted bootcharts for you to see - note that I'm talking grub to login prompt.

    12. Re:Do you need a unified filesystem at all? by F.Ultra · · Score: 1

      And Ubuntu Server (since we are most certainly not talking about a desktop version here) boots quite fast regardless of version. Well below 15 seconds from the moment the BIOS boots GRUB. Our Ubuntu servers boot in around 4-5 seconds.

    13. Re:Do you need a unified filesystem at all? by F.Ultra · · Score: 1

      You then probably should ditch that idea right away and instead use a distributed file system on all your servers and simply export that via samba.

    14. Re:Do you need a unified filesystem at all? by Knuckles · · Score: 1

      Sure, that's why I said "even a standard desktop install". I thought it was implied that an install with unneeded services disabled would be faster :)

    15. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 1

      FreeBSD also has a unionfs mount option, and ZFS can certainly read and write at 1 Gbit wire speed with 4 GB of RAM and a few off-the-shelf disks.
      You will want to understand AIO, sync writes and their effect on ZFS etc, but it's probably the most reliable solution as well.

      IIRC, CERN was using ZFS and were able to identify a flaky network card based on the ZFS block checksums.

      Linux just isn't as effective for storage or routing. It makes a great workstation, but for routing OpenBSD is king, and for storage FreeBSD is king.

    16. Re:Do you need a unified filesystem at all? by sg_oneill · · Score: 1

      Linux in general is a pretty fast booter, but it's the dependencies that are the problem.

      An Ubuntu server on its own is snappy as heck to boot, but once you load it up with a bunch of services, each with its own dependencies on other services, no distribution is going to fix that.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    17. Re:Do you need a unified filesystem at all? by GPLHost-Thomas · · Score: 1

      FreeNAS is based on FreeBSD, and boot speed (no matter what the OS) is based entirely on the hard drive speed + CPU speed + 'automagic' configuration.

      This used to be true, but it's not anymore. Look at modern init systems like Upstart, OpenRC or systemd. You'll see that these give very different results from what we saw with, let's say, sysv-rc. Note that there's a (huge) ongoing discussion inside Debian to choose what we will be using next (as Debian still uses the old sysv-rc thing).

    18. Re:Do you need a unified filesystem at all? by GPLHost-Thomas · · Score: 2

      Explain then why the majority of everyone is using Windows... (in other words: this is a silly question: who told you that the best tool for the job is always the one that is used?)

    19. Re:Do you need a unified filesystem at all? by Anonymous Coward · · Score: 0

      Debian GNU/Linux running a file system configured for logical volume management with the entire system mirrored across an identical set of disks housed within the same server. I use this type of configuration on my production server which hosts a number of concurrently running virtual machine instances each providing a specific service. A BASH script to copy the data feed hard disk content to the unified subdirectory structure on the server would take care of the merging of the data. And then SAMBA or NFS could be utilized to allow access to this data share from the numerous workstations.

    20. Re:Do you need a unified filesystem at all? by Knuckles · · Score: 1

      Well, no OS can do magic and provide stuff at no cost. Stuff like hardware detection needs to be done at some point and there is no way around it - Windows boots pretty quickly, but at the price of taking ages to shut down. Ubuntu's switch to Upstart helped boot times by parallelizing dependencies.

    21. Re:Do you need a unified filesystem at all? by Hognoxious · · Score: 1

      Perhaps I'm missing something?

      For every file in directory_where_they_physically_are, create a symlink (ln -s) in directory_where_they_should_appear_to_be with the physical file as its target.

      You might need to do some fiddling if there are duplicate names, but it's about a ten-line bash script (or one line in Perl).

      Samba, at least on my machine, traverses links transparently. I don't recall doing any magic to make that happen.
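      A minimal Python version of that symlink approach, assuming a Linux server where each drive is mounted under its own directory and the link farm lives inside the Samba share; build_symlink_farm and the demo file names are hypothetical. Duplicate names are skipped and reported rather than overwritten.

```python
import os
import tempfile
from typing import List


def build_symlink_farm(drive_roots: List[str], farm: str) -> List[str]:
    """Mirror each drive's directory tree under farm, with one symlink per file.

    Returns the relative paths that were skipped because a link already existed.
    """
    skipped: List[str] = []
    for root in drive_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            os.makedirs(os.path.join(farm, rel_dir), exist_ok=True)
            for fname in filenames:
                link = os.path.join(farm, rel_dir, fname)
                if os.path.lexists(link):  # duplicate name from an earlier drive
                    skipped.append(os.path.normpath(os.path.join(rel_dir, fname)))
                    continue
                os.symlink(os.path.join(dirpath, fname), link)  # link -> physical file
    return skipped


# Demo: both "drives" contain foo/baz/a.dat, so the second one is skipped.
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2, \
        tempfile.TemporaryDirectory() as farm:
    for root, fname in ((d1, "a.dat"), (d2, "b.dat"), (d2, "a.dat")):
        os.makedirs(os.path.join(root, "foo", "baz"), exist_ok=True)
        open(os.path.join(root, "foo", "baz", fname), "w").close()
    skipped = build_symlink_farm([d1, d2], farm)
    entries = sorted(os.listdir(os.path.join(farm, "foo", "baz")))
    all_links = all(os.path.islink(os.path.join(farm, "foo", "baz", e)) for e in entries)
    target_ok = (os.readlink(os.path.join(farm, "foo", "baz", "a.dat"))
                 == os.path.join(d1, "foo", "baz", "a.dat"))
```

      If the link targets lie outside the shared path, Samba may also need its wide links option, which is off by default in current releases and has security implications; check the smb.conf documentation before relying on it.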

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    22. Re:Do you need a unified filesystem at all? by hardwarefreak · · Score: 1

      You must redesign your workflow. At this point you're attempting to re-engineer a flawed workflow system. Cut your losses and start over, doing this in a way known universally to work.

      The first place to start is to take a large blunt object and hit the idiot over the head who decided he needed 500GB/day per "sensor" of environmental data for "undeveloped land". Oil/gas company seismic surveys of potential oil fields don't even capture this much data, and they survey hundreds of square miles at a pop, with multiple billions of dollars potentially on the line.

      Second, as others mentioned, do not try to mount and directly share the field node disks. Create a batch copy system and pull everything off the sensor node drives onto a RAID array on the server. This setup is still light years away from an optimal field data collection system. What you *should* do is:

      Build a centralized field "office", i.e. a cheap plywood building that is relatively weatherproof. Acquire a ruggedized rack server certified for outdoor field use and a half rack cabinet on wheels. Dozens of companies sell such gear. Install a wireless router that can accept an external high gain antenna. Build a rigid square box antenna mount with 2x4s on the roof of the structure, about 6ft tall. Assuming the roof of the structure is 8ft high this gives an antenna height of 14 feet, which should be plenty if the ground you're surveying is relatively flat and you locate the building relatively close to the center of the sensor field. Connect the remote nodes securely to the AP. Create a share on the server and write all data in real time from the nodes to the server. You'll power the rack with a small gas generator sitting outside the shed, with a fuel tank large enough to run for 48 hours. This allows you to keep collecting data in the event weather etc causes you to miss a cycle.

      If the data needs to be analyzed on a 24 hour cycle, you will build two identical server cabinets. Instead of collecting drives from all the nodes, you simply drive your van out, power down, disconnect a few cables, roll the cabinet out, roll the sister cabinet in, connect cables, power on, check for proper function, refill the gas tank in the generator, roll the retrieved cabinet onto the van, and go. Return to base, wheel the cabinet into the office, jack into the network, connect to the share, analyze.

      THIS is how field data collection and analysis is done properly for most scenarios.

  2. Wow by Anonymous Coward · · Score: 5, Insightful

    I know I’m not going to be the first person to ask this, but if I understand it the plan here was:

    1 - buy lots of hardware and install
    2 - think about what kind of software it will run and how it will be used

    I think you got your methodology swapped around man!

    Why is documentation for *nix always so bad?

    You are looking for information that your average user won’t care about. Things like boot time don’t get documented because your average user isn’t going to have some arbitrary requirement to have their _file server_ boot in 30 seconds. That’s a very weird use case. Normally you reboot a file server infrequently (unless you want to be swapping disks out constantly..). I’m assuming this requirement is because you plan on doing a full shutdown to insert your drives... in which case you really should be looking into hot-swap.

    Also mandatory: you sound horribly underqualified for the job you are doing. Fess up before you waste even more (I assume grant) money and bring in someone that knows what the hell they are doing.

    1. Re:Wow by LodCrappo · · Score: 4, Insightful

      I know I’m not going to be the first person to ask this, but if I understand it the plan here was:

      1 - buy lots of hardware and install
      2 - think about what kind of software it will run and how it will be used

      I think you got your methodology swapped around man!

      Why is documentation for *nix always so bad?

      You are looking for information that your average user won’t care about. Things like boot time don’t get documented because your average user isn’t going to have some arbitrary requirement to have their _file server_ boot in 30 seconds. That’s a very weird use case. Normally you reboot a file server infrequently (unless you want to be swapping disks out constantly..). I’m assuming this requirement is because you plan on doing a full shutdown to insert your drives... in which case you really should be looking into hotswap

      Also mandatory: you sound horribly underqualified for the job you are doing. Fess up before you waste even more (I assume grant) money and bring in someone that knows what the hell they are doing.

      Wow.. I completely agree with an AC.

      The OP here is in way over his head and the entire project seems to have been planned by idiots.

      This will end badly.

      --
      -Lod
    2. Re:Wow by Anonymous Coward · · Score: 0, Insightful

      Agreed.

      Also: the submitter asks about a "distro". A distro is a pre-packaged solution for a broad group of users. He has to build and test his own solution.

      If you know what you are doing, it does not matter which distro you are using.

      To the boss of the submitter: fire him and hire somebody who has a clue.

    3. Re:Wow by Anonymous Coward · · Score: 2, Interesting

      He still hasn't told us what filesystem is on these drives they're pulling out of the field. That's the most important detail...........

    4. Re:Wow by mschaffer · · Score: 4, Informative

      [...]

      Wow.. I completely agree with an AC.

      The OP here is in way over his head and the entire project seems to have been planned by idiots.

      This will end badly.

      Like that's the first time. However, we don't know all of the circumstances and I wouldn't be surprised if the OP had this dropped into his/her lap.

    5. Re:Wow by arth1 · · Score: 4, Informative

      Yeah. Before we can answer this person's questions, we need to know why he has:
      1: Decided to cold-plug drives and reboot
      2: Decided to use Linux
      3: ... to serve to Windows

      Better yet, tell us what you need to do - not how you think you should do it. Someone obviously needs to read data that's collected, but all the steps in between should be based on how it can be collected and how it can be accessed by the end users. Tell us those parameters first, and don't throw around words like Linux, samba, booting, which may or may not be a solution. Don't jump the gun.

      As for documentation, no other OSes are as well-documented as Linux/Unix/BSD.
      Not only are there huge amounts of man pages, but there are so many web sites and books that it's easy to find answers.

      Unless, of course, you have questions like how fast a distro will boot, and don't have enough understanding to see that that depends on your choice of hardware, firmware and software.
      I have a nice Red Hat Enterprise Linux system here. It takes around 15 minutes to boot. And I have another Red Hat Enterprise Linux system here. It boots in less than a minute. The first one is -- by far -- the better system, but enumerating a plaided RAID of 18 drives takes time. That's also irrelevant, because it has an expected shutdown/startup frequency of once per two years.

    6. Re:Wow by Anonymous Coward · · Score: 4, Informative

      Op here:

      The gear was sourced from a similar prior project that's no longer needed, and we don't have the budget/authorization to buy more stuff. Considering that the requirements are pretty basic, we weren't expecting to have a serious issue picking the right distro.

      >You are looking for information that your average user won’t care about.

      Granted, but I thought one of the strengths of *nix was that it's not confined to computer illiterates. Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

      As for the 30 seconds thing, there's a lot of side info I left out of the summary. This project is quirky for a number of reasons, one of them being that the server itself spends a lot of time off and needs to be booted (and halted) on demand. (Don't ask, it's a looooooong story.)

    7. Re:Wow by Coz · · Score: 1

      Consider setting up several servers and GlusterFS, auto-replicating the data when it's mounted and presenting an in-field shared file system. You can run CentOS or RHEL6 for the OS, and the FS will take care of data persistence, replication, and presenting a CIFS or NFS view.

      --
      I love vegetarians - some of my favorite foods are vegetarians.
    8. Re:Wow by Anonymous Coward · · Score: 0

      Op here:

      > If you know what you are doing, it does not matter which distro you are using.

      No, it shouldn't, but there's a lot to be said about the value of your time. I know I can spend a week futzing with basically anything and get it to work, but I'm hoping to save time by having someone point out a couple good candidates to start with.

      > A distro is a pre-packaged solution for a broad group of users

      To an extent, sure, but "distro" also covers things like Arch, which is pretty damn close to "build and test our own solution"

    9. Re:Wow by Nutria · · Score: 1

      Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

      Debian (The Universal OS)
      RHEL/CentOS/Scientific
      Gentoo
      Slackware

      --
      "I don't know, therefore Aliens" Wafflebox1
    10. Re:Wow by Crudely_Indecent · · Score: 2

      I would further enhance the question by asking: What the hell are you collecting that each sensor stores 500GB in 24 hours - photos? Seriously, these aren't sensors - they're drive fillers.

      Seriously, if "sensor units scattered across a couple square miles" means 10 sensors - that's 5 Terabytes to initialize and mount in 30 seconds. I suspect that the number is greater than 10 sensors because the rest of the requirements are so ridiculous.

      And why the sneakernet? If they're in only a couple of square miles - why not set up a mesh network and deliver real-time data without the need for daily collection? 30 seconds to boot probably wouldn't be a requirement if the system is only booted once.

      All of the questions about why this person is even involved are probably moot. He'll be outed as an idiot in short order.

      --
      "Lame" - Galaxar
    11. Re:Wow by LVSlushdat · · Score: 1

      Jesus Christ, WHAT AN ASSHOLE!! Telling the boss to fire him, with jobs as scarce as they are..... For all YOU know, AC-Asshole, the boss might be the one TELLING him the hardware he has to use.. LOVE ACs hiding behind anonymity so they can spew their hatred.. I know, I know, replying to ACs... just encourages 'em... So shoot me, I HATE people like this AC...

      --
      THANK YOU, Edward Snowden!! Americans owe you a debt of gratitude (whether they know it or not..)
    12. Re:Wow by Anonymous Coward · · Score: 3, Interesting

      Op here:

      1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

      2) Because it's better? Do I really need to justify not using Windows for a server on Slashdot?

      3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

      > Better yet, tell us what you need to do

      - Take a server that is off, and boot it remotely (via ethernet magic packet)
      - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
      - Share out the unioned virtual tree in such a way that it's easily accessible to Mac/Win clients
      - Do all this in under 30 seconds
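      The first step relies on Wake-on-LAN. A magic packet is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast; port 9 is a common choice, not a requirement. The Python sketch below assumes WoL is enabled in the server's BIOS and NIC, and the MAC address shown is a placeholder.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC address 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


pkt = magic_packet("00:11:22:33:44:55")  # placeholder MAC of the file server
```

      Calling send_wol("00:11:22:33:44:55") from a machine on the same subnet would wake the server; routed networks usually need a directed broadcast address instead of 255.255.255.255.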

      I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get Linux to do this.....

      > huge amounts of man pages

      quantity != quality

    13. Re:Wow by msobkow · · Score: 3, Informative

      The "under 30 seconds part" is not as easy as you think.

      You're mounting new drives -- that means Linux will probably want to fsck them, which with such volume, is going to take way more than 30 seconds.

      --
      I do not fail; I succeed at finding out what does not work.
    14. Re:Wow by quist · · Score: 1

      ...know which distros can be stripped down ... with a minimum of fuss.

      Debian (The Universal OS)
      RHEL/CentOS/Scientific [...]

      And don't forget to compile a bespoke, static kernel.

    15. Re:Wow by Anonymous Coward · · Score: 0

      The nineties called and they want their inefficient filesystems back.

    16. Re:Wow by Anonymous Coward · · Score: 1

      Better question: Is this that DEA license plate camera project out in the desert somewhere? That would definitely fit all the mentioned criteria and be JUST bandwidth intensive enough to not make the mesh network idea feasible.

    17. Re:Wow by Anonymous Coward · · Score: 0

      Not to mention, the BIOS may want to do its checks on the system, and the disk controller may too, depending on the setup. If you know a regular schedule, I'd set timers so the real-time clock wakes the system before you need it. But as it stands, there is not enough detail about what you are doing and what hardware you have, which makes it next to impossible to do anything beyond giving generic answers ("any system will work if you cut back on services" or "none will work") or just complaining about how stupid the project seems.

    18. Re:Wow by plover · · Score: 3, Insightful

      While I'm curious as to the application, it's his data rates that ultimately count, not our opinions of if he's doing it right.

      500GB may sound like a lot to us, but the LHC spews something like that with every second of operation. They have a large cluster of machines whose job it is to pre-filter that data and only record the "interesting" collisions. Perhaps the OP would consider pre-filtering as much as possible before dumping it into this server as well. If this is for a limited 12-week research project, maybe they already have all the storage they need. Or maybe they are doing the filtering on the server before committing the data to long-term storage. They just dump the 500GB of raw data into a landing zone on the server, filter it, and keep only the relevant few GB.

      Regarding mesh networking, they'd have to build a large custom network of expensive radios to carry that volume of data. Given the distances mentioned, it's not like they could build it out of 802.11 radios. Terrain might also be an issue, with mountains and valleys to contend with, and sensors placed near to access roads. That kind of expense would not make sense for a temporary installation.
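      To put numbers on the radio-link point: moving 500 GB per sensor per day continuously needs an average of roughly 46 Mbit/s per sensor, before any protocol overhead or retransmissions. A quick Python check, assuming decimal gigabytes and, purely as an example, 10 sensors (the summary never gives a count):

```python
BYTES_PER_DAY = 500e9              # ~500 GB per sensor per 24 h, decimal GB (from the summary)
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbit_per_s(bytes_per_day: float, sensors: int = 1) -> float:
    """Average link rate needed to move the daily volume continuously, in Mbit/s."""
    return bytes_per_day * 8 * sensors / SECONDS_PER_DAY / 1e6


per_sensor = sustained_mbit_per_s(BYTES_PER_DAY)        # about 46.3 Mbit/s
ten_sensors = sustained_mbit_per_s(BYTES_PER_DAY, 10)   # about 463 Mbit/s; 10 sensors is an assumed example
```

      These are averages over a full day; any scheme that only transmits part of the time needs proportionally more bandwidth.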

      I don't think he's an idiot. I just think he couldn't give us enough details about what he's working on.

      --
      John
    19. Re:Wow by Anonymous Coward · · Score: 0

      > I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get linux to do this.....

      If it's not hard, why are you asking how to do it?

    20. Re:Wow by PNutts · · Score: 1

      Yeah, it reminds me of my experiences trying to get help in forums. I didn't want to just put in a CD and watch it install and boot to a shell. I had the hardware I had and for fun wanted it to do interesting things. Instead of conversations and tips / tricks I received a lot of "you're stupid if you want to do that and you're even stupider if you can't figure it out. I (make it seem like I) know and I won't tell you."

    21. Re:Wow by Anonymous Coward · · Score: 0

      Why is documentation for *nix always so bad?

      You are looking for information that your average user won’t care about...

      Relax. This is just the standard troll-question ending that's necessary to get anything on the /. frontpage.

    22. Re:Wow by arth1 · · Score: 3, Interesting

      1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

      Yes, I will ask why. Like why booted, and not hibernated, for example, if part of the reason is that it has to be powered off.
      If the server is single-purpose file serving of huge files once, it does not benefit from huge amounts of RAM, and can hibernate/wake in a short amount of time, depending on which peripherals have to be restarted.

      2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

      Yes? While Microsoft usually sucks, it can still be the least sucky choice for specific tasks. And there are more alternatives than Linux out there too.

      3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

      What's the format on the drives? That can be a limiting factor. And what are the specifics for "sharing"? Must files be locked (or lockable) during access, are there access restrictions on who can access what?
      For what it's worth, Windows Vista/7/2008R2 all come with Interix (as "Services for Unix") NFS support. So that's also an alternative.

      - Take a server that is off, and boot it remotely (via ethernet magic packet)

      That you want to "wake" it does not imply that the server has to be shut off. It can be in low power mode, for example - Apple's "bonjour" (which is also available for Linux) has a way to "wake" services from low-power states.

      - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.

      Why? Sharing a single directory under which all the drives are mounted would also give access to all the drives under a single mount point - no need for union unless you really need to merge directories and for some reason cannot do the equivalent with symlinks ("junctions" in MS jargon).
      Unions are much harder, as you will need to decide what to do when inevitably the same file exists on two drives (even inconspicuous files like "desktop.ini" created by people browsing the file systems).
      Even copying the files to a common (and preferably RAIDed) area is generally safer - that way, you also don't kill the whole share if one drive is bad, and can reject a drive that comes in faulty.
      But you seem to have made the choices beforehand, so I'm not sure why I answer.
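      A minimal Python sketch of that collision check, assuming each drive is already mounted at its own directory (the root paths you pass in are hypothetical). It walks every drive and reports the relative paths that exist on more than one, which are exactly the files a union mount would have to silently choose between:

```python
import os
from collections import defaultdict


def find_conflicts(roots):
    """Return {relative_path: [roots...]} for files present on 2+ drives."""
    owners = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                owners[rel].append(root)
    # A union mount must pick one copy of each of these; report them instead.
    return {rel: where for rel, where in owners.items() if len(where) > 1}
```

      Running find_conflicts(["/mnt/disk1", "/mnt/disk2"]) over a day's drives would flag any shared names, such as a desktop.ini left on both; an empty dict means a plain union would be unambiguous for that set.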

      - Do all this in under 30 seconds

      You really should have designed the system with the 30 seconds as a deadline then.

      If I were to do this, I would first try to get rid of the sneakernet requirement. 4G modems sending the data, for example. But if sneakernetting drives is impossible to get around, I'd choose a continuously running system with hotplug bays and automount rules.

      Unless the data has to be there 30 seconds from when the drive arrives (this is not clear - from the above it appears that only the client access to the system has that limit), I'd also copy the data to a RAID before letting users access it.

      Sure, Linux would do, but there's no particular flavour I'd recommend. Scientific Linux is a good one, but *shrug*.
      If you need support, Red Hat, but then you also should buy a system for which RHEL is certified.

    23. Re:Wow by Anonymous Coward · · Score: 0

      From the way you speak it's quite clear you've never had an actual job where actual company practices, rules and standards actually restrict how you do something. I used to work at a place where all computer equipment must be turned off for the weekends, because management wanted to conserve power. An always on server was unthinkable, not because it wasn't a good idea, but because it would have ruffled the feathers of guys with bigger paychecks.

      As for the sneakernet method, well... did you miss the part about 500GB of data per sensor per 24h? Think the 4G providers would be okay with that amount of data, even if they had coverage out in UNDEVELOPED areas?

      Actually, forget it, it's not that you haven't had an actual job, it's that you don't live in the real world.

    24. Re:Wow by darkonc · · Score: 1

      I would have thought that the Gentoo people would have known the most about stripping down a system to the bare essentials.

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
    25. Re:Wow by Anonymous Coward · · Score: 0

      The project sounds a bit like LOFAR, with 20 000 antennas collecting data.
      However, I can't imagine the LOFAR project buying their hardware and then putting in the software as an afterthought.. especially since apparently it's running on a Blue Gene/P supercomputer.. Anybody care how fast a Blue Gene/P boots? :-)

    26. Re:Wow by gweihir · · Score: 1

      So how is a person messing things up, not learning (because the curve is too steep), and wasting the money, time and effort of others going to help? Unless you propose that it is now every person for himself, and that doing something about the crisis by actually being efficient and effective is completely irrational.

      You know, people only get fired for incompetence because they are not even competent enough to see their own limits, or because they lack the guts to do something about it, like getting a job they can actually do well.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    27. Re:Wow by gweihir · · Score: 1

      Good call on the cold-plug. Hot-plugging SATA on Linux works very well, provided you have the right controller. I do it frequently for testing purposes.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    28. Re:Wow by F.Ultra · · Score: 1

      Basically all distros can be stripped down easily. However, you will have the most success if you install the server version of Ubuntu, CentOS, Red Hat or Debian, since the base install there is already very restricted and you would not need to strip it further. Just make sure that you disable the desktop in CentOS/Red Hat/Debian (no need in Ubuntu, since the server version is console-only by default).

      Instead of a unionfs you should RAID the drives, but since that seems to be no option(?) you probably need something like unionfs, and documentation for how to set something like that up should be easy to find. I do

      I don't really understand why you find it so difficult to find documentation for what you want, have you even googled once?

    29. Re:Wow by drolli · · Score: 1

      6 Megasamples @ 8 bit take more than 500GB per day.

      So that would be a pretty low-end AD converter.
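      The arithmetic behind that, as a small Python sketch assuming a single channel, no compression, and decimal units (1 GB = 10^9 bytes):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def bytes_per_day(samples_per_second, bits_per_sample, channels=1):
    """Raw (uncompressed) bytes one sensor produces in 24 hours."""
    return samples_per_second * bits_per_sample * channels * SECONDS_PER_DAY // 8
```

      bytes_per_day(6_000_000, 8) comes to 518,400,000,000 bytes, roughly 518 GB, just over the 500GB-per-day figure.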

    30. Re:Wow by Anonymous Coward · · Score: 2, Insightful

      Seismic data.
      Radio spectrum noise level.
      Acoustic data.
      High frequency geomagnetic readings.
      Any of various types of environmental sensors.

      Any of the above, or a combination thereof, would be pretty common in research projects, and could easily generate 500GB+ per day. And the only thing you thought of was photos. You're not a geek, you're some Facebook Generation fuckwit who knows jack shit about science. Go back to commenting on YouTube videos.

    31. Re:Wow by rwa2 · · Score: 1

      Sounds reasonable.

      Any recent Linux distro should do... just stick with whatever you have expertise with. Scientific Linux would probably be the most suitable RHEL / CentOS clone... but it also comes with OpenAFS (which also has Windows clients) which might allow you another option to improve filesharing performance over Samba (I haven't played with it myself, though). Linux Mint is my current favorite Debian / Ubuntu distro.

      Either would likely mount SATA disks that were hotplugged automatically under /media/
      You could just configure Samba to share out /media/ (probably need some option to allow it to share files across other filesystems), and the Windows users will just see disks appear and disappear under that shared tree as disks are added and removed... no fancy unionfs required. Then you just have to worry about giving all of the data disks filesystems with unique, descriptive filesystem labels.
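      If you go that route, one way to see which data disks are currently visible under the shared tree is to read the kernel's mount table. A sketch in Python, assuming the automounter uses /media/ as described above (some setups use /media/<user>/ or /run/media/<user>/ instead, so the prefix is a parameter):

```python
def mounts_under(prefix="/media/", mounts_file="/proc/mounts"):
    """Return (device, mountpoint, fstype) for every filesystem mounted
    below `prefix`, as listed in the kernel's mount table."""
    result = []
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue
            device, mountpoint, fstype = fields[0], fields[1], fields[2]
            # /proc/mounts escapes spaces in paths as \040.
            mountpoint = mountpoint.replace("\\040", " ")
            if mountpoint.startswith(prefix):
                result.append((device, mountpoint, fstype))
    return result
```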

      Don't know much about your dataset, but I'd expect the filesystem to be the biggest bottleneck based on past experience. It adds lots of latency and takes a long time to transfer lots of uncompressed files... lots of scientific data sets compress very well (~30x) and often transfer over the network much faster as a few big .zip or .tgz'd files than as a mess of a directory tree (another ~10x). So if you take the extra time to tinker and pipe your data collection directly to some form of compressed archive, you can reap some awesome performance improvements.
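      A minimal sketch of the "few big .tgz files" idea using Python's standard tarfile module. The directory and archive names are hypothetical, and how much it actually shrinks depends entirely on the data:

```python
import os
import tarfile


def archive_run(src_dir, dest_path):
    """Pack one collection run into a single gzip-compressed tarball so it
    moves over the network as one large file instead of many small ones.
    Returns the size of the archive in bytes."""
    with tarfile.open(dest_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the run's name.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return os.path.getsize(dest_path)
```

      For example, archive_run("/data/sensor07_day1", "/data/sensor07_day1.tgz") produces one file that clients can pull in a single transfer.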

      Sounds fun and good luck!

    32. Re:Wow by Anonymous Coward · · Score: 0

      BIOS? 1990's called and they want their outdated platform firmware back.

    33. Re:Wow by fuzzyfuzzyfungus · · Score: 1

      Anybody care how fast a Blue Gene/P boots? :-)

      Just hire an IBM consultant to boot it for you. That will give you a new perspective on the costs of boot time...

    34. Re:Wow by u38cg · · Score: 1

      Yeah, welcome to academic computing.

      --
      [FUCK BETA]
    35. Re:Wow by sg_oneill · · Score: 1

      No, it's just a generic science data collection application, by the sound of it.

      The data rate he's describing is absolutely nothing unusual in the sciences.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    36. Re:Wow by Fallen+Kell · · Score: 3, Informative
      Even though I believe I am being trolled, I will still feed it some.

      1) The cold plug is not the issue, rather, the server itself needs to be booted and halted on demand (don't ask, long story).

      You will never find enterprise-grade hardware which will do this. You will be even harder pressed to do this with mechanical drives (for the OS), and harder still with random new drives being attached which may need to have integrity scans performed. This requirement alone is asinine and against every rule in the data center and system administration handbook for something that is serving data to other machines. If you need something that you halt and shut down so you can load the drives, do that on a box other than the one servicing data requests from other computers, and copy the data from that system to the real server.

      2) Because it's better? Do I really need to justify not using windows for a server on Slashdot?

      No, you don't need to justify it, but you do need to explain it some. For the most part it sounds like most people where you work do not have much experience with *nix systems, because if they did, you would never have had requirement (1) in the first place. They would know that the whole point of *nix is to separate everything so that you don't have to bring down the system just to update/replace/remove one particular service/application/hardware component. Everything is compartmentalized and isolated, which means the only time you should ever need to bring down the system is due to catastrophic hardware failure or because you need to update the actual kernel; everything else should be built in such a way that it is hot-swappable, redundant, and/or interchangeable on the fly.

      3) The shares need to be easily accessible to mac/win workstations. AFAIK samba is the most cross-platform here, but if people have a better idea I'm all ears.

      Well, SAMBA is the only thing out there that will share to Win/Mac clients from *nix, so that is the right solution.

      - Take a server that is off, and boot it remotely (via ethernet magic packet)
      - Have that server mount its drives in a union fashion, merging the nearly-identical directory structure across all the drives.
      - Share out the unioned virtual tree in such a way that it's easily accessible to mac/win clients
      - Do all this in under 30 seconds

      I don't know why people keep focusing on the "under 30 seconds" part, it's not that hard to get linux to do this.....

      They are focusing on the "under 30 seconds" part because they know that it is an absurd requirement for dealing with multiple hard drives which may or may not have a working filesystem as they have not only traveled/been shipped, but have also been out in the actual field. The probability of data corruption is so astronomically higher that they know that the "under 30 seconds" is idiotic at best.

      For instance, I can't even get to the BIOS in 30 seconds on anything that I have at my work. Our data storage servers take about 15-20 minutes to boot. Our compute servers take about 5-8 minutes. They spend more than 30 seconds just performing simple memory tests at POST, let alone hard drive identification, filesystem integrity checks, or actually booting. This is why people are hung up on the "under 30 seconds".

      If you had a specialty build system, in which you disabled all memory checks (REALLY BAD IDEA on a server though since if your memory is bad you can corrupt your storage because writes to your storage are typically from memory), used SSDs for your OS drives, had no hardware raid controllers on the system, used SAS controllers which do not have firmware integrity checks, you might, just might be able to boot the system in 30 seconds. But I sure as hell would not trust it for any kind of important data because you had to disable all the hardware tests which means you have no idea if there are hardware problems which are corrupting your data.

      --
      We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
    37. Re:Wow by Anonymous Coward · · Score: 0

      The number of times some manager has walked in and said, "Hey, you know about computers... we have this thing that we already bought... make it work"
      In IT you get broad shoulders and a big lap, free with the territory.

      As to *nix docs being so bad: the old ones were written by nerds for nerds and are extremely detailed; the new stuff is cobbled together by volunteers in their 20s as an afterthought. And it shows.

    38. Re:Wow by arth1 · · Score: 2

      Well, SAMBA is the only thing out there that will share to Win/Mac clients from *nix, so that is the right solution.

      There's OpenAFS.
      And let's not forget NFS, since non-home versions of Windows Vista/7/2008R2 come with Interix and an NFS client. (Control Panel -> Programs and Features -> Windows Features -> Services for NFS -> Client for NFS)
      NFS is the least secure, but it uses the least amount of resources on the server (this seems to be old inherited hardware), and if the rsize/wsize is bumped up on the server side, it's faster too.

      (I'm sharing my music collection over NFS to Windows 7, so I know it works. I used to use Samba, but NFS was much faster -- indexing went down from 12 minutes to 3 minutes.)

    39. Re:Wow by Alex+Belits · · Score: 4, Funny

      I worked with networked computers in professional capacity longer than all of you combined, and I completely agree with the person you are replying to.

      You are absolutely definitely unqualified to make any design decisions about the project you have described. The design is stupid, the requirements are idiotic, and if it were implemented in such a manner it would not work, for many reasons that you don't seem to be capable of understanding.

      On top of that massive ignorance, you are stupid.

      --
      Contrary to the popular belief, there indeed is no God.
    40. Re:Wow by Anonymous Coward · · Score: 0

      Like that's the first time. However, we don't know all of the circumstances and I wouldn't be surprised that the OP had this dropped into his/her lap.

      Irrelevant to the situation. A competent information technology professional should be able to conduct the necessary research to obtain the information which will enable the deployment of a suitable system. As I have been building out the information systems infrastructure for my research organization the ability to analyse the requirements and systematically implement each component of the information systems infrastructure has proven itself invaluable. If the OP would like my assistance, for a fee, I would be willing to provide my knowledge.

      "[T]he entire project seems to have been planned by idiots." A more succinct description of the situation escapes me. ;)

    41. Re:Wow by sumdumass · · Score: 1

      I work with smaller companies and I'm always amazed at how a salesman can make a pointy-headed boss woo like a little schoolgirl over their product and talk them into buying it without ever checking if it will work or anything.

      But in another post up the thread, the OP said they inherited the hardware from a similar but now defunct project. So it isn't quite as bad, but it's still a PHB saying "we have this, make it work."

    42. Re:Wow by sumdumass · · Score: 1

      I've heard stories like that before. I've never seen it happen though. In all the forums I asked questions in, I either got help, a read section X of the manual or follow this link to a walk-through, or a simple I don't know. A whole lot of I don't knows I might add. I did have someone tell me once to use a different piece of hardware that was known to work well instead of something no one had ever heard of.

      I did get a Best Buy tech/sales drone/Geek Squad guy telling me Linux was too hard and I shouldn't be messing with it, when I was looking for a specific U.S. Robotics modem that was on their website and all he wanted to show me was some Wintel host-process bullshit.

      I'm not saying it didn't happen to you though. Obviously, we weren't at the same forums or IRC channels, or I would have seen it happen to you.

    43. Re:Wow by JackieBrown · · Score: 1

      Same here. My questions were usually either answered or ignored. Very rarely did I receive rude responses. And when I look at my old posts, I probably deserved a lot more rudeness than I ever got.

    44. Re:Wow by CAIMLAS · · Score: 2

      Add to that the fact that he doesn't really even seem to understand the problem himself, or know the tools he's got to work with.

      I'm sorry, but: "UNIX has bad documentation"? Wikipedia itself is chock full of useful documentation in this regard. You can find functional "this is how it works" information on pretty much every single component and technology with ease. (You do, however, need to know what you're looking for.)

      Try to do the same for Windows. The first 12 pages of search results will likely be marketing bullshit for e.g. DFS, and what you do find won't really tell you how it works, or how flexible it is.

      As with all computing technology, the only real way to figure things out is to try them, experiment, and learn how they work. This is why people hire professionals and not interns who know how to read.

      As for the topic at hand... this is what I understand you to be asking:

      * You want a linux/unix solution that boots in 30 seconds
      * you want a unified file hierarchy (filesystem) from a dozen drives of unknown format which must remain unmolested
      * you want this unified hierarchy to be exported to Windows, somehow

      As far as I know, there is no technology which meets any two of your requirements, even if you remove 'boot in under 30 seconds' (a very odd requirement indeed). This is a massive undertaking, and it's clear nobody thought about this from the beginning.

      Looking at the information you've provided, I can see several different options. Due to the area involved, a better option might be to stream said data to a central system using wireless technology (there are many available which can cover this range). This data could then be stored in a database of some sort (NoSQL or Hadoop, maybe?) or a modern scalable filesystem, like ZFS, which you could probably fairly easily implement as the backend for whatever you're doing, regardless.

      ZFS will run on FreeNAS, Nas4Free, FreeBSD, Illumos, NexentaStor, and, depending on who you ask (IMO, the performance and stability is on par with or superior to the FreeBSD stuff due to hardware drivers working properly), Red Hat, Ubuntu, Debian, etc. using the ZFSonLinux port.

      Regardless of what method is undertaken, "you can't get there from here" applies. You've got to back up and examine your assumptions/presumptions about what is possible and what you've got to work with. I would wager that your assumption of what you have to do, and what you're limited in doing, on the Windows side may be key here.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    45. Re:Wow by ogl_codemonkey · · Score: 1

      I once solved a similar problem with FLAC; seems to work great on most environmental data. YMMV.

    46. Re:Wow by bingoUV · · Score: 1

      Some geeks somewhere should know which distros can be stripped down to bare essentials with a minimum of fuss.

      2 answers :
      1. ALL distros can be stripped down to bare essentials with a minimum of fuss, and amount of fuss is completely dependent upon understanding of the user doing the stripping down.

      2. Sure, some geeks somewhere know it. But you are not that geek. A minimum of fuss for that geek may be a lot of fuss for you, or even impossible. So the geek knowing the answer to the question is useless to you, since his answer is NOT what you need.

      Not to mention that "minimum of fuss" is not well-defined nor is "bare essentials". So wrong question.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    47. Re:Wow by mpe · · Score: 1

      You are looking for information that your average user won't care about. Things like boot time don't get documented because your average user isn't going to have some arbitrary requirement to have their _file server_ boot in 30 seconds. That's a very weird use case. Normally you reboot a file server infrequently (unless you want to be swapping disks out constantly...). I'm assuming this requirement is because you plan on doing a full shutdown to insert your drives... in which case you really should be looking into hotswap

      A SATA (or PATA) to USB bridge will make any drive effectively hot-swappable. Also, the server can quite easily boot from an SSD.

    48. Re:Wow by Anonymous Coward · · Score: 0

      Sounds like you need a USB hard drive. It boots in 30 seconds and can be hot swapped. I don't think you need Linux installed either...

    49. Re:Wow by mpe · · Score: 1

      Better yet, tell us what you need to do - not how you think you should do it.

      This can be a common issue with someone who thinks they know more than they actually do about IT.

      Someone obviously needs to read data that's collected, but all the steps in between should be based on how it can be collected and how it can be accessed by the end users. Tell us those parameters first.

      Something about the data and "sensors" might also be a good starting point.

    50. Re:Wow by Alex+Belits · · Score: 1

      First of all, hard drives don't boot...

      --
      Contrary to the popular belief, there indeed is no God.
    51. Re:Wow by pnutjam · · Score: 1

      I would use a script to unmount drives when they are done collecting information; then you can hotswap them (assuming your hardware supports that). Just have the script unmount one drive and mount the next drive at the same mountpoint.
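      A sketch of such a swap script in Python. The device and mount point names are hypothetical, and it assumes it runs as root on hardware that tolerates hot-removal. The command runner is a parameter so the sequence can be exercised without touching real disks:

```python
import subprocess


def swap_drive(new_device, mountpoint, run=subprocess.run):
    """Unmount whatever is at `mountpoint`, then mount `new_device` there."""
    # check=True raises CalledProcessError if umount fails (e.g. files still
    # open over Samba), so the next drive is never mounted over a busy one.
    run(["umount", mountpoint], check=True)
    run(["mount", new_device, mountpoint], check=True)
```

      For example, swap_drive("/dev/sdc1", "/srv/share/sensor07") frees the old drive for pulling and puts the next one at the same path, so clients see the same share.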

    52. Re:Wow by Anonymous Coward · · Score: 0

      OP Here.

      Thanks for the support. On another project, I'd like to know where I can get a metallic blue 2012 Mercedes E-Class, mint condition, for $100 or less. Also, I need it delivered in 30 seconds or less. No Hot Wheels or other miniature version; I need the real, drivable, street-legal one. I don't understand why car dealers are so rude when I give them those simple requirements.

      Or maybe not OP, but I am giving equally ridiculous requirements.

      30 seconds to boot and share out 500GB of uninitialized data via a new drive? And multiple drives at that? Most video cards can't pump that, and he wants it from a mechanical drive? Sure, it can be done if you fsck and shut down the field system and pre-screen the HW on the server, but you're disabling every parity check on a 500GB*X/day system and you don't think you'll have problems? You, OP, are nuts.

    53. Re:Wow by Anonymous Coward · · Score: 0

      I worked with networked computers in professional capacity longer than all of you combined..

      Judging by the photo on your homepage you're either lying or use vast quantities of anti-aging cream, hair dye and plastic surgery..... The guy may be well out of his depth, but surely constructive advice would be preferable, especially if you're as experienced as you claim.

    54. Re:Wow by Alex+Belits · · Score: 1

      Judging by the photo on your homepage you're either lying or use vast quantities of anti-aging cream, hair dye and plastic surgery.....

      Or maybe I just have no life.

      The guy may be well out of his depth, but surely constructive advice would be preferable, especially if you're as experienced as you claim.

      Any "constructive" advice would enable him to make more and worse mistakes. There is no replacement for learning things in depth.

      --
      Contrary to the popular belief, there indeed is no God.
    55. Re:Wow by fuzzywig · · Score: 1
      I'd guess the OP didn't make any of those decisions, he was probably sat in his office one day quite happily until his boss walked in and dumped the whole project on the OP's desk, only stopping to demand that it be up and running by next week.

      Oh, and there's a budget of $0.

      If you get to make the sensible choices in things you work on then I envy you, a lot of us just get existing problems dumped in our laps because "you're the computer person right?"

  3. OpenAFS+Samba by Zombie+Ryushu · · Score: 1

    Use OpenAFS with Samba's modules. Distribution doesn't matter.

    1. Re:OpenAFS+Samba by wytcld · · Score: 1

      Looking at the OpenAFS docs, they're copyright 2000. Has the project gone stale since then?

      --
      "with their freedom lost all virtue lose" - Milton
    2. Re:OpenAFS+Samba by Monkius · · Score: 1

      OpenAFS is not dead. IIRC, any Samba AFS integration probably is. This doesn't sound like a job for AFS, however.

      --
      Matt
  4. Mechanical Hard Drive by Anonymous Coward · · Score: 3, Insightful

    Why does it have to be a mechanical hard drive? Why not use an SSD for the boot drive?

    1. Re:Mechanical Hard Drive by davester666 · · Score: 4, Funny

      They already bought a $20 5400rpm 80GB drive and don't want it to be wasted.

      --
      Sleep your way to a whiter smile...date a dentist!
    2. Re:Mechanical Hard Drive by mspohr · · Score: 2

      It sounds like they inherited a bunch of hardware and don't have a budget for more stuff.
      So... make do with what you have.

      --
      I don't read your sig. Why are you reading mine?
    3. Re:Mechanical Hard Drive by Type44Q · · Score: 2

      Why does it have to be a mechanical hard drive?

      Their "couple square miles of undeveloped land" is actually a minefield, and to avoid accidental detonation (you know, magnetic triggers and all that), the situation calls for a purely mechanical hard drive - perhaps one running on water power.

      Just a wild guess...

  5. Is this a joke? by Anonymous Coward · · Score: 0

    Is this a joke? A troll?

    Any Linux distribution will boot in less than 30 seconds if you turn off all the services you don't need, which is probably most of them in your case. Any modern distribution will have packages for aufs ready to install. Any modern distribution will tell you, via D-Bus, when a removable disk is plugged in so you can run whatever program you want to handle it e.g. a script that mounts it at the right place in your tree.
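    Listening on D-Bus needs third-party Python bindings, so here is a swapped-in, simpler technique under stated assumptions: poll /dev/disk/by-uuid, where udev on a typical modern distro creates one symlink per filesystem UUID, and react when entries appear or vanish. The handler callbacks are hypothetical placeholders for whatever mounts the disk in the right place:

```python
import os
import time

BY_UUID = "/dev/disk/by-uuid"


def list_uuids(path=BY_UUID):
    """UUIDs of the filesystems udev currently knows about."""
    try:
        return set(os.listdir(path))
    except FileNotFoundError:
        return set()


def diff_uuids(before, after):
    """Return (added, removed) UUID sets between two snapshots."""
    return after - before, before - after


def watch(handle_added, handle_removed, interval=2.0, path=BY_UUID):
    """Poll forever, calling the handlers when disks appear or disappear."""
    known = list_uuids(path)
    while True:
        time.sleep(interval)
        current = list_uuids(path)
        added, removed = diff_uuids(known, current)
        for uuid in sorted(added):
            # realpath resolves the symlink to the device node, e.g. /dev/sdc1.
            handle_added(uuid, os.path.realpath(os.path.join(path, uuid)))
        for uuid in sorted(removed):
            handle_removed(uuid)
        known = current
```

    Polling every couple of seconds is crude compared with event notification, but it has no dependencies beyond the standard library.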

    1. Re:Is this a joke? by marcosdumay · · Score: 2

      Any Linux distribution will boot in less than 30 seconds if you turn off all the services you don't need... will have packages for aufs ready to install... will tell you, via D-Bus, when a removable disk is plugged...

      You know, I was in the "it doesn't matter" camp until I read your post. Now I've changed my mind.

      Yes, any distro will do it. You'll have the same (lack of) trouble configuring the service on any distro. So, choose a distro that is easy to get into bare bones and to upgrade, because those are the two main differentiators here.

      I suggest Slackware. Probably somebody else knows about something simpler, but not so simple that it will end up giving you more work.

    2. Re:Is this a joke? by fearlezz · · Score: 1

      Any Linux distribution will boot in less than 30 seconds if [..]

      Linux does. Too bad it takes the BIOS and RAID array of a server up to minutes to do their checks...

      --
      .sig: No such file or directory
    3. Re:Is this a joke? by techno-vampire · · Score: 3, Informative

      If you want different removable disks to be mounted in different places, it's even easier. Just list each disk (identified by UUID) in /etc/fstab, with the proper mountpoint, and include auto in the options. That way, when you plug it in, the system knows exactly where it goes.
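      A small Python sketch that generates those fstab lines from a UUID-to-mountpoint table. The UUIDs, paths, and ext4 type are assumptions for illustration (blkid will show the real UUIDs), nofail is an addition beyond the advice above so that an empty sled does not stall the boot, and mountpoints are assumed not to contain spaces:

```python
def fstab_entries(disks, fstype="ext4"):
    """Build /etc/fstab lines pinning each data disk (by filesystem UUID)
    to its own mountpoint. `disks` maps UUID -> mountpoint."""
    lines = []
    for uuid, mountpoint in sorted(disks.items()):
        # auto: mounted by `mount -a` at boot; nofail: a missing disk is not an error.
        # Last two fields 0 0: no dump, and no boot-time fsck for these disks.
        lines.append(f"UUID={uuid}\t{mountpoint}\t{fstype}\tauto,nofail\t0\t0")
    return "\n".join(lines) + "\n"
```

      Skipping the boot-time fsck helps the 30-second target, but it also means nothing checks these filesystems automatically.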

      --
      Good, inexpensive web hosting
    4. Re:Is this a joke? by Pinky's+Brain · · Score: 2

      Actually AUFS requires kernel patches, it's never been mainlined because the kernel maintainers like their own union mounts better ... even though they are far less useful (not write-through like AUFS, which is really nice for something like a file server) and forever undelivered. That said, I think Ubuntu and SUSE still come with AUFS patched in ... for Debian you have to compile your own patched kernel or use something like the Liquorix kernel.

  6. I would automate the copying by guruevi · · Score: 4, Informative

    Really, singular hard drives are notoriously bad at keeping data around for long. I would make sure you have a copy of everything. So make a file server with RAIDZ2 or RAID6 and script the copying of these hard drives onto a system that has redundancy and is backed up as well.
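    A minimal Python sketch of the scripted copy, assuming the field drive and the redundant array are both mounted (the paths are hypothetical). It copies every file and re-reads both sides through SHA-256, returning any file whose copy does not match:

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def ingest(src_root, dest_root):
    """Copy every file from a field drive onto redundant storage and verify
    each copy by checksum. Returns the relative paths that failed to verify."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copies data plus timestamps
            if sha256_of(src) != sha256_of(dst):
                bad.append(rel)
    return bad
```

    Note that re-reading right after the copy may be served from the page cache, so this catches copy-path errors more than media errors.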

    I have lost count of how many times I have seen scientists come in with their 500GB portable hard drives only to find them unreadable. If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year. Most of our drives (500GB 2.5" portable drives) last a few months; once they have processed about 6TB of data full-time, they are pretty much guaranteed to fail.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
    1. Re:I would automate the copying by Anonymous Coward · · Score: 1

      OP here:

      We don't need persistence, this data is pretty ephemeral and there's little point in backing it up. If we lose the data from one sensor one day, it's no big thing.

      The analysis generated from this WILL be backed up though, but that's a different system that's already covered.

      >If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year

      These are fullsize desktop drives for exactly that reason.

    2. Re:I would automate the copying by Anonymous Coward · · Score: 0

      Most of our drives (500GB 2.5" portable drives) last a few months, once they have processed about 6TB of data full-time they are pretty much guaranteed to fail.

      Jesus, and I thought SSDs were bad at 30k writes per site. By my reckoning, you're getting on average 12 writes per site. Are you transporting your drives by booting them down the corridor?
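      The 12 comes straight from dividing the data processed by the drive capacity (decimal units, and assuming the writes are spread over the whole drive):

```latex
\frac{6\ \text{TB}}{500\ \text{GB}} = \frac{6000\ \text{GB}}{500\ \text{GB}} = 12\ \text{full-drive writes}
```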

    3. Re:I would automate the copying by Amouth · · Score: 1

      >If you fill 500GB in 24 hours, there is no way a portable hard drive will survive for longer than about a year

      These are fullsize desktop drives for exactly that reason.

      You realize that being "full size desktop drives" makes zero difference for write duty cycle on mechanical drives?

      As long as you're not on the bleeding edge of platter density, the manufacturers use the same process for all platters, both large and small. For lower-capacity larger drives they just reduce the number of platters in the drive.

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    4. Re:I would automate the copying by Dr_Barnowl · · Score: 1

      It could well be physical shock; I've changed from spinning disks in a 2.5" caddy to an SSD. Some of the problems I had were to do with cheap-assed caddies with lousy power electronics that would fail. But I had three disks die in about a year; I changed to SSD and it's been going strong ever since.

      These disks were only transported twice a day: to work, and then home. But they inevitably got dropped sooner or later.

    5. Re:I would automate the copying by gweihir · · Score: 1

      You are doing it with desktop-drives? You know that they can take far less shock/heat than 2.5" notebook drives, right?

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:I would automate the copying by bzipitidoo · · Score: 1

      High elevations can do it too. Mechanical hard drives need some air pressure. Take a typical hard drive up to 17000 ft, unpressurized, and it will fail very quickly. Most hard drive documentation mentions somewhere that they shouldn't be used above 10000 ft, and aren't warranted for that.

      --
      Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    7. Re:I would automate the copying by Macrat · · Score: 1

      Most of our drives (500GB 2.5" portable drives) last a few months, once they have processed about 6TB of data full-time they are pretty much guaranteed to fail.

      This is interesting. How have SSDs held up under that use?

    8. Re:I would automate the copying by guruevi · · Score: 1

      The cheap ones fail just as fast; it seems like I ship the same OCZ Onyx drive back and forth to their RMA site, and I've killed a couple of Vertexes too. The more expensive ones (SLC) are much better, and the Intel 32GB SLCs have held up, but they're so expensive it's really not worth it.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    9. Re:I would automate the copying by guruevi · · Score: 1

      Well, the data is read and written quite intensively, and HDDs don't write modified data to new areas the way SSDs may. They are 500GB capacity, but the dataset is maybe 20-30GB all together for a week-long analysis.

      Yes, it's probably dropped and mishandled a lot by the students who do the work as well and the enclosures are pretty crappy. The manufacturer states that the drive will statistically generate an error for every 12TB and they're probably not built for intensive use.
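A quick sketch of what a "one error per ~12TB" figure implies, assuming it comes from the common consumer spec of one unrecoverable read error (URE) per 10^14 bits read (an assumption; the post doesn't name the spec). Note that a URE rating is an upper bound on reads, not a prediction of drive death, so this only shows the order of magnitude.

```python
def bytes_per_error(bits_per_error=10**14):
    """Decimal bytes read, on average, per unrecoverable read error (URE).
    The 10**14 default is an assumed consumer-drive spec."""
    return bits_per_error / 8


def expected_errors(tb_processed, bits_per_error=10**14):
    """Expected URE count after reading tb_processed decimal terabytes."""
    tb_per_error = bytes_per_error(bits_per_error) / 1e12
    return tb_processed / tb_per_error


print(bytes_per_error() / 1e12)  # 12.5 (TB per error)
print(expected_errors(6))        # about 0.48 errors after ~6 TB read
```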

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    10. Re:I would automate the copying by Macrat · · Score: 1

      You should contact Other World Computing to see if they would loan you an SSD drive.

      They are a small company and pay close attention to any issues with their products. I bet they would LOVE for you to stress their SSD products that quickly. :-)

      http://eshop.macsales.com/shop/SSD/OWC/

    11. Re:I would automate the copying by Osgeld · · Score: 1

      Funny, I still have a 20 meg MFM drive with its original OS installed on it.

  7. CentOS, its enterprise class by perpenso · · Score: 1

    CentOS may be your best bet. It's Red Hat Enterprise Linux rebuilt from the Red Hat source code, minus the Red Hat trademark.

    1. Re:CentOS, its enterprise class by e3m4n · · Score: 3, Informative

      Scientific Linux is also a good option for similar reasons. Given it's a science grant, they might like the idea that it's used at labs like CERN.

    2. Re:CentOS, its enterprise class by Anonymous Coward · · Score: 0

      What about Scientific Linux?

    3. Re:CentOS, its enterprise class by wytcld · · Score: 4, Insightful

      "Enterprise class" is a marketing slogan. In the real world, all the RH derivatives are pretty good (including Scientific Linux and Fedora as well as CentOS), and all the Debian derivatives are pretty good (including Ubuntu). Gentoo's solid too. "Enterprise class" doesn't mean much. The main thing that distinguishes CentOS from Scientific Linux (which is also just a recompile of the RHEL code) is that the CentOS devs have an "enterprise class" attitude. Meanwhile, RH's own devs are universally decent, humble people. Those who do less often think more of themselves.

      For a great many uses, Debian's going to be easiest. But it depends on just what you need to run on it, as different distros do better with different packages, short of compiling from source yourself. No idea what the best solution is for the task here, but "CentOS" isn't by itself much of an answer.

      --
      "with their freedom lost all virtue lose" - Milton
    4. Re:CentOS, its enterprise class by NemoinSpace · · Score: 1
      I disagree. This guy:
      • doesn't like to read man pages
      • wants other people to tell him what buttons to push.

      Red Hat, with a support contract, is for him.

    5. Re:CentOS, its enterprise class by Anonymous Coward · · Score: 3, Insightful

      "Enterprise class" means that it runs the multi-million dollar crappy closed source software you bought to run on it without the vendor bugging out when you submit a support ticket.

    6. Re:CentOS, its enterprise class by perpenso · · Score: 1

      I disagree. This guy:

      • doesn't like to read man pages
      • wants other people to tell him what buttons to push.

      Red Hat, with a support contract, is for him.

      Well if he starts with CentOS that migration will be pretty simple.

    7. Re:CentOS, its enterprise class by Anonymous Coward · · Score: 0

      Bullshit. Enterprise class means that the vendor does proper testing and that my approved system will still boot after the update and will remain stable. Major patches are backported to tested versions, so things are extremely stable and dependencies are not broken.
      Use enterprise class software AND hardware when you need your shit to work, always.

      > all the Debian derivatives are pretty good (including Ubuntu)

      WTF are you smoking. I want some!

    8. Re:CentOS, its enterprise class by StillAnonymous · · Score: 2

      Someone please mod this guy up.

      I swear, the higher the price of the software is, the more upper management just drools all over it, and the bigger the piece of shit it is. Millions of dollars spent per year licensing some of the biggest turds I've ever had the displeasure of dealing with. Just so management can say that some big vendor is behind it and will "have our backs when it fails".

      Guess what, the support is awful too. The vendor never has your back. You'll be left languishing with downtime while they leave you hanging. They don't care because they're so much bigger than your company, and the "license agreement" you signed means you can't hold them responsible for shit. You'd be so much better off using some open source software that does pretty much the same thing, and paying some other company to support it on a per-case basis. Take the bags of money you save and hire some devs to code in the missing functionality you need.

    9. Re:CentOS, its enterprise class by CAIMLAS · · Score: 1

      Someone who loathes "enterprise" shit, here.

      Enterprise class is marketing bullshit, but it's not entirely without technical description. It means something much more concrete than what you think (though, granted, it's a fairly broad and variable definition).

      In short, Enterprise software is supposed to be "robust". AIX sysadmins like that term a lot, but it's nowhere near as stable or adaptable as say, Linux running under an amd64 hypervisor, in my experience.

      Enterprise software does not change wildly or unpredictably. It is (typically) fully backwards compatible at the expense of new features working properly on later software (see: IBM Service Desk). It may or may not perform well, but it meets managerial requirements of revision control, change control, auditability, and all those other things you infrequently need but managers like to cover their asses.

      Enterprise hardware, on the other hand, does seem to be fairly meaningless. In my experience, it mostly means that it's of limited production, limited warrantied compatibility and use case, and very fucking expensive. Performance can be all over the board (see: any of the RAID controllers IBM puts in their craptastic servers). Generally, it's supposed to mean "lasts a very long time and protects managers from liability for their poor decisions", though frequently only does the latter.

      If we're comparing Linux distributions on their "Enterprisey" nature, CentOS, RedHat, and maybe Debian are really the only ones which meet that burden. If you don't understand why Gentoo (or Ubuntu, or Slackware, or FreeBSD, or...) doesn't even approach Enterprise-grade, you've not been doing this very long. Sure, they can be made to behave in an Enterprise type fashion, but it's a massive undertaking.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
  8. unRaid by Anonymous Coward · · Score: 1, Informative

    unRaid FTW. I use this to handle TBs of data and it works fine.

    1. Re:unRaid by Anonymous Coward · · Score: 0

      I forgot to mention that it also has redundancy and shows up as a single share!

  9. Here we go again by Anonymous Coward · · Score: 3, Insightful

    Another "I don't know how to do my job, but will slag off OSS knowing someone will tell me what to do. Then I can claim to be l337 at work by pretending to know how to do my job".

    It's called reverse physiology, don't fall for it! Maybe shitdot will go back to its roots if no one comments on junk like this and the slashvertisements?

    1. Re:Here we go again by Anonymous Coward · · Score: 0

      Reverse physiology... the study of corpses and corpsification? I know Slashdot has gotten bad, but it's not THAT bad yet...

    2. Re:Here we go again by hot+soldering+iron · · Score: 1

      Damn, you outed yourself there as an old loser that never "made it" in your career field. "Go back to its roots"? You would obviously have told the OP how to do this, and shown off how l337 you were, if you had the slightest clue of how to do this.

      Lame. Even as an AC troll.

      Go eat another twinkie and play with your star wars dolls in your mom's basement.

      --
      When you want something built, come see me. If you want correct grammar and spelling, get a F*ing liberal arts student.
  10. What Greyhole isn't by NemoinSpace · · Score: 4, Insightful
    • Enterprise-ready: Greyhole targets home users.

    Not sure why the 30s boot-up requirement is there, so it depends on what you define as "booted". Spinning up 12 hard drives and making them available through Samba within 30s guarantees your costs will be 10x more than they need to be.
    This isn't another example of my tax dollars at work is it?

    1. Re:What Greyhole isn't by nullchar · · Score: 1

      This isn't another example of my tax dollars at work is it?

      I hope not! Or my university tuition fees, or really any other spending, even other people's money.

      Who cares if the server boots up in 30 seconds or 30 minutes? The OP now has up to 12 500GB drives to either copy off or access over the lan. There's hours of data access or data transfer here.
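To put "hours of data transfer" in numbers, here is a small sketch. The ~110 MB/s sustained rate for a single gigabit Ethernet link is an assumption for illustration, not a measurement from this setup; at that rate, 12 x 500 GB works out to roughly 15 hours.

```python
def transfer_hours(total_gb, throughput_mb_s):
    """Hours to move total_gb (decimal GB) at a sustained throughput in MB/s."""
    seconds = total_gb * 1000 / throughput_mb_s
    return seconds / 3600


# 12 drives x 500 GB over one gigabit link at an assumed ~110 MB/s
print(round(transfer_hours(12 * 500, 110), 1))
```

Next to a copy that takes this long, even a few minutes of boot time is a rounding error.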

    2. Re:What Greyhole isn't by fa2k · · Score: 1

      Could be an installation in Antarctica where they have to rely on solar power or diesel generators. IceCube? (http://icecube.wisc.edu/). Though booting will be a minor power draw compared to the data transfer, so that explanation seems unlikely, or the requirement was misunderstood.

    3. Re:What Greyhole isn't by atamido · · Score: 1

      • Enterprise-ready: Greyhole targets home users.

      I realize this quote is taken from the Greyhole home page, but it should be taken with a grain of salt. IIRC, the project has been around for years and has never lost any data. "The biggest one has 43TB, and uses 26 drives" isn't exactly big in an actual enterprise, but this isn't a 10k+ company, it's a small research group. The data they're talking about is pretty small in comparison.

      That said, I'm not even sure how they'd use Greyhole to help solve their problem. The summary says they want to unify the directories of the drives that data is written to, but I would imagine it would be far simpler to just move the data off of the drives onto a larger array (which could use Greyhole).

      In the end, I suspect we just don't have enough information to understand what they are trying to do.

  11. Questionable by Anonymous Coward · · Score: 2, Informative

    Why would you want a file server to boot in 30 secs or less? OK, let's skip the fs check, the controller checks, the driver checks; hell, let's skip everything and boot to a recovery bash shell. Why would you not network these collection devices if they are all within a couple of miles and dump to an always-on server?

    I really fail to see the advantage of a file server booting in under 30 seconds. Shouldn't you be able to hot swap drives?

    This really sounds like a bunch of kids trying to play server admin. My apologies if this is not the case, but given the parameters provided this IS what it sounds like.

  12. You don't need a union file system by Anonymous Coward · · Score: 1, Interesting

    There's no reason you need a union filesystem. Just mount the data at an appropriate point in a directory tree. Union file systems are designed to solve a different problem.

    What you boot from has nothing to do w/ what you read the data from.

    Samba is a really strange choice. Given the data volume I'd expect you to be using a large Linux cluster to process the data for which NFS would be more appropriate. It certainly sounds like microseismic data in which case the processing will benefit from making duplicate copies of the data and mounting read only via NFS so the first available server provides the data. Multiple ethernets are needed to get full benefit from doing that though.

    *nix documentation is actually very good. But there is a lot of it, so you tend to have grey hair by the time you've read all of it.

    BTW Does the CEO play guitar? I play harmonica.

  13. Nas4free + zfs by Anonymous Coward · · Score: 1

    Check out NAS4Free. It's basically FreeNAS based on the newer FreeBSD 9, which has ZFS v28. I have been running it in a heavily used production system for 2 months with zero issues. I have 3 raidz2 setups that are shared out via NFS, CIFS and AFP. This setup is snapshotted every hour and also replicated via zfs send to an offsite DR location.

    If you go this route, invest in an SSD for the ZIL, and one for L2ARC if you decide to dedup.

  14. Openfiler by Anonymous Coward · · Score: 0

    I run 100TB of storage using Openfiler. It is particular about hardware but rock solid. Matched with many NICs, it is wicked fast. Also, I would spread your I/O load across a few machines to increase transfer speeds.

  15. Distributed file system by Anonymous Coward · · Score: 0

    Have you considered using a distributed file system such as ceph?

    You will need more drives as the data takes twice the space, but on the other hand you won't need to worry about boot times or scalability anymore.

  16. FreeBSD by Anonymous Coward · · Score: 0

    I think there is no single good answer to this and most people will give their personal preference. I'm recommending FreeBSD because it can easily be tweaked and it also has a very good handbook http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/ (this is to answer your question about lack of documentation). FreeNAS actually is built on top of FreeBSD.

  17. Partly easy, partly not... by fuzzyfuzzyfungus · · Score: 2

    Booting in under 30 seconds is going to be a bit of a trick for anything servery. Even just putzing around in the BIOS can eat up most of that time (potentially some minutes if there is a lot of memory being self-tested, or if the system has a bunch of hairy option ROMs, as the SCSI/SAS/RAID disk controllers commonly found in servers generally do). If you really want fast, you just need to suck it up and get hot-swappable storage: even SATA supports that (well, some chipsets do; your mileage may vary, talk to your vendor and kidnap the vendor's children to ensure you get a straight answer, no warranty express or implied, etc.), and SAS damn well better, and it supports SATA drives. That way, it doesn't matter how long the server takes to boot: you can just swap the disks in and either leave it running or set the BIOS wakeup schedule to have it start booting ten minutes before you expect to need it.

    Slightly classier would be using /dev/disk/by-label or /dev/disk/by-uuid to assign a unique mountpoint for every drive sled that might come in from the field (i.e., allowing you to easily tell which field unit the drive came from).
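A minimal sketch of the by-label idea, assuming each field drive's filesystem label names its sensor unit (the labels `SENSOR01`/`SENSOR02` and the share root `/srv/sensordata` are hypothetical). It only builds the `mount <device> <dir>` commands rather than running them, so it can be dropped into whatever hot-plug hook you use.

```python
import re
from pathlib import Path

BY_LABEL = Path("/dev/disk/by-label")


def safe_name(label):
    """Reduce a filesystem label to characters that are safe in a directory name."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", label)


def plan_mounts(labels, share_root):
    """Map each drive label to (device path, mountpoint under share_root)."""
    plan = {}
    for label in sorted(labels):
        device = BY_LABEL / label
        mountpoint = Path(share_root) / safe_name(label)
        plan[label] = (device, mountpoint)
    return plan


def mount_commands(plan):
    """Build (but do not run) one mount command per drive."""
    return [["mount", str(dev), str(mnt)] for dev, mnt in plan.values()]


# Hypothetical labels; in practice list the entries of /dev/disk/by-label.
for cmd in mount_commands(plan_mounts(["SENSOR01", "SENSOR02"], "/srv/sensordata")):
    print(" ".join(cmd))
```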

    If the files from each site are assured to have unique names, you could present them in a single mount directory with unionFS; but you probably don't want to find out what happens if every site spits out identically named FOO.log files, and (unless there is a painfully crippled tool somewhere else in the chain) having a directory per mountpoint shouldn't be terribly serious business.
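Before trusting a union mount, you can check whether the "unique names" assumption actually holds. This sketch walks each branch (e.g. each mounted drive) and reports every relative path that appears on more than one of them; the branch directories are whatever you pass in.

```python
import os
from collections import defaultdict


def find_collisions(branches):
    """Return {relative_path: [branch, ...]} for paths present in 2+ branches."""
    seen = defaultdict(list)
    for branch in branches:
        for dirpath, _dirnames, filenames in os.walk(branch):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), branch)
                seen[rel].append(branch)
    # Only paths found on more than one drive would shadow each other.
    return {rel: found_in for rel, found_in in seen.items() if len(found_in) > 1}
```

An empty result means a union view would not hide anything; anything else is exactly the FOO.log problem.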

    1. Re:Partly easy, partly not... by wvmarle · · Score: 1

      Thinking of what you just wrote I'd like to add a bit.

      First of all I don't think they will serve the data from those disks; if only because you will probably want yesterday's data available as well, and drives are constantly being swapped out. So upon plugging in a drive, I'd have a script copy the data to a different directory on your permanent storage (which of course must be sizeable to take several times 500 GB a day - he says each sensor produces 500 GB of data - so several TB of data a day, hundreds of TB a year).

      Data files may be renamed to some unique name if necessary, and a date may be added either to the file name or as a subdirectory holding that day's data. You may also want to consider compressing this data on your permanent storage, at least by the time it's becoming archive and doesn't need to be accessed much, if at all, any more.
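A minimal sketch of the copy-on-plug-in step described above, assuming the drive is already mounted and that a per-day, per-drive directory layout (`<archive>/<YYYY-MM-DD>/<label>/...`) is acceptable; the layout and the `ingest_drive` name are illustrative, not taken from the post. Optional gzip compression is included for data headed to archive.

```python
import datetime
import gzip
import shutil
from pathlib import Path


def ingest_drive(src_root, archive_root, drive_label, day=None, compress=False):
    """Copy every file under src_root into archive_root/<day>/<drive_label>/,
    preserving relative paths. Returns the list of destination paths."""
    day = day or datetime.date.today()
    dest_root = Path(archive_root) / day.isoformat() / drive_label
    copied = []
    for src in sorted(Path(src_root).rglob("*")):
        if not src.is_file():
            continue
        dest = dest_root / src.relative_to(src_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        if compress:
            # Store a gzip-compressed copy alongside the original name.
            dest = dest.with_name(dest.name + ".gz")
            with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
                shutil.copyfileobj(fin, fout)
        else:
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied
```

Keying the directory on the date and the drive label keeps identically named files from different sensors apart without renaming them.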

      And considering the hundreds of TB of data you're going to collect (I am assuming this is some kind of long-term data collection), I wonder why you're asking around web sites instead of buying a complete solution from a vendor that's experienced in handling these kinds of datasets.

      A quick look on Wikipedia tells me that the largest disks available nowadays are 4 TB; that's a day or two of data. For a year of data, you're going to need a small cupboard full of them. Tape storage comes to mind, too. The amounts of data you're talking about are massive.

    2. Re:Partly easy, partly not... by Anonymous Coward · · Score: 0

      I'm running an Intel server board with dual quad-core Xeons, 64 GB of ECC FB-DIMMs and 4x4 SAS cards. The RAM tests alone take more than 30 seconds, and initializing all the hardware takes even longer.

      No way 30. Fail.

    3. Re:Partly easy, partly not... by fuzzywig · · Score: 1

      Compression. A string of data from a sensor is probably going to be pretty compressible, and might easily bring those data sizes down by 10x.
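A small sketch of why slowly varying sensor data tends to compress well, especially after delta encoding. The signal here is synthetic (a hypothetical stand-in, not real sensor output), so the ratio you get on actual data may be very different from the 10x guessed above; measure on a real sample before planning capacity.

```python
import math
import struct
import zlib


def delta_encode(samples):
    """Keep the first sample, then store only the difference to the previous one."""
    return samples[:1] + [b - a for a, b in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Invert delta_encode by running a cumulative sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(samples):
    """Serialize integers as little-endian 32-bit values."""
    return struct.pack(f"<{len(samples)}i", *samples)


# Synthetic, slowly varying signal (hypothetical stand-in for sensor output).
signal = [int(1000 * math.sin(i / 500)) for i in range(100_000)]
raw = pack(signal)
plain = zlib.compress(raw, 9)
delta = zlib.compress(pack(delta_encode(signal)), 9)
print(len(raw), len(plain), len(delta))
```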

  18. Openfiler by Cluelessthanzero · · Score: 0

    You might want to consider Openfiler (http://www.openfiler.com) too.

  19. Your boss has no idea what he is doing by NemoinSpace · · Score: 1

    But he either likes you or is setting you up. Build one of these instead: http://hardware.slashdot.org/story/11/07/21/143236/build-your-own-135tb-raid6-storage-pod-for-7384 It's already been talked about.
    I know you already stated the hardware is already in place. This is about exercising your new found authority. Go big or go home.

    1. Re:Your boss has no idea what he is doing by Anonymous Coward · · Score: 0

      Yeah, RAID6...wait no. Not at all.

      I am seriously tired of people recommending raid5 variants for write heavy environments. Performance can suffer dramatically.

  20. ZFS Filesystem will help by Anonymous Coward · · Score: 4, Insightful

    500G in a 24h period sounds like it will be highly compressible data. I would recommend FreeBSD or Ubuntu with ZFS Native Stable installed. ZFS will allow you to create a very nice tree with each folder set to a custom compression level if necessary. (Don't use dedup) You can put one SSD in as a cache drive to accelerate the shared folders speed. I imagine there would be an issue with restoring the data to magnetic while people are trying to read off the SMB share. An SSD cache or SSD ZIL drive for ZFS can help a lot with that.

    Some nagging questions though.
    How long are you intending to store this data? How many sensors are collecting data? Because even with 12 drive bays, assuming cheap 3TB SATA drives (36TB total storage with no redundancy) and, let's say, 5 sensors, that's 2.5TB a day of data collection, and assuming good 3x compression, 833GB a day. You will fill up that storage in just 43 days.
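The fill-time arithmetic above, as a reusable sketch. The inputs are the poster's own assumptions (12 x 3TB, 5 sensors, 500GB per sensor per day, 3x compression), and decimal units are used throughout.

```python
def days_until_full(capacity_tb, sensors, gb_per_sensor_day, compression_ratio=1.0):
    """Days before capacity_tb (decimal TB) fills at the given ingest rate."""
    daily_gb = sensors * gb_per_sensor_day / compression_ratio
    return capacity_tb * 1000 / daily_gb


# The post's scenario: 12 x 3 TB, 5 sensors, 500 GB each per day, 3x compression
print(round(days_until_full(36, 5, 500, 3), 1))  # 43.2
```

Without compression the same array lasts only about 14 days.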

    I think this project needs to be re-thought. Either you need a much bigger storage array, or data needs to be discarded very quickly. If the data will be discarded quickly, then you really need to think about more disk arrays so you can use ZFS to partition the data in such a way that each SMB share can be on its own set of drives so as to not head thrash and interfere with someone else who is "discarding" or reading data.

    1. Re:ZFS Filesystem will help by burni2 · · Score: 1

      Agreed; someone please mod this up.

      I would add: LTO-5 drive, for backup

    2. Re:ZFS Filesystem will help by Fallen+Kell · · Score: 2

      500G in a 24h period sounds like it will be highly compressible data

      It sounds like audio and video to me which is not very compressible at all if you need to maintain audio and video quality. And good luck booting a system with that many drives in "under 30 seconds" especially on a ZFS system which needs a lot of RAM (assuming you are following industry standards of 1GB RAM per 1TB of data you are hosting) as you will never make it through RAM integrity testing during POST in under 30 seconds.

      --
      We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  21. Pogoplug by Tsiangkun · · Score: 1

    You are in over your head. Buy a Pogoplug and some USB 2 hubs. Connect your drives to the hubs and they appear as a unified file system on your clients. Or, if you need better performance accessing the data, call an expert.

  22. You could also just use symbolic links. by DamnStupidElf · · Score: 3, Insightful

    Unless you're talking about millions of individual files on each drive, it should be relatively quick to mount each hard drive and set up symbolic links in one shared directory to the files on each of the mounted drives. Just make sure Samba has "follow symlinks" set to yes, and the Windows clients will just see normal files in the shared directory.
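A minimal sketch of the symlink approach, assuming each drive is mounted at a known root and that only top-level files need linking; the drive labels and paths are hypothetical. On a name clash, the later drive's file gets a `<label>_` prefix. One caveat: if the drives are mounted outside the shared directory, Samba also needs `wide links = yes` to follow the links, and that option is off by default for security reasons.

```python
import os
from pathlib import Path


def link_drives(mount_roots, share_dir):
    """mount_roots: dict of label -> mounted drive root.
    Creates one symlink per top-level file in share_dir and returns
    {link_name: target}. Clashing names get a '<label>_' prefix."""
    share = Path(share_dir)
    share.mkdir(parents=True, exist_ok=True)
    created = {}
    for label, root in sorted(mount_roots.items()):
        for entry in sorted(Path(root).iterdir()):
            if not entry.is_file():
                continue
            name = entry.name
            if os.path.lexists(share / name):
                name = f"{label}_{entry.name}"
            target = entry.resolve()
            # Raises FileExistsError if even the prefixed name is taken.
            os.symlink(target, share / name)
            created[name] = target
    return created
```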

    1. Re:You could also just use symbolic links. by Anonymous Coward · · Score: 0

      That's kind of silly when things like UnionFS already exist in Linux and *BSD.

  23. This whole project is a joke/fishing trip by Anonymous Coward · · Score: 1

    This whole project is a joke. Either:
    A) you don't know what you're looking for (reasonable but silly),
    or B) you don't know how to collect data on what you're looking for (pointless).

    Process your sensor data at the sensor. There is no reason that anyone needs to take more than 10-50 MB of data per sensor device per session.

    If it's signal detection, use a smart filter to capture the event, or an FFT to capture the frequencies.
    If it's a measurement, use a buffered slope detection and only capture the change.
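A very simple version of "only capture the change" is a deadband filter: store a sample only when it has moved more than a threshold away from the last stored value. This is a plainer technique than the buffered slope detection mentioned above, shown only as an illustration; the threshold is an assumed, application-specific parameter.

```python
def record_changes(samples, threshold):
    """Keep (index, value) only when value has moved more than `threshold`
    away from the last recorded value. The first sample is always kept."""
    kept = []
    last = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > threshold:
            kept.append((i, value))
            last = value
    return kept


print(record_changes([0, 0, 0.1, 5, 5.05, 5, 10], threshold=1))
# [(0, 0), (3, 5), (6, 10)]
```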

    If you do need to move 500 GB per day per sensor, just install fiber to the sensors and stream it back to a localized collection server.
    This saves countless sneakernet headaches, compression issues and the like. For the collection server, buy it! There are great products out there that can
    take your 500 GB of poorly compressed sensor data and make it 500 MB of indexed intelligence (totally avoiding the obvious buzzword use here, sorry /.).

  24. Silly workplace security policies by cowboy76Spain · · Score: 1

    Aka: I do not want the insecurity of losing my workplace if my boss happens to learn in slashdot how clueless I am.

    Seriously... could you send us the resumé that you sent to get that job?

    --
    Why can't /. have a rich-text editor? Editing your own HTML is so XXth century.
  25. Arch Linux by Anonymous Coward · · Score: 0

    Arch, with conservative repo selection, is the best server OS I've ever used.

  26. Use Linux and Call it good by adosch · · Score: 3, Interesting

    Why is documentation for *nix always so bad?

    For starters, I'm really tired of this /. *NIX-is-too-hard ranting all the time on 'Ask Slashdot' posts. Don't be a n00b douche; if you don't get it, then spend some time and get it. Don't blame the documentation; dig in and figure out something for yourself for once. Sometimes you of the Nintendo-and-Mt-Dew generation make me want to throw up.

    As for your solution, do not go with some installable appliance-type distro like FreeNAS; yes, it's *BSD under the hood, but you're at the mercy of what that 'focused' distro is going to provide for you. Case in point: since you're undecided, go with a full-blown distro so you have some flexibility to grow and augment the mission and purpose of this server you're hosting data on.

    Since you're clearly a n00b when it comes to picking out a *NIX solution, go with any Linux at this point, and set up the NAS services yourself (e.g. Samba/SMB, NFS, etc.). In turn, you'll get better community support, you'll have more flexible OS configuration and growth, and you'll probably learn something to boot.

    Also, you don't need a union filesystem. Simple udev rules that automount the drives under the top-level directory you're sharing out with your NAS services will do you just fine.

    1. Re:Use Linux and Call it good by Anonymous Coward · · Score: 0

      Dude. I think who sounds like a "fag" in any case would be....yes, you.

    2. Re:Use Linux and Call it good by Anonymous Coward · · Score: 0

      I used to think Linux documentation was bad, but then I realized why: they assume you actually know something because they are written for a general audience. I learned Linux through three major methods: experimentation (what if I try changing just this one setting?), using the man pages in conjunction with googling unknown terms (WTH do they mean by an 'ordered' journal?), and asking dedicated communities (linuxquestions.org for general things, the distro's site for distro-specific questions, and program websites for specific programs).

    3. Re:Use Linux and Call it good by Anonymous Coward · · Score: 0

      You, yes you, are a classic example of everything that is wrong with the open source "community".

  27. Is this a troll? by Anonymous Coward · · Score: 0

    Firstly, the troll bit: "Why is documentation for *nix always so bad?" 'Scuse me? Unix systems have the BEST documentation of ANY current OS. Why do you say the documentation is poor?
    Secondly: If you have to ask slashdot for this you should not be doing this. Get someone who knows what they are doing to do it for you.

  28. Hadoop? by Anonymous Coward · · Score: 0

    Have you read about Hadoop? I'm not altogether sure it fits what you're doing precisely, but depending on how the data will be used and your fault tolerance characteristics, it might be a good fit.

  29. waaaay over head by itzdandy · · Score: 4, Insightful

    What is the point of 30 second boot on a file server? If this is on the list of 'requirements', then the 'plan' is 1/4 baked. 1/2 baked for buying hardware without a plan, then 1/2 again for not having a clue.

    Unioning filesystem? What is the use scenario? How about automounting the drives on hot-plug and sharing the /mnt directory?

    Now, 500GB/day in 12 drive sleds... so 6TB a day? Do the workers get a fresh drive each day, or is the data only available for a few hours before it gets sent back out, or are they rotated? I suspect that mounting these drives for sharing really isn't what is necessary; it's more like pulling the contents to 'local' storage. Then why talk about unioning at all? Just put the contents of each drive in a separate folder.

    Is the data 100% new each day? Are you really storing 6TB a day from a sensor network? 120TB+ a month?

    Are you really transporting 500GB of data by hand to local storage and expecting the disks to last? reading or writing 500GB isn't a problem, but constant power cycling and then physically moving/shaking the drives around each day to transport is going to put the MTBF of these drives in months not years.

    dumb

    1. Re:waaaay over head by gweihir · · Score: 1

      I agree. "30 seconds boot time" is a very special and very un-server-like requirement. It is hard to achieve and typically not needed. Hence it points to a messed-up requirements analysis (or some stupid manager/professor having put that in there without any clue what it implies). This requirement alone may break everything else or make it massively less capable of doing its job.

      What I don't get is why people without advanced skills feel they have any business setting up very special and advanced system configurations. IT is hard and requires real knowledge and skills (you can't talk it over with the system; it will not work if you mess up), and anything non-standard makes it that much harder and hence requires very good justification.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:waaaay over head by Anonymous Coward · · Score: 0

      They generally get stuck with it. Most people would rather fight like hell with something they don't understand, and risk losing their job when it all goes wrong, than tell their boss they aren't qualified to do it and definitely lose their job. That doesn't make them bad people; it's the boss who should have more of a clue about what he's asking for and what skill set or person he needs to get it done.

      Having worked in capability requirements for some years, I'd say 30 seconds definitely sounds like someone pulled it out of their ass. Why not 35? Or 25? The OP says in another reply that it's a quirky project and the servers only get turned on for short periods. Still, 30 seconds sounds pulled out of the ass. It's a reasonable target for a very simple desktop or laptop with a single drive and no funky boot ROMs, but not with that much storage. Does it count from power-on or from when the BIOS hands off to the operating system? If the former, how much time do you actually have to boot once the BIOS has finished its business?

      Fundamentally, you may be attempting something impossible, but why would a 5-minute (or 10-minute) boot time not be sufficient? You only have to do it once a day, right? So put all the drives in, switch on, and go for a coffee. Or is there an actual requirement in terms of data turnaround (you have so long to bring the drives in from the field, get the data off, and get them out again) that has somehow been turned into a 30-second boot requirement?

      Anyhow, without the ability to either change the boot time requirement or buy at least some new hardware, I think the task may be impossible. Without understanding more details of the *whole* project (is it classified or something? you seem reluctant to give details), I wouldn't like to suggest any particular solution.

      Unless you have funds to buy new hard drives as the initial batch fails, I'd be nervous about transporting all that data by hard drive each day; and if you do have funds to replace the hard drives, you may well find it better value for money to think through your requirements properly and spend your money on a different set of hardware.

  30. Re:"Why is documentation for *nix always so bad? by Knuckles · · Score: 3, Insightful

    Saying "only good mp3 player" makes no sense unless you specify your criteria. Amarok, Banshee, VLC, Rhythmbox, or smplayer are all capable mp3 players by various criteria and easily found by googling for "linux mp3 player". If you use Ubuntu, searching for mp3 player in Software Center finds a plethora of good players. Googling "list of linux audio software" easily finds other things besides just mp3 players: maybe something like Audacity satisfies your requirements better. Searching for "mp3" on xmms2.org finds the answer in the first link: your xmms2 install needs to have the MAD library, and maybe your distro does not install that.

    Does not seem like the problem is with bad docs.

    --
    "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
  31. Unified filesystem is a crutch by gweihir · · Score: 1

    Use systems of symbolic links.

    Also, why "30 seconds boot time"? This strikes me as a bizarre and unnecessary requirement. Are you sure you have done careful requirements analysis and engineering?

    As to the "bad" documentation: Many things are similar or the same on many *nix systems. This is not Windows where MS feels the need to change everything every few years. On *nix you are reasonably expected to know.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  32. Just use "mount -o bind" by Anonymous Coward · · Score: 0

    It seems to me that the easiest way is to mount all of the sensor HDDs to a single directory (using ordinary "mount") on the OS and then share that directory over the network.

    Even easier would be to write a shell script to handle the mounting & sharing so that all you have to do is connect the drives and then execute that script.
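    A minimal bash sketch of such a script, under stated assumptions: the drives are passed in as device paths, each is mounted read-only as disk1, disk2, ... under one parent directory (/srv/sensors here, a made-up default), and that parent is what gets shared. The DRY_RUN switch is a hypothetical convenience so the commands can be previewed without root.

    ```shell
    # Sketch only (bash): mount each sensor drive read-only under one parent
    # directory, which is then shared out as a single tree.
    # Assumptions (hypothetical): the /srv/sensors default, read-only mounts,
    # and DRY_RUN=1 printing the commands instead of running them
    # (real mounts need root).

    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    # Usage: mount_sleds DEVICE...   (mounted as disk1, disk2, ... in order)
    mount_sleds() {
        local root=${SHARE_ROOT:-/srv/sensors}
        local n=1 dev
        for dev in "$@"; do
            run mkdir -p "$root/disk$n"
            run mount -o ro "$dev" "$root/disk$n"
            n=$((n + 1))
        done
    }
    ```

    Passing persistent names such as /dev/disk/by-uuid/<uuid> in a fixed order, instead of /dev/sdX, keeps each diskN tied to a particular drive rather than to whichever sled it happened to land in.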

  33. cognitive dissonance... by Anonymous Coward · · Score: 1

    "When these drives come back from the field each day, they'll be plugged into a server featuring a dozen removable drive sleds."

    "There's also requirement that the server has to boot in 30 seconds or less off a mechanical hard drive."

    So... it takes minutes to hours to get your drives to the server, but then suddenly it's an emergency to get the server booted. That makes no sense to me. Please explain.

  34. Suggest you change your approach! by Anonymous Coward · · Score: 0

    You say "a couple of square miles of undeveloped land". You also seem to say that you have around 12 sensor stations/portable hard drives that are going to be swapped in and out each day. For reasons stated by others, this is asking to lose data. I would think you can't afford to do that.

    Why not set up a Wi-Fi network using links such as the Ubiquiti NanoStation? You can easily cover a couple of square miles with that sort of equipment, unless you are up in the mountains or in terrain where you are not even close to obtaining line of sight between stations. Your local hardware store might be able to help with aluminum mast tubes. Aim to have the NanoStations as far away as possible from any nearby objects, including trees.

    Then use rsync (https://en.wikipedia.org/wiki/Rsync) to continuously copy your valuable sensor data to your central location, equipped with a safe, daily backed-up RAID system. With 6 TB of data per day you are in for some data management issues unless you can somehow weed out irrelevant data on the go. Another plus is that you can work on new sensor data all day.
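    For scale, here is the average rate the wireless links would have to sustain for the 6 TB/day figure, assuming the ~12 stations at ~500 GB/day each and decimal units (1 TB = 10^12 bytes), before any protocol overhead or retransmissions:

    ```latex
    \frac{12 \times 500\,\text{GB}}{24 \times 3600\,\text{s}}
      = \frac{6 \times 10^{12}\,\text{B} \times 8\,\text{bit/B}}{86\,400\,\text{s}}
      \approx 5.6 \times 10^{8}\,\text{bit/s} \approx 556\,\text{Mbit/s}
    ```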

  35. Please do my work for me by Anonymous Coward · · Score: 1

    Because I suck at it and I'm too lazy to learn how to do it myself.

  36. external as an option by Anonymous Coward · · Score: 0

    Real servers, even file servers, don't get booted often. And when they do, they should take as much time as they need to check the things they need to check.

    Suggest you consider a hot-swap option, such as external drives on USB3.

    Mount the external drives into some folder using some kind of naming scheme that makes sense. And/Or copy the data into a nice redundant file system, like RAID6 or RAIDZ.

    You will want to speak with a professional storage provider about this, as you are in the special needs category.

  37. OP doesn't have a clue by Anonymous Coward · · Score: 0

    But he thinks he has. Disastrous.

    Hey. Could we get a bit of a better screening for what Ask Slashdot stories get to FP??? This sucks big time.

  38. Blah by Anonymous Coward · · Score: 0

    As many have noted, your system is junk if it boots in 30 seconds, as server-grade hardware typically runs a series of initialization steps and validations that blow away 30 seconds. Initializing 12 disks will take equally as long on a cold boot. Anyhow, stop cold booting and learn to use mount.

    The unified file system is rather generically phrased and doesn't describe the task adequately enough. Do the files on the disks contain static names? If so, symlink farm from your mount points.

    If the files on the hard disks have random gibberish for names, then simply index the files using a scripting language and create those symlinks. There are a number of ways to do this without having to run the script by hand. The less interesting method is to create a cron job that cleans up stale links and creates new ones. Slightly more interesting would be to use the hot plug facility to create the index. Magically, the latter method, which depends on hot plug, would also let you avoid the cold boot altogether.
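    The cron-job variant could look roughly like this bash sketch (GNU find assumed, since -xtype is a GNU extension): each run first prunes index links whose target vanished when a drive was pulled, then links any files not yet indexed. The function name, the flat index layout and the crontab line are illustrative assumptions, not a finished tool.

    ```shell
    # Sketch (bash, GNU find): prune index links whose drive has been pulled,
    # then add links for files on the drives that are mounted now.
    # Assumptions (hypothetical): one flat index directory, unique file names.
    # Example crontab entry (hypothetical script path), every 5 minutes:
    #   */5 * * * * /usr/local/bin/refresh-index.sh

    refresh_index() {
        local index=$1 src f
        shift
        mkdir -p -- "$index"
        # -xtype l matches symlinks whose target no longer exists (GNU find).
        find "$index" -maxdepth 1 -xtype l -delete
        for src in "$@"; do
            [ -d "$src" ] || continue
            src=$(cd "$src" && pwd)        # absolute targets keep links valid
            while IFS= read -r -d '' f; do
                # Only create links that are not already in the index.
                [ -L "$index/${f##*/}" ] || ln -s -- "$f" "$index/${f##*/}"
            done < <(find "$src" -type f -print0)
        done
    }
    ```

    A run would look like refresh_index /srv/index /srv/sensors/disk*; the same call could be triggered by the hot plug event instead of cron.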

    I'm afraid this problem wouldn't even take an afternoon to solve.

    Typically, the "bad documentation" experience is more accurately defined as "I don't know what I'm looking for." Really, once I have a few key words for a subject it's rather easy to find documentation for a particular area.

  39. OP here by Anonymous Coward · · Score: 5, Informative

    Ok, lots of folks asking similar questions. In order to keep the submission word count down I left out a lot of info. I *thought* most of it would be obvious, but I guess not.

    Notes, in no particular order:

    - The server was sourced from a now-defunct project with similar setup. It's a custom box with non-normal design. We don't have authorization to buy more hardware. That's not a big deal because what we have already *should* be perfectly fine.

    - People keep harping on the 30 seconds thing.
    The system is already configured to spin up all the drives simultaneously (yes the PSU can handle that) and get through the bios all in a few seconds. I *know* you can configure most any distro to be fast, the question is how much fuss it takes to get it that way. Honestly I threw that in there as an aside, not thinking this would blow up into some huge debate. All I'm looking for are pointers along the lines of "yeah distro FOO is bloated by default, but it's not as bad as it looks because you can just use the BAR utility to turn most of that off". We have a handful of systems running winXP and linux already that boot in under 30, this isn't a big deal.

    - The drives in question have a nearly identical directory structure but with globally-unique file names. We want to merge the trees because it's easier for people to deal with than dozens of identical trees. There are plenty of packages that can do this, I'm looking for a distro where I can set it up with minimal fuss (ie: apt-get or equivalent, as opposed to manual code editing and recompiling).

    - The share doesn't have to be samba, it just needs to be easily accessible from windows/macs without installing extra software on them.

    - No, I'm not an idiot or derpy student. I'm a sysadmin with 20 years experience (I'm aware that doesn't necessarily prove anything). I'm leaving out a lot of detail because most of it is stupid office bureaucracy and politics I can't do anything about. I'm not one of those people who intentionally makes things more complicated than they need to be as some form of job security. I believe in doing things the "right" way so those who come after me have a chance at keeping the system running. I'm trying to stick to standards when possible, as opposed to creating a monster involving homegrown shell scripts.

    1. Re:OP here by Anonymous Coward · · Score: 0

      sudo apt-get install Debian

      You're welcome.

    2. Re:OP here by TheSunborn · · Score: 1

      The boot time problem can be solved simply by buying a small (32/64GB) SSD and installing Linux on that. This will cost you less than $200. And if you don't have the budget for that, then please submit your story to thedailywtf.com, because it sounds like some interesting fuckup happened somewhere in your organization.

    3. Re:OP here by Anonymous Coward · · Score: 0

      I'm a sysadmin with 20 years experience

      In all honesty, you don't look like one, at all.

    4. Re:OP here by Anonymous Coward · · Score: 1

      All I'm looking for are pointers along the lines of "yeah distro FOO is bloated by default, but it's not as bad as it looks because you can just use the BAR utility to turn most of that off".

      Install a new Debian OS by downloading the current netinst cd image. That only contains what is needed to be able to install software, the rest will be downloaded when you install it. When installing from the netinst cd do NOT choose one of the desktop or server configurations they offer. That results in an installation so minimal nearly everything you take for granted is missing, you can boot it and install software and that's pretty much it. Build from there and install only what you will actually use. That should result in minimal boot times. You will go through a phase where almost everything you need to work comfortably is missing and needs to be installed. But if you're familiar with working on a Linux/*nix system and know what you're missing that's not difficult: apt-get install <package> where <package> quite often is the name of the command you're missing and on packages.debian.org you can find which package a command is part of otherwise. If you don't know your way around a Linux CLI already you're in for a learning experience ;-).

      I expect other distributions will offer a similar workflow, but it's been a very long time since I've used anything other than Debian so I wouldn't know. I suspect that end user oriented distributions like Ubuntu may not support this way of building a system, they only seem to offer full cd's or dvd's, but I may be wrong about that.

    5. Re:OP here by Anonymous Coward · · Score: 0

      "I'm a sysadmin with 20 years experience"

      No, you aren't or else you wouldn't express yourself the way you do.

      My bet? You have some experience as a windows operator, quite a different thing.

    6. Re:OP here by Anonymous Coward · · Score: 0

      Then I'd say use Fedora, and use the earlier tips given for mounting each drive separately under /mnt. Then use a simple bash script to create symlinks in a master directory structure which is shared. Should be portable and maintainable enough.

    7. Re:OP here by Anonymous Coward · · Score: 0

      ureadahead says a HDD is just fine for a quick boot. OP should look into it, Ubuntu has it by default but in Debian it's probably a mere apt-get away. It also might be possible to use systemd but that's likely to work better under the latest Fedora atm.

    8. Re:OP here by Anonymous Coward · · Score: 0

      I love it when the OP comes out to defend and assimilate details of their "Ask Slashdot" post...

      Uh, if you're a so-called sysadmin with "20 years" of experience, then just do us all a favor and take all the bloat and lies off your resume and quit because clearly you suck at your job. And I am dead serious; I have junior/n00b SAs on my team that clearly have their shit more together than you do. And no, they don't lurk on Slashdot for technical solutions to things they get paid to solve. Unbelievable.

    9. Re:OP here by Richard_J_N · · Score: 2

      My suggestion would be:

      0. Do consider writing this yourself...a 100 line shell-script (carefully written and documented) may well be easier to debug than a complex off-the-shelf system.

      1. You can easily identify the disks with eg /dev/disk/by-uuid/ which, combined with some udev rules, will make it easy to identify which filesystem is which, even if the disks are put into different caddys. [Note that all SATA drives are hot-swap/hot-plug capable: remember to unmount, and to connect power before data; disconnect data before power; observe precautions with static, with cable-lengths, and don't break the connectors (which have limited life-cycles) ] You shouldn't need to reboot.

      2. Consider just tying the trees together with symlinks (use "find" to recurse, then "ln -s"). Unless you have many tens of thousands of small files, this will work remarkably well, especially if the disk that holds the symlinks is fast and has a sensible filesystem; you could even make it a ramdisk.
      [Personally, I've been bitten by unionfs systems, and I'm a little wary of them. They break in weird ways]

      3. Unless you have a good reason to do otherwise, install the latest Ubuntu LTS release (Precise Pangolin 64). I recommend you install {L,X}ubuntu as a base, rather than the pure-server edition or the full gnome/kde systems, because it gives you a minimal environment, but with X installed: no unwanted services, but you don't have to begin from an 80x25 terminal! You can just "apt-get install ubuntu-desktop" to convert to the full version if you need to.
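      One way to do step 2 while keeping the drives' shared directory layout (rather than flattening everything into one folder) is sketched below in bash. It assumes, as the OP says, that file names are unique across drives; the function name and argument order are hypothetical.

      ```shell
      # Sketch (bash): merge drives with identical directory layouts into one
      # tree of symlinks, keeping each file's relative path.
      # Assumptions (hypothetical): merged root is $1, drive mount points follow.

      merge_trees() {
          local merged=$1 src rel
          shift
          for src in "$@"; do
              src=$(cd "$src" && pwd) || continue   # absolute targets keep links valid
              # Recreate the directory skeleton under the merged root.
              while IFS= read -r -d '' rel; do
                  mkdir -p -- "$merged/$rel"
              done < <(cd "$src" && find . -type d -print0)
              # Link each regular file at the same relative path.
              while IFS= read -r -d '' rel; do
                  ln -s -- "$src/${rel#./}" "$merged/$rel" ||
                      echo "not linked (name clash?): $src/${rel#./}" >&2
              done < <(cd "$src" && find . -type f -print0)
          done
      }
      ```

      Something like merge_trees /srv/merged /srv/sensors/disk* would build the tree; mounting a tmpfs on /srv/merged gives the ramdisk option mentioned above. If the tree is shared through Samba, links whose targets lie outside the share are only followed when wide links is enabled in smb.conf, which some Samba versions only honour when unix extensions is off, so check the manual for your version.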

    10. Re:OP here by Brianwa · · Score: 1

      2. Consider just tying the trees together with symlinks (use "find" to recurse, then "ln -s"). Unless you have many tens of thousands of small files, this will work remarkably well, especially if the disk that holds the symlinks is fast and has a sensible filesystem; you could even make it a ramdisk.

      This seems like the most reasonable solution to me. I think people are getting caught up treating this like a high availability fileserver when it's really just a data acquisition project. Configure the disks to automatically mount, and then use a really simple condition to figure out which mounted disks have data on them (for example, the existence of a directory, or even just the size of the disk). Use a shell script to test this condition and then make symlinks for all of the data files.

      I don't know exactly what kind of equipment OP is working with, but some DAQ systems let you choose what size of files to divide the output into. Try choosing the largest reasonable file size to reduce the number of symlinks.

      If you really think duplicate file names are unlikely then simply don't worry about them. I would at least have the script make some sort of log so you can figure out WTF happened if you find yourself missing some data. Don't worry about security -- this is a scientific project so it's safe to assume that the root password and IP address are written in sharpie on the server anyway, probably within eyesight of a window that faces a busy street. Don't listen to the people suggesting wireless telemetry instead of sneakernet, you have at least an order of magnitude more data than would make sense for such a system.
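      A bash sketch of that approach, with the "really simple condition" taken to be the presence of a top-level marker directory (sensor_data by default, a hypothetical name), and with skipped drives and duplicate names written to a log rather than silently dropped. Links go into one flat landing directory; all names and paths here are assumptions.

      ```shell
      # Sketch (bash): link data files only from drives that look like sensor
      # drives, logging skips and duplicate names.
      # Assumptions (hypothetical): marker directory $MARKER_DIR (default
      # sensor_data), one flat landing directory, illustrative log path.

      link_sensor_drives() {
          local landing=$1 log=$2 marker=${MARKER_DIR:-sensor_data}
          local mnt f name
          shift 2
          mkdir -p -- "$landing"
          for mnt in "$@"; do
              if [ ! -d "$mnt/$marker" ]; then
                  echo "$(date '+%F %T') skip $mnt: no $marker directory" >> "$log"
                  continue
              fi
              mnt=$(cd "$mnt" && pwd)       # absolute targets keep links valid
              while IFS= read -r -d '' f; do
                  name=${f##*/}
                  if [ -e "$landing/$name" ] || [ -L "$landing/$name" ]; then
                      echo "$(date '+%F %T') duplicate $name: kept existing link, ignored $f" >> "$log"
                  else
                      ln -s -- "$f" "$landing/$name"
                  fi
              done < <(find "$mnt/$marker" -type f -print0)
          done
      }
      ```

      For example, link_sensor_drives /srv/landing /var/log/sensor-links.log /media/* would scan every mounted drive and leave a record of anything it ignored.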

    11. Re:OP here by Anonymous Coward · · Score: 0

      As far as how to get all the files to appear in a single directory, just mount the sleds using UUID so that they appear in standard directories like disk1 disk2...
      Then make a directory on the server (not on the sled disks) that will be the landing zone for soft links to the files that you have on the sled drives.
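      landing=/wherever/landing_dir
      mkdir -p "$landing"
      # disk* (not disk?) also matches disk10..disk12 on a 12-sled box;
      # -print0 with read -d '' (bash) keeps names containing spaces intact.
      find /sled_dir/disk*/dirname/dirname -type f -print0 |
      while IFS= read -r -d '' i
      do
                ln -s -- "$i" "$landing/$(basename -- "$i")"
      done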

      Now you have all your files in a single directory... share out using whatever method you wish.

      Soft links are going to add a file op to opening the files from the sled, but might not be a problem if you have large files.

  40. "Wow I completely agree with an AC" by Anonymous Coward · · Score: 0

    Says somebody hiding behind his alias "LodCrappo" and trolling the AC's for a flame war.

    1. Re:"Wow I completely agree with an AC" by Anonymous Coward · · Score: 0

      Prrrrrrrt.

    2. Re:"Wow I completely agree with an AC" by Anonymous Coward · · Score: 0

      WTF are you talking about. Insane much?

  41. Re:"Why is documentation for *nix always so bad? by b4dc0d3r · · Score: 1

    OP rather clearly stated criteria for "good mp3 player". Here it is, since you missed it the first time: "sorts music like XMMS and since I'm used to XMMS does most everything in a similar fashion as well."

  42. fast boot isn't unreasonable by Anonymous Coward · · Score: 0

    If you actually have to boot fast, stay away from any distro that isn't friendly about bare bones installs. Yes, that means ubuntu and redhat among many others. You seem to have a very specific use model for this machine, so run only what you want and don't let the distro bully you into running more.

    That said, as many other people have commented, cold booting seems pretty pointless. Unless you are really stuck with pata drives, you will almost certainly be using a hot-swap friendly bus (sata, sas, usb, firewire). If you must power down the server, you can still suspend or hibernate.

    Unionfs should be supported by pretty much everything. FreeNAS and other special purpose distros will probably make it really easy, if this is the sort of use model envisioned by the maintainers, or more difficult if it's not. So if the docs don't make it look easy, I'd suggest avoiding those and sticking with an environment where it's easy to roll your own features.

    Personally, I would probably use Debian with systemd (I like it a lot more than Upstart), or FreeBSD for something like this. Slackware and Gentoo are also nicely minimalistic. If boot time matters, seriously, use an SSD for the OS, and keep the spinny disks just for data. If you keep the services light, you should have no trouble getting any of those to boot in 10 seconds (after the BIOS does its stuff).

    2.5" notebook hard drives (also sold as 2.5" externals) will probably hold up better than desktop or even enterprise drives if you're going to bounce them around every day. As people mentioned, don't expect them to last forever. I suspect you will mostly be writing sequentially to the drives, which should help you get a bit more life out of them. Run the drives' self tests frequently and toss them as soon as you start seeing problems crop up. 1-1.5TB 2.5" external drives are reasonably cheap at this point.

  43. First thing I thought of... by bjwest · · Score: 1

    The first thing I thought of was loss of one of the drives during all this moving around. Seems the protection of the data would be of the utmost priority here. Keeping this in mind, I'd go with a RAID 5 or 10 setup. This will eliminate having the data distributed on different "drives", so to speak, and it would appear to the system as one single drive. This would increase the drive count, but losing a drive, either physically (oops, dropped that one in the puddle) or electronically (oops, this drive crashed because we keep swapping it every day) would be a non-issue, or at least a not-tragic issue. I'm sure you have a swappable tray system now for the number of drives you need; you may need to add a tray or two for this setup. Just make sure you keep the drives in the correct order, or swap out the whole drive unit.

    As for the original question, I don't think there's really a "best" distro for this; they'll all pretty much do the above out of the box and almost automagically. What you need to look for is the easiest distro to use in this case. What will the users be able to use with the least support from me? If you're the one who will be swapping out the drives on a daily basis, then just use what you're most comfortable with.

    --

    --- Keep the choice with the user..
  44. Eventual migration to RHEL by perpenso · · Score: 1

    The nice thing about CentOS is that if/when you wind up on RHEL (comes with hardware, what your hosting provider is using, etc) the migration will be pretty simple.

  45. Re:"Why is documentation for *nix always so bad? by Knuckles · · Score: 1

    First of all, xmms seems to be really outdated, and if his distro does not include it and he cannot compile it himself for whatever reason (despite copious info on how to install dev packages and compile for any distro), I fail to see how this is a failure of the un*x docs specifically. Current documentation for WinPlay3 is also rather scarce.

    It's also hard to believe that there is not a single mp3 player out there that sorts music like xmms, whatever this is. He stated he tried Rhythmbox and was not too happy, but the allegedly so deficient un*x docs readily list many more, while the Ubuntu software center lets you try them out with one click.

    Anyway, the logical conclusion seems to be to use xmms2, docs for which appear in the third google hit (for me) when searching for xmms. Which in turn, as I wrote, would take him to the solution in the first hit when searching for mp3 at xmms2.org. Again, how is this evidence for generally bad docs?

    --
    "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
  46. Not gonna happen. by Anonymous Coward · · Score: 5, Insightful

    You have to be able to identify the disks being mounted. Since these are hot swappable, they will not be automatically identifiable.

    Also note, not all disks spin up at the same speed. Disks made for desktops are not reliable either - though they tend to spin up faster. Server disks might take 5 seconds before they are failed. You also seem to have forgotten that even with all disks spun up, each must be read (one at a time) for them to be mounted.

    Hot swap disks are not something automatically mounted unless they are known ahead of time - which means they have to have suitable identification.

    UnionFS is not what you want. That isn't what it was designed for. Unionfs only has one drive that can be written to - the top one in the list. Operations on the other disks force it to copy it to the top disk for any modifications. Deletes don't happen to any but the top disk.

    Some of what you describe is called an HSM (hierarchical storage management), and requires a multi-level archive where some volumes may be online, others offline, yet others in between. Boots are NOT fast, mostly due to the need to validate the archive first.

    Back to the unreliability of things - if even one disk has a problem, your union filesystem will freeze - and not nicely either. The first access to a file that is inaccessible will cause a lock on the directory. That lock will lock all users out of that directory (they go into an infinite wait). Eventually, the locks accumulate to include the parent directory... which then locks all leaf directories under it. This propagates to the top level until the entire system freezes - along with all the clients. This freezing nature is one of the things that an HSM handles MUCH better. A detected media error causes the access to abort, and that releases the associated locks. If the union filesystem detects the error, then the entire filesystem goes down the tubes, not just one file on one disk.

    Another problem is going to be processing the data - I/O rates are not good going through a union filesystem yet. Even though UnionFS is pretty good at it, expect the I/O rate to be 10% to 20% less than maximum. Now client I/O has to go through a network connection, so that may make it bearable. But trying to process multiple 300 GB data sets in one day is not likely to happen.

    Another issue you have ignored is the original format of the data. You imply that the filesystem on the server will just "mount the disk" and use the filesystem as created/used by the sensor. This is not likely to happen - trying to do so invites multiple failures; it also means no users of the filesystem while it is getting mounted. You would do better to have a server disk farm that you copy the data to before processing. That way you get to handle the failures without affecting anyone that may be processing data, AND you don't have to stop everyone working just to reboot. You will also find that local copy rates will be more than double what the server's client systems can read anyway.
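    A minimal bash sketch of the copy-then-process approach, assuming GNU cp and cmp: copy one sled onto the server's own storage, then compare every file byte for byte before the sled goes back out. The function name and paths are hypothetical. Because the OP says file names are globally unique, copying each sled into the same destination would also merge the identical directory trees.

    ```shell
    # Sketch (bash, GNU cp/cmp): copy one sled's tree onto server storage,
    # then verify every file before the sled is reused.
    # Assumptions (hypothetical): source and destination passed as arguments;
    # file names unique across sleds, so nothing is overwritten in practice.

    ingest_sled() {
        local src=$1 dest=$2 rel bad=0
        mkdir -p -- "$dest"
        cp -a -- "$src/." "$dest/" || return 1   # -a keeps times and permissions
        # Compare each copied file byte for byte with its original.
        while IFS= read -r -d '' rel; do
            if ! cmp -s "$src/$rel" "$dest/$rel"; then
                echo "MISMATCH: $rel" >&2
                bad=1
            fi
        done < <(cd "$src" && find . -type f -print0)
        return "$bad"
    }
    ```

    Only once ingest_sled /srv/sensors/disk3 /srv/farm returns 0 would the sled be considered safe to wipe.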

    As others have mentioned, using gluster file system to accumulate the data allows multiple systems to contribute to the global, uniform, filesystem - but it does not allow for plugging in/out disks with predefined formats. It has a very high data throughput though (due to the distributed nature of the filesystem), and would allow many systems to be copying data into the filesystem without interference.

    As for experience - I've managed filesystems with up to about 400TB in the past. Errors are NOT fun as they can take several days to recover from.

    1. Re:Not gonna happen. by olau · · Score: 1

      If UnionFS is not the answer, and OP doesn't want something more complicated, it honestly sounds like the simplest solution is to just symlink everything into another directory once it's up running.

      You have to do something about failures, but if you have that in hand, making the symlinks is really easy, you can do it with three lines of bash (for each path in find -type f, create symlink). If you need a bit more control, and it's not easily doable with grep, I'd write a small Python program, you can google up the necessary directory recursion code easily enough.

      This would probably be much easier to understand than some auto-magic FS stuff. It means you will have to read the entire directory structure, but if you're going to do that anyway, pulling it into the disk cache at boot is perhaps not a bad idea? If you start the script once for each disk, you can do the reading in parallel.
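      The per-disk parallelism could be as simple as one background find per disk followed by wait, as in this bash sketch. Each job writes a NUL-separated file list (the list directory and numbering are hypothetical), which also pulls the directory metadata into the cache; a symlink loop can then read the lists afterwards.

      ```shell
      # Sketch (bash): walk each disk in its own background job, then wait.
      # Assumptions (hypothetical): lists go to $1/<n>.list, one per disk,
      # numbered in argument order.

      index_in_parallel() {
          local listdir=$1 n=0 disk
          shift
          mkdir -p -- "$listdir"
          for disk in "$@"; do
              n=$((n + 1))
              find "$disk" -type f -print0 > "$listdir/$n.list" &
          done
          wait   # block until every background find has finished
      }
      ```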

      If that's fine, all you need is a distribution with the remote mount stuff set up. It's not too hard to do on Debian, but you will have to edit a configuration file or two.

  47. Re:"Why is documentation for *nix always so bad? by Knuckles · · Score: 1

    Also, "my criteria is for x to work like ancient app y" is not so workable. Sounds like Microsoft's convoluted standards document for Office Open XML regarding backward compatibility. "You have to emulate bug x of Word 2, but we can't tell you exactly how that worked". Someone might have helped him if he had given specific requirements.

    --
    "When I first heard Daydream Nation it quite frankly scared the living shit out of me." -- Matthew Stearns
  48. Not everything needs to be developed in house... by Bork · · Score: 1

    One of the things I have learned in my career is that I do not know everything. I do the things I can do well and understand the business and when I need something more, I will call in those that have the skills to help. If you get the right consultant involved with the project, they will bring the knowledge necessary to do the job right.

    One of the biggest mistakes you can make is to think that you have to know everything and that it's some kind of fault to say, “I do not know right now but let me look into it and come back with an answer”. Do not limit yourself to only the knowledge and skills of the in house staff, tap into other sources to bring in new knowledge and skills that can help you solve problems.

    My biggest resource was a list of those I could call in to help me with an issue or to help in locating those that could come up with an answer. It's not a fault to say I need to bring in some help on this project.

    ---
    Excuse the grammar, been awake a few to many hours and not thinking to clear right now.

  49. If you want good documentation... by pigiron · · Score: 2

    then get OpenBSD.

  50. What Sensors? by Anonymous Coward · · Score: 1

    Well OP, you opened a big can of worms with this one. You may as well dump everything on the table. I see you started to in another post already. Why not add on answers to the other questions like, what kind of "sensors" are producing 500GB per day? What happens on undeveloped land that could produce so much data?

    My first guess (considering undeveloped land) is weather plus solar radiation and soil temperature/moisture and perhaps seismic data. But that couldn't take more than 5MB per day. So what produces 500GB per day?

  51. area 51? by Anonymous Coward · · Score: 0

    Come on admit it ... area 51 exists

  52. Dear sir, by Anonymous Coward · · Score: 0

    Documentation for "*nix" is not "always so bad". You seem to be confusing the land of linux with the land of all (free and non-free) unices. In fact, the killer thing about unix always was its frankness about its limitations (qv "bugs" and "limitations" sections in manpages). "linux" or actually the gn00 bunch "standardised" themselves on their own texinfo format, complete with a captive viewer that only an emacs-luser could love. But I digress. There are unix-like operating systems Out There that are rather better documented, but they don't come in "distros". I could mention them but I'll leave looking around a bit more as an exercise to you.

    Next up, you're in a supposedly scientific field but you're insisting on samba. What gives? While that software is trying its level best to make bricks without straw, the only reason you'd use it is because the clients use sucky software by an uncooperative vendor. Meaning you're going to run into performance problems no matter what you do. There are better solutions available and widely used in the academic world. Time to look into that.

    Then, well, why are you so hung up on overly complex "solutions"? What about simply mounting the disks according to the day the data is from, maybe a bit of shell scripting to glue it all together? Or maybe it's more efficient to pump 500GB once to where it can be shared and archived conveniently then pull the sled and send the drive back into the field. I don't know. Have you even tried?

    In short, dear sir, your priorities are jumbled and you haven't done your homework, then you insist on blaming the software and moaning about documentation. The easy solution to all that is to shut up and hire someone competent, even if only to consult on a suitable architecture.

    Of course you don't have budget for that, because there's never budget to do it right. But somehow there's always budget to do it all over again, and fuck that up too because, well, no budget to do it right. "Ask slashdot" is in fact a symptom of that. Go sort your priorities.

  53. Ask the correct community : science informatics by oneiros27 · · Score: 4, Informative

    What you're describing sounds like a fairly typical Sensor Net (or Sensor Web) to me, maybe with a little more data logged than is normal per platform. (I believe they call it a 'mote' in that community).

    Some of the newer sensor nets use a forwarding mesh wireless system, so that you relay the data to a highly reduced number of collection points -- which might keep you from having to deal with the collection of the hard drives each night (maybe swap out a multi-TB RAID at each collection point each night instead).

    I'm not 100% sure of what the correct forum is for discussion of sensor/platform design. I know they have presentations in the ESSI (Earth and Space Science Informatics) focus group of the AGU (American Geophysical Union). Many of the members of ESIPfed (Federation of Earth Science Information Partners) probably have experience in these issues, but it's more about discussing managing the data after it comes out of the field.

    On the off chance that someone's already written software to do 90% of what you're looking for, I'd try contacting the folks from the Software Reuse Working Group of the Earth Science Data System community.

    You might also try looking through past projects funded through NASA AISR (Advanced Information Systems Research) ... they funded better sensor design & data distribution systems. (unfortunately, they haven't been funded for a few years ... and I'm having problems accessing their website right now). Or I might be confusing it with the similar AIST (Advanced Information Systems Technology), which tends more towards hardware vs. software. ... so, my point is -- don't roll your own. Talk to other people who have done similar stuff, and build on their work, otherwise you're liable to make all of the same mistakes, and waste a whole lot of time. And in general (at least ESSI / ESIP-wide), we're a pretty sharing community ... we don't want anyone out there wasting their time doing the stupid little piddly stuff when they could actually be collecting data or doing science.

    (and if you haven't guessed already ... I'm an AGU/ESSI member, and I think I'm an honorary ESIP member (as I'm in the space sciences, not earth science) ... at least they put up with me on their mailing lists)

    --
    Build it, and they will come^Hplain.
  54. Ask Slashdot: Best *nix Distro For a Dynamic File by Anonymous Coward · · Score: 0

    Yes FreeNAS all the way! ZFS will do just fine. (FreeBSD rulez!)

  55. Why *nix? by CODiNE · · Score: 1

    Since you've already bought the hardware, odds are you're going to run into driver issues. And since you're not already a *nix guy, my suggestion is to just run Windows on your server. Next, buy big fat USB enclosures, the kind that can hold DVD drives, and put the drive sleds in there. Now you don't have to reboot when adding drives.

    --
    Cwm, fjord-bank glyphs vext quiz
  56. Um, what kind of sensor data? by Anonymous Coward · · Score: 1

    Assuming "500 gigs per 24 hours" is "500GiB per 24 hours" then we are talking roughly 6MiB per second. Are we talking video of some kind here? If not, I'll bet the data can compress down really well...
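
    One quick way to sanity-check that figure is the Python sketch below. It is an illustration and not part of the original post. It computes the average sustained rate for both readings of "500 gigs", because we can only assume whether binary GiB or decimal GB was meant. Either way, the result lands near 6 MiB/s.

```python
# Rough daily-volume-to-throughput conversion for one sensor unit.
# Whether "500 gigs" is binary (GiB) or decimal (GB) is an assumption,
# so both readings are computed.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def mib_per_second(daily_bytes: float) -> float:
    """Average sustained rate in MiB/s for a given number of bytes per day."""
    return daily_bytes / SECONDS_PER_DAY / 2**20


gib_case = 500 * 2**30  # 500 GiB per day
gb_case = 500 * 10**9   # 500 GB per day

print(f"500 GiB/day ~ {mib_per_second(gib_case):.2f} MiB/s")
print(f"500 GB/day  ~ {mib_per_second(gb_case):.2f} MiB/s")
```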

    Maybe you should think about streaming data directly to tape, and then swap tapes out. Assuming for a second that your 30 second boot time has something to do with cloning a drive or some such nonsense, using tape will allow you to bring the data to the server instead of the other way around. At this point you will get rid of the silly 30 second boot time requirement while also drastically increasing the life of your fileserver drives.

    As for the rest of your requirements, learn Linux or hire someone who knows it. Any distro can be modified to boot faster or slower, since they're all pretty much the same anyway. Red Hat compiles bash/perl/... from the same upstream sources that Debian does. Buying hardware that will allow you to boot faster is the real trick here. You might try to hire someone who knows what coreboot is and what it does. That would be a good indication that they are aware of the factors responsible for getting a system up quickly. I would shamelessly offer my skills for hire, but given the description of the problem, I'll stay out of the hiring pool.

  57. Take your pick by MichaelSmith · · Score: 1

    Pretty much any unix like OS will do fine. Personally I use NetBSD and if linux is a requirement, Debian.

  58. Sure ..... by Anonymous Coward · · Score: 0

    ... because symbolic links are supported under Windows.

    1. Re:Sure ..... by gman003 · · Score: 1

      They are, actually (at least in Vista and later, and for just directories as far back as 2K), but that's irrelevant.

      If you set Samba to follow symlinks, it will present it to any client applications as though it were the actual file. So even old DOS-based Windows systems can handle it.

    2. Re:Sure ..... by Anonymous Coward · · Score: 0

      You may also want "WIDE LINKS" set, so you can point into non-exported areas of the filesystem.

      Oh, and invalid links, links pointing to files that don't exist, won't be exported... Just FYI.
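
      Samba quietly skips dangling links, so it may be worth scanning the share for them before clients go looking. Below is a minimal Python sketch. It assumes the exported tree lives under a directory you pass in. The function name find_dangling_links is hypothetical and is not a Samba tool.

```python
import os
from pathlib import Path


def find_dangling_links(share_root: str) -> list[Path]:
    """Return symlinks under share_root whose targets do not exist.

    os.path.islink() is True for the link itself, while os.path.exists()
    follows the link and therefore returns False when the target is gone.
    """
    dangling = []
    for dirpath, dirnames, filenames in os.walk(share_root):
        # os.walk does not descend into symlinked directories by default,
        # but such links (and broken links) still appear in these lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(Path(path))
    return sorted(dangling)
```

      For example, find_dangling_links("/srv/share") would list the links whose drive has been pulled. The path is hypothetical.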

  59. Re:"Why is documentation for *nix always so bad? by Anonymous Coward · · Score: 0

    http://en.wikipedia.org/wiki/Beep_Media_Player

  60. Make this a learning experience for others by davidwr · · Score: 1

    Talk to your local University.

    Upper-division- and possibly graduate-level students in the following disciplines may be very interested in how this project turns out:

    * Computer science / information systems / related

    * The particular science related to the data being collected (e.g. meteorology if this is weather data)

    * If this is a government project, disciplines related to urban planning, politics, and the like

    * If it's a business project, students taking courses related to doing tasks using available resources / tight budgets or in organizations that impose hard, non-technical, not-necessarily-wise requirements.

    No matter how you ACTUALLY proceed, letting professors and key students from these disciplines watch you and document what you do then turn their results into "what might you do differently if you were in this same situation, and why?" type coursework will be invaluable to future generations.

    And, who knows, you may get some valuable ideas you can actually implement.

    --

    By the way, I assume you have already considered and for the time being rejected the default option: Not go forward with this project on the grounds that doing nothing is a better option than attempting to do it under the constraints that you are being forced to work under.

    Please keep this "default option of canceling the project" in the back of your head, and don't be afraid to recommend it should it really become the best option.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  61. Re:"Why is documentation for *nix always so bad? by Anonymous Coward · · Score: 0

    As a former xmms user (some update finally broke it for good, and all current players' MIDI implementations have stability/memory usage problems), I can vouch for the crappiness of all the current players in one form or another, as well as the infuriating number of prerequisites and conflicts involved in keeping the versions YOU WANT installed.

    VLC is excellent as a video player and in fact has replaced ogle for DVDs for me in the recent past (although I've had different DVDs that work better in one or the other, so it can vary dramatically). However, as a music player I've found it... less than ergonomic. And as stated above, it's got memory and stability issues with the MIDI plugin, which affects a lot of my 'older' musical interests (game soundtracks and such).

  62. Poor technical documentation? by FridayBob · · Score: 1

    The best explanation I've heard so far for why technical documentation in general (and in this case *nix documentation in particular) is often so poor is from a sci-fi TV series, called Eureka. In one episode, the characters search in vain for a manual to help shut down an antiquated launch system. When they figure there never was a manual, one character asks another why the builders did not bother to write one. The reply he receives is "Well, what do you want: progress or poetry?"

  63. FreeNAS 8 and ZFS by fak3r · · Score: 1

    s/t

  64. Lack of docs partly due to many geeks' attitude... by TheSeatOfMyPants · · Score: 1

    Why is documentation for *nix always so bad?

    Try reading Slashdot comment threads regarding the liberal arts, writing/English as a major, or even just people whose primary talent isn't in technology -- would you want to devote a significant chunk of time & energy volunteering in a community that felt that way about your field and everyone within it, even if the community is focused on an interest/hobby you have?

    --
    Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
  65. More Information on what you are doing. by Anonymous Coward · · Score: 0

    I might recommend a Software SAN solution such as Nexentastor which I have used ( search on google for "40000_FOOT_OO" )

    I would need a lot more information on what you are trying to accomplish to give a yes or no for this particular application. ZFS with RAID-Z is one of the best file systems I have seen to date, and the storage unit supports iSCSI, NFS, CIFS and Samba (I believe). iSCSI is great: there are initiators for just about every OS out there, and it's not a large client either. Windows 2003/2008/2008 R2/Core 2008 R2 all have clients built in, and so do Linux, BSD and Solaris (I assume OS X too).

    How that would work with hot-swap file systems on disks might be a bit tricky, since it would take (a lot) more than 30 seconds to boot if a drive is missing... Maybe use hot swap, as others have suggested?

      (I have no hot-swappable SAS drives on my home testing system, which cost me 10,000 for everything over a long period of time in assembly.)

    I was thinking perhaps you could attach those drives to a secondary system and copy them over to the primary storage system via iSCSI over 10G connections for quick transfers [perhaps even multiple bonded 10G links (short ones)]. That way the main system wouldn't be AT ALL, and it just becomes a backup-restore or snapshot type of copy over iSCSI. (By the way, the VMware ESX iSCSI initiator is the best one I have seen, since it seems to always work; the Linux iSCSI initiators come in second, and Windows isn't bad in recent releases of Server 2008 R2.)

    Utilizing portable flash-based drives would cause more frequent replacements and a higher cost, but would definitely allow you to come closer to saturating the bus during the transfers. It might even be possible to use multi-drive NAS devices as the transport devices, with flash drives in RAID 1 to double reads from the drives. Replacing the transport devices' drives in a time-staggered fashion would also better ensure the integrity of your data.

    (It all seems a little too sneaker net.)

    Perhaps, if the sensors are within Ethernet or fiber range, you could link them to your system that way (rather than using wireless, which might be a security risk to you), but that all depends on the distances between connections and what's in between. This would be fairly low cost for deployments that might be as big as half a dozen football fields (fiber) or 1 or 2 football fields (Ethernet), and you could route using layer 3 switches to extend.

    You could also use optical laser links or microwave (line of sight), but this might be cost prohibitive, so forget I mentioned it.

    If you look me up I might be able to offer better suggestions.

  66. Some Ideas by Anonymous Coward · · Score: 0

    I might recommend a software SAN solution such as NexentaStor, which I have used, but this site won't let me point you to my blog (stupidly).

  67. Do you need to Boot or have a unified file system? by Anonymous Coward · · Score: 0

    Why boot? Figure out how to hot plug the drives. The sleds may be set up for it, or use ones that are.
    Why a unified drive? Create an export directory (or several) on the server, mount a drive, and create links in the export directory for each file on the mounted drive. Samba should be able to make it transparent.
    12 sleds x 500 GB is 6 TB a day. At 500 MB/s (optimistic), that's 12,000 seconds (or more) to copy or access it all. Will the CPUs be able to keep up?
    Will you need 10 Gb/s networking to get it to the consumers?
    Fun architecture problem, this. And there is a lot you didn't tell us.
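
    Here is the arithmetic above as a small Python sketch, using decimal units (1 GB = 1000 MB). The 500 MB/s aggregate read rate is the poster's optimistic figure. The 150 MB/s comparison is a hypothetical slower rate, added only to show how sensitive the copy window is to throughput.

```python
def transfer_seconds(drives: int, gb_per_drive: float, mb_per_second: float) -> float:
    """Seconds to read every drive once at a given aggregate rate (decimal units)."""
    total_mb = drives * gb_per_drive * 1000  # 12 x 500 GB = 6,000,000 MB
    return total_mb / mb_per_second


for rate in (500, 150):  # MB/s: the poster's optimistic figure, then a hypothetical slower one
    secs = transfer_seconds(drives=12, gb_per_drive=500, mb_per_second=rate)
    print(f"{rate} MB/s: {secs:,.0f} s ({secs / 3600:.1f} h)")
```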

  68. An Ubuntu setup by Anonymous Coward · · Score: 0

    Ubuntu server
    lubuntu-desktop package (hear me out!)
    FreeNX + NXMachine to start a remote desktop session
    SFTP or Samba for file sharing
    Jailkit to jail the users to a single folder that contains the actual shares.
    Eiciel for setting up ACLs
    Make users and read/write groups with the desktop GUI app (I'm a lazy person)
    Use FileZilla or WinSCP to connect to the box for SFTP or use Samba

  69. :p by Type44Q · · Score: 1

    There's also requirement that the server has to boot in 30 seconds or less...

    Uh, disable Pulse Audio??

  70. If Unix is documented so poorly... by Anonymous Coward · · Score: 0

    If Unix is documented so poorly... then please show us the documentation for how you're doing this on Windows.

  71. Try USB instead by Anonymous Coward · · Score: 0

    You don't need Unix or Linux. Get a copy of Windows Server. Equip your Windows server box with USB 3 ports. Plug your HDDs into USB 3 caddies, then plug them into the server. Share the USB 3 drives and you're done.

  72. *nix? by jones_supa · · Score: 1

    Why say *nix? We're gonna use Linux anyway, right?

  73. Re:"Why is documentation for *nix always so bad? by jones_supa · · Score: 1

    Which kind of worked fine (though it couldn't sort my mp3's in a sane order, which is why I wanted XMMS in the first place) until I got tired of music and wanted to stop. So, I clicked the X in the upper-right corner... the window disappeared, but the music continued.

    If you close the window, Rhythmbox stays in the tray. You have to explicitly quit it from the menu.

    Great. Had to ps and kill, and now my beer supply is out of sync with my enthusiasm for music.

    Heh. I know the feeling!

  74. Easy by V!NCENT · · Score: 1

    The boot process is greatly sped up with systemd. Even though there is a chain of dependencies in the boot process, systemd still manages to boot largely in parallel. It is compatible with SysV init scripts, so no sweat here ;-)

    Fedora is a nice distro, but not stable. However, if you boot directly to a prompt, even Ubuntu Server boots in a few seconds off a mechanical drive.

    Now on to the data sharing: take a look at FUSE (Filesystem in Userspace). FUSE-based tools such as mhddfs or unionfs-fuse can unify multiple filesystems into one virtual filesystem.
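
    Any union view has one practical wrinkle. When two drives contain the same relative path, the merged tree can show only one of the files. The minimal Python sketch below lists such collisions across a set of mounted drives before you union them. It is not mhddfs or unionfs-fuse code, and the function name is hypothetical.

```python
import os
from collections import defaultdict


def find_collisions(branches: list[str]) -> dict[str, list[str]]:
    """Return relative file paths that exist on more than one branch.

    A union view can show only one file per path, so every entry here is
    a file that would be shadowed in the merged tree.
    """
    seen: dict[str, list[str]] = defaultdict(list)
    for branch in branches:
        for dirpath, _dirnames, filenames in os.walk(branch):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), branch)
                seen[rel].append(branch)
    return {rel: where for rel, where in seen.items() if len(where) > 1}
```

    Calling find_collisions(["/mnt/sled1", "/mnt/sled2"]) returns a dict that maps each shared relative path to the drives that hold it. The mount points are hypothetical. An empty dict means nothing would be shadowed.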

    You can PM me for further questions and for asking my e-mail, if you need more direct contact :-)

    --
    Here be signatures
    1. Re:Easy by V!NCENT · · Score: 1

      PS: coreboot is an open-source BIOS replacement that can make booting and rebooting big servers a matter of seconds instead of minutes :)

      --
      Here be signatures
  75. Re:Lack of docs partly due to many geeks' attitude by silas_moeckel · · Score: 1

    The Unix docs used to be excellent. Then the '90s came, and they fired the technical writers and told engineers to do it. Engineers, as expected, have piles of domain knowledge, and that is reflected in their documentation. Couple that with a general disdain for the mess that most languages are, and you get something with a steep learning curve that has a tendency to be out of date.

    --
    No sir I dont like it.
  76. Windows + USB sticks by Anonymous Coward · · Score: 0

    You should go with what you know best: Windows + USB sticks!

  77. Documentation is bad because by Anonymous Coward · · Score: 1

    nobody pays for it and nobody will take the time to explain the system to people who might write it.

    To be blunt:

    This should be obvious to anyone who gives it any thought at all. You might try thinking about and solving your own problems instead of just posting them to a web site somewhere.

  78. Want it to boot in 30 seconds. by Anonymous Coward · · Score: 0

    Use genkernel: genkernel --install initramfs, then genkernel --menuconfig all, with support for everything built as modules. That should fix you up.

  79. Formats? by charnov · · Score: 1

    What format are these drives in? Are they flash drives formatted in FAT32... great plug them all into a powered USB hub and share the files... no, well... bummer.

    Are they stand alone ZFS pools? Great, drop them into your ZFS SAN and mount the zpool and share away... no, well... bummer.

    What file system are they presented in? Could be anything... if it's Plan9 9P then maybe we can say sure what the heck... anything else and you're going to have to be a bit more specific.

    For point of reference, I have two SAN systems at work. One is very fast, 4TB, runs on OpenIndiana and uses ZFS for our database and email servers. It takes several minutes to bring all disks online and be fully functional. These are flash disks, and it has 128GB of RAM. It's screaming fast but has lots and lots of small disks. It cost $24K and is made for crazy speeds: it saturates two 10GbE links and handles 120k IOs read/write simultaneously, no problem.

    The big SAN is 40TB, boots to a useful state in about one minute, and starts bringing disks online within 10 minutes. It cost $2.5 million and is about the size of a minivan. It's made for gigantic simultaneous IO and five nines of availability, has dozens of easily removable drives, and is extremely tolerant of hot swapping.

    Be more specific OP.

    --
    [RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
  80. Re:Wow - everyone forgot about FTP by Anonymous Coward · · Score: 0

    How can everyone ignore FTP?

    Did you all forget that all the OS's involved have both FTP server and client solutions...

    I have found FTP to be faster than SAMBA with a majority of hardware/OS/Software solutions.

    Also, a 2 Watt Marine grade USB to 802.11 b,g,n radio costs $40 in quantities of 1.... It has a range of easily over 5 miles.

    Drivers exist for any PC collecting data...

    Terrain is less of an issue when you have the power to overcome it.

    Mesh? Why? We still need to know HOW MANY DATA COLLECTION SITES????

    You don't need mesh to buffer 15 minutes of data and then send it in on schedule.
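
    A bounded deque is one way to buffer a fixed window and send it on schedule, in the spirit of the "rotating buffers" suggested later in this comment. This is a minimal Python illustration. The class name, the sample rate, and the choice to drop the oldest readings when an upload is missed are all assumptions, not part of the poster's design.

```python
from collections import deque


class RotatingBuffer:
    """Keep only the most recent readings for one upload window.

    Once the deque is full, each append silently discards the oldest
    entry, so a missed upload costs old data rather than memory.
    """

    def __init__(self, window_seconds: int, samples_per_second: int) -> None:
        # e.g. 15 minutes at a hypothetical 10 samples/s -> 9,000 slots
        self._buf = deque(maxlen=window_seconds * samples_per_second)

    def append(self, reading: bytes) -> None:
        self._buf.append(reading)

    def drain(self) -> list[bytes]:
        """Return the buffered readings (oldest first) and start a new window."""
        batch = list(self._buf)
        self._buf.clear()
        return batch
```

    For a 15-minute window, you would create RotatingBuffer(window_seconds=15 * 60, samples_per_second=10). You then call drain() each time the scheduled upload slot comes around and hand the batch to the transfer.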

    You would gain the option of remote control of each collection site if each was part of a network.

    2 watts is a smaller power drain than a local drive swapped out every 24 hours, so if the remote units run off grid (you said the site was "undeveloped land") this would lower power consumption at the remote locations.

    I have to agree with just about every Linux/BSD user here, Leave the systems running and hot plug... if you knew *nixes you'd know by now that there's no need to shutdown to add drives. (Even more importantly boot time becomes irrelevant if you don't do it more than a few times a year...)

    Ummm, why not use a database to store the data instead of flat files in a folder? Data manipulation and filtering would be much simpler...

    I also agree that the system was designed backwards...

    Given that, what server hardware is involved so we can tell you what you can expect for boot times???

    Is it a standard server we can look up specs for, or is it home built from imaginary needs?

    How new is the hardware?

    Do you have, or can you get a few SSD's for the boot files?

    What is your budget?

    You have crippled everyone here by lack of details like these...

    I still contend that 2 watt radios with 12dBi antennas would be best to send the data directly to your server site. Either all in realtime, or scheduled with rotating buffers. You'd possibly need a few antennas on a small tower at the server site, each setup with their own subnets and channels, depending on how many remote stations there are.

    Mesh is not needed for a 10 mile x 10 mile patch of land given enough cheap high powered radios and smart setup of timed payloads to be FTP'd on schedule to the server site. Your choice of database could easily import the data and drop any redundant data on import, would allow multiple access to the same data, and would have the speed required to store all of the data.
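
    Dropping redundant data on import is straightforward if each reading has a natural key. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for "your choice of database". It assumes, hypothetically, that a reading is identified by station ID plus timestamp. The table and column names are illustrative.

```python
import sqlite3


def import_readings(conn: sqlite3.Connection, rows) -> int:
    """Insert (station_id, ts, value) rows, silently skipping duplicates.

    The PRIMARY KEY on (station_id, ts) makes a re-sent reading collide,
    and INSERT OR IGNORE drops the collision instead of raising an error.
    Returns the number of rows actually added.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " station_id TEXT NOT NULL,"
        " ts INTEGER NOT NULL,"
        " value REAL,"
        " PRIMARY KEY (station_id, ts))"
    )
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO readings (station_id, ts, value) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before
```

    This design lets the database enforce uniqueness instead of checking for duplicates in application code. As a result, a station that re-sends a batch after a dropped connection does no harm.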

    There's nothing wrong with mesh, I just don't think a couple miles of unimproved land requires it unless each site is just a couple MB per day and there is a huge number of sites to total 500GB a day. I also believe that number is arbitrary and not your actual exact need. It would help to know EXACTLY what is being stored.

    Is the data wind speed, temperature, etc... or are you counting wildlife with motion triggered cameras and video recorded?

    More data is needed about this supposed project.

    If wireless data acquisition is out because your budget is already spent, then, as some have stated, you will likely fail or have an extremely difficult time not only making it work but also maintaining the system. The more you walk around with HDD's, the more chances that one or more will get static-zapped or otherwise damaged. That is not good for research. (Each failure drives up the cost of new hardware.)

    Good luck.

  81. Re:Don't ask??? WTF? by Anonymous Coward · · Score: 0

    Yes, we absolutely ask.....

    IS this a school project that can't be left running?

    The amount of data you are talking about copying from these removable drives will take a long time to copy, and hot swapping drives will help speed up the process if you really can't afford to add $40 to each remote station to send the data in over a wireless connection.

    http://www.amazon.com/Alfa-Waterproof-Outdoor-wireless-Integrated/dp/B003ILWRLI/ref=pd_sxp_grid_pt_0_1

    That's including shipping, and you could probably get a discount if you ordered one for each site...

    You already shot yourself in the foot if you have no budget left over.

    What if some hardware fails and needs to be replaced? Do you have spares of anything?

    How long will the collection project run?

    You can tell us why it needs to be started and halted so often without a long story... unless you are a troll....

    It's easy... eg. We don't have 24/7 power at the server site... or, We operate the File Server from a mobile environment (close to first example)... or My BOSS can't afford the power to run 24/7... I still am getting the impression you are a troll looking to incite riot between the various BSD/*nix supporters here.

    The question does not require a long answer. Certainly it can be summed up in three or fewer sentences.

    I, for one, am about to walk away from this thread on /.

    As others have posted, BOOTING in under 30 seconds is an artificial but doable requirement for the core OS.... (and for just about any *nix/BSD distro)

    There is no valid reason that 45 seconds isn't just as acceptable... go get a cup of coffee.... your time is not that valuable or else you'd have a budget for this job.

    An extra 15-30 seconds is not unreasonable to give a robust server time to boot.

    I don't believe the number is real. Nobody is going to die, and waiting for all the drives to show up on site from such a large site gives more time for said system to boot. You could even have it wake on a timer event so it could start just before you arrive on site... (if your requirements are not trolling)

    For anyone claiming a legitimate science project, 30 seconds to boot is arbitrary and not a genuine need.

    I'd like to read just one or two legitimate reasons why a person can't wait, say 60 or 120 seconds for a system to boot that is supposed to manage such large amounts of external drives with the data stored in such an illogical and unprofessional manner. (Given the requirements presented as part of the answer)

    It all adds up to a troll or an employee/student being punished by being assigned an impossible task.

    Reusing old servers from another project reeks of a school project... That increases the likelihood of hardware and storage failures. Asking a person who has not enough experience with the *nixes to already know which to use is like asking a banker which model of chainsaw is best for cutting oak... he/she may blunder around, and might find the right answer, but when it comes to implementing the device/s they may not even know how to start or maintain it/them....

    The OP is already in trouble because of a lack of knowledge and too little time to become competent.

    Or we have a troll...

    I still think that's what we have here, given that the requirements do not match the industry: no budget, used hardware, not enough specs to make an informed decision, and dodging a few simple questions about the supposed project.

    So, to reiterate: Don't Ask???? LOL... Yes, we do ask... answer why you need to shut it off, and what stops you from leaving it powered on. If you can have all of these remote sites gathering data 24/7 what's one more machine left on 24/7???.... that makes no sense other than from a troll....

    The reason you won't answer is because if you answer why it must be rebooted so often you invalidate the 30 second requirement and show yourself as a troll....

    Prove me wrong and answer why the 2 requirements: Why do you need 30 se

  82. did he just say that? by Anonymous Coward · · Score: 0

    "Why is documentation for *nix always so bad?"

    You didn't just say that did you? /facepalm

    *nix/bsd/*inux's have the best documentation out of the selection of OS's, period. Every command, function, ANYTHING that has anything to do with the OS has a man page that goes into pretty great detail on what it does, what options it can use, how to use it, etc. Can't say the same about Windows, not even close, and as for Mac OS X? I have no idea.

    The problem is more of, why can't the user think of the proper question to ask.... as in thinking of what to search for in the Ginormous collection of man pages and other help files. This doesn't even count free online sources or any books..etc that could help as well.

    Also, without knowing how many sensors are actually putting out this much data, it seems rather silly to use this type of system to begin with. Isn't it rather cumbersome to have to swap all of these drives in and out of the system every day? Why not use wireless? For specific applications that don't really need general wireless standards (802.11 /a/b/c/g/e/f/g....blah blah), you could implement a high-speed point-to-point relay and not have to deal with a bunch of failure-prone hard drives. This matters especially if the sensor units with the drives are exposed to weather: even if the drives are enclosed inside, the risk of drive failure is greatly increased.

  83. GlusterFS now RedHat Storage Platform by Anonymous Coward · · Score: 0

    Try the google, it is fairly adept at queries like these...

  84. Lose the Samba requirement by Anonymous Coward · · Score: 0

    Even after fucking around with read/write buffer sizes in the .conf files and mount command lines, Samba is only about 30-50% as fast as NFS and FTP. All *nix machines can handle NFS, and even Windows 7 ships with NFS support in its Enterprise and Ultimate editions (you just have to enable it). If you want to read the entirety of 12 x 500GB drives every day, then you're going to want 71 megabytes per second, and you won't get that if you're using Samba over a single gigabit link.
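
    Below is a back-of-the-envelope check of those numbers in Python. The 71 figure matches 12 x 500 GiB spread over 24 hours, expressed in MiB/s. The poster's 30-50% is relative to NFS and FTP, not to the wire. The sketch treats it as a fraction of the raw gigabit line rate, which is an assumption that NFS roughly saturates the link.

```python
# Back-of-the-envelope check of the "71 megabytes per second" figure.
SECONDS_PER_DAY = 86_400

daily_bytes = 12 * 500 * 2**30                 # 12 drives x 500 GiB (binary reading)
needed_mib_s = daily_bytes / SECONDS_PER_DAY / 2**20
gigabit_mib_s = 1e9 / 8 / 2**20                # raw line rate, before protocol overhead

print(f"needed:    {needed_mib_s:.1f} MiB/s")
print(f"1 GbE raw: {gigabit_mib_s:.1f} MiB/s")
for samba_share in (0.30, 0.50):
    # Hypothetical: treat the 30-50% figure as a fraction of line rate.
    print(f"Samba at {samba_share:.0%}: {gigabit_mib_s * samba_share:.1f} MiB/s")
```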

  85. a simple solution by submain · · Score: 1
    would be:

    - loop recursively through all the files in the hard drives
    - symlink them to another folder with the same structure
    - share that folder

    Lather, rinse, and repeat every time you add/remove a drive. Not the most efficient or fancy solution in the world, but if you know bash, you can write that in 10 lines of code.
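
    The three steps above map onto a short script. The poster suggests bash, but the sketch below uses Python instead. The function name, the mount points, and the first-drive-wins rule for duplicate paths are assumptions.

```python
import os


def build_link_tree(drive_roots: list[str], share_dir: str) -> int:
    """Mirror every file on the mounted drives as a symlink under share_dir.

    Directory structure is recreated relative to each drive root, so
    /mnt/sled1/site3/run.dat becomes share_dir/site3/run.dat. If two
    drives hold the same relative path, the first drive listed wins and
    the later file is skipped. Returns the number of links created.
    """
    created = 0
    for root in drive_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            target_dir = os.path.normpath(os.path.join(share_dir, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                link = os.path.join(target_dir, name)
                if os.path.lexists(link):
                    continue  # already provided by an earlier drive
                os.symlink(os.path.abspath(os.path.join(dirpath, name)), link)
                created += 1
    return created
```

    To handle a removed drive, delete the share directory (for example with shutil.rmtree) and rebuild it, so that links pointing at the pulled drive do not linger.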

  86. dCache by colin_faber · · Score: 1

    dCache is probably what you want.

  87. Decoding the requirements by beegle · · Score: 1

    A lot of my biggest concerns have been addressed by others. A few things that I haven't seen covered:

    The "30 second boot time" limit makes me assume that there is something time-sensitive about this data collection. (Otherwise, why would you be wasting time on it?) So, you need a fast boot, but then you're mucking around with Samba and union mounts, which are both relatively slow. This doesn't make any sense. This is why people are asking questions or making up odd scenarios in their answers.

    The odd scenario that I'm assuming is that you have more drives than sleds, so you need to go through a few load-boot-read-shutdown-unload cycles to get all of your data OR the machine's being "borrowed" to read the data, so you need to bring it up with an alternate OS quickly so that you can work through the night before returning it to normal use in the morning.

    If that's the case, it really sounds like (as someone else suggested) that you need to separate the collector from the persistent storage. Set up something that can read the data from all of your "dynamic" drives as fast as possible. Depending on the data, something like rsync or even netcat might be the fastest way to get data off of the machine.
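
    The Python sketch below is a rough illustration of the "read everything off the dynamic drives, then hand it to persistent storage" split. It copies each mounted drive in parallel. It stands in for rsync or netcat and makes no claim to be faster than either. The destination layout (one folder per drive, named after its mount point) is an assumption. Threads are reasonable here because file copying spends most of its time waiting on I/O.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def ingest_drives(drive_roots: list[str], dest_root: str, workers: int = 4) -> dict[str, int]:
    """Copy each removable drive into its own folder under dest_root in parallel.

    Each drive is handled by one worker, so independent disks are read at
    the same time instead of one after another. Returns, per drive, the
    number of files now present in its destination folder.
    """
    def copy_one(root: str) -> tuple[str, int]:
        src = Path(root)
        dest = Path(dest_root) / src.name  # e.g. /mnt/sled1 -> dest_root/sled1
        shutil.copytree(src, dest, dirs_exist_ok=True)  # Python 3.8+
        count = sum(1 for p in dest.rglob("*") if p.is_file())
        return root, count

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(copy_one, drive_roots))
```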

    --
    --