Why Not To Shout At Your Disk Array
Brendan Gregg of Sun's Fishworks lab has an interesting video demo up at YouTube demonstrating just how bad vibes, if expressed with sufficient volume in front of a rack full of disks, can cause a spike in disk latency. White noise, evidently, doesn't do them much harm. (Maybe they just feel awkward to get yelled at on camera.)
he's like the crocodile hunter of loud server rooms
People yelling too much at their computers
It's been known for a long time that vibrations are not good for discs (see notebooks). Even by the early 90s, music CD players had skip protection. If a disc skips, latency will of course momentarily increase. And with tolerances down even further, it's probably worse than back then.
In 10-15 years it won't matter anyway, almost everything will have SSD by then.
I wonder if the latency would vary by the pitch and tone of the person yelling. If that's the case, I'd wonder if that could be extrapolated into reconstructing whatever was being said. Granted, if you're yelling that loud, the person in the next county is more likely to hear you first.
It bothers me,
How this guy actually made the discovery.
He must have let off quite a bit of steam towards that rack.
Might the drives themselves be sensing the induced vibration via an embedded accelerometer and momentarily parking the heads to avoid damage? It seems like the marketing folks shouldn't have too hard of a time putting a positive spin on this behavior.
Now when Skynet finally goes sentient, it'll sue for emotional abuse. I thought metal death machines were bad, but now Lawyer-bots? We're doomed.
A few reasons.
Sure, with ZFS, JBODs are the preferred storage. Let ZFS do end-to-end management of the storage, from the file level to the raw disk blocks. That way it can do its end-to-end error checking and possible correction. If you do RAID1 in hardware (really just firmware in the storage box) you trust that software to detect all problems and correct them or report them. That software may not check whether both sides of a mirror are correct, and may pass bad data upstream. ZFS will detect this because of its checksums, but it will not be able to correct it. If ZFS is doing the mirroring, it will detect the error and read the other side of the mirror; if that checksum is OK, it will correct the error and continue.
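To make the detect-versus-correct distinction concrete, here is a minimal toy sketch in Python. It is not ZFS code: the MirroredBlock class, its fields, and the use of SHA-256 as the checksum are all assumptions made for illustration. The point it shows is that a checksum kept apart from the data lets the reader tell which copy is bad, and a second copy under the same manager lets it repair the bad one.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is only a stand-in here; the real on-disk checksum may differ.
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Hypothetical two-way mirror of one block, with the checksum stored separately."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) == self.expected:
                # Found a good copy: rewrite any sibling that fails its checksum.
                for j, other in enumerate(self.copies):
                    if j != i and checksum(bytes(other)) != self.expected:
                        self.copies[j] = bytearray(copy)
                return bytes(copy)
        # Detection without correction: no copy matches the checksum.
        raise IOError("all copies fail checksum")
```

With only one copy visible to the checksumming layer (the hardware RAID1 case described above), the loop would have nothing to fall back on and could only raise the error.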
ZFS implements software RAID on top of JBOD. The box full of disks itself need not have any RAID controller, and if you're using RAID-Z, it would probably be a waste of money to spring for one, unless you go for the super-high-end for performance reasons.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
The plant publications never seem to die.
Plants don't react to music, they react to the tiny shifts in air just above their stomata. The publication which reported this compared plants with music (read: vibrating air above the stomata) with plants in an enclosure without air vibrating (read:refreshing) above the stomata.
The experiment shows a difference even if there's air movement, simply because air "sticks" to the surface of a plant's leaves in close proximity, behaving like a fluid. Normal air ventilation doesn't refresh this thin layer as effectively as vibrations caused by sound.
"Violence is the last refuge of the competent, and, generally, the first refuge of the incompetent" - Thing_1
Well, partly at least. It's no secret that disk drives are sensitive to vibration; this video showed an extreme case. Keep in mind, since disk drives are spinning at 7200-15000 RPM, they themselves create vibration that can affect adjacent drives. The drive enclosure can help reduce the problem with the use of shock absorbers and vibration dampeners. Most drive enclosures nowadays, for cost reasons, are no more than sheet metal wrapped around power supplies, fans and drives, which contributes to the problem.
hmm, bit of a chicken and the egg scenario there, isn't it?
is it slow because you yell at it, or do you yell at it because it is slow?
Either way, in the end it only degenerates into a downward spiral, where the computer gets slower and slower, while you get more and more pissed off at it and yell louder...
I am not stubborn. I am right!
Secret Fact : Ultrasonic noise at low volumes is WORSE !
It took weeks of testing to get to the root issue of WD Raptors dropping in head seeks on very high end RAID cards in tiny head movement seek benchmarks, but padding each JBOD drive in acoustic foam (shooting range foam), or testing one drive at a time instead of 4 or 8 (either method works), increased I/O per second by 40% in a rack chassis.
40% more head movements per second if no ultrasonic noise entering drives !!!!!
This is VERY VERY RARE INFO, and only I, the head of Gigabyte in Asia, and two engineers in California know of this discovery.
And because I know no one on Slashdot will mod this up, and no one reads at 0 anymore, I can trust my astoundingly well researched secret shall remain secret.
It's sadly 100% factual.
Or did I just mis-hear him?
"JBOD" in this context will be a reference to the style of disk array (eg: vs one with a RAID controller like the Dell MD3000), not the ZFS RAID level.
And are those kept near the nuclear wessels?
Sorry, could not resist.
Also from Brendan Gregg comes the always useful /usr/bin/maybe. Other funnies from him here.
go and measure your own performance degradation while your hard disk does something mean to you
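For anyone who wants to try that measurement at home, here is a rough sketch in Python. It is not the tooling used in the video; the function name sample_read_latencies and its parameters are made up for illustration. It times random block reads from a file you point it at. Note the big assumption: on most systems these reads will often be served from the OS page cache, so to see real disk latency you would need a file much larger than RAM, or some way to bypass the cache.

```python
import os
import random
import time


def sample_read_latencies(path, samples=100, block_size=4096):
    """Time `samples` random block reads from `path`; returns seconds per read."""
    size = os.path.getsize(path)
    latencies = []
    # buffering=0 avoids Python's own read buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for _ in range(samples):
            offset = random.randrange(0, max(1, size - block_size + 1))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
    return latencies
```

Run it once in a quiet room and once while shouting, then compare max(latencies) between the two runs. Whether you see anything will depend heavily on the drive, the enclosure, and the caching caveat above.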
I dub this guy the disk whisperer ...
I'm not an engineer or absolutely sure about how the brain works with white noise, but I once had a job where, when I entered the freezer section, it didn't seem loud at all. In fact, it seemed so far from loud that the few times I had to enter it, I forgot my ear plugs until I saw someone else using them.
Anyway, even though you couldn't really hear anything 'loud', if you tried to talk to anyone, you could barely hear them.
On to my question. If you have enough high amplitude random noise that it is effectively destructive interference, would this make an environment where low amplitude sound could not be heard or even mechanically sensed easily?
I know using 'heard' may be incorrect in this context because perceived sound usually has no direct relation with what's mechanically going on with the sound waves.
because all server admins are busy 24/7?
Server admins are getting paid to 'watch' the servers. They have plenty of pseudo-free time. It's when stuff is breaking that they're busy. Not to mention a good admin in a large server area will have software like that person had, to watch drive latency.
Disk drives have a resonant frequency
I've seen dramatic demonstrations of this over the years. One that stands out was a test of a Bryant drive sometime around 1970. In those days a 2 GB drive was at the edge of the envelope and Bryant was test-marketing just such a beast. It consisted of eight four-foot platters mounted four to a side on a shaft going through a monster of an electric motor. The heads were mounted on arms whose positioning was controlled by hydraulic cylinders big enough to be used as shocks on a pickup truck. The whole thing would not fit in the back of that pickup truck.
We were testing the thing with a program called the "Leese Bomb". Leese can identify himself or remain anonymous--I won't turn him in. The "Bomb" part was the nature of the test.
Basic tests in those days would involve writing a whole track and then reading it back and comparing what was read to what was written. You'd do this a number of times with different patterns to capture not only faults in the surface, but any sloppiness in the head control. The Leese bomb went one better.
It would write to the outside track, write to the inside track, read the outside track, read the inside track, and then compare. If the comparison failed it would repeat the test, and keep repeating until it succeeded, counting the failures. If the test succeeded it would index the test both inward and outward so that the tracks tested would move toward the middle, cross, and continue. This test was superior in that it would capture dynamic flaws in the system, since the distance the heads moved, and the time to move, varied from max to zero.
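For readers who want to see the shape of that test, here is a small reconstruction in Python. It is only a sketch of the description above, not the original program: the names leese_pattern, run_leese_bomb and FakeDrive, the test pattern bytes, and the in-memory "drive" are all invented for illustration.

```python
def leese_pattern(num_tracks):
    """Yield (outer, inner) track pairs that start at the extremes, meet, cross, and continue."""
    outer, inner = 0, num_tracks - 1
    while outer < num_tracks:
        yield outer, inner
        outer += 1
        inner -= 1


class FakeDrive:
    """Hypothetical in-memory stand-in for a drive; not a real device API."""

    def __init__(self, num_tracks, bad_reads=0):
        self.tracks = [b""] * num_tracks
        self.bad_reads = bad_reads  # how many initial reads return garbage

    def write_track(self, track, data):
        self.tracks[track] = data

    def read_track(self, track):
        if self.bad_reads > 0:
            self.bad_reads -= 1
            return b"garbage"
        return self.tracks[track]


def run_leese_bomb(drive, num_tracks, pattern=b"\xaa" * 16):
    """Run the write/write/read/read/compare cycle on each pair; return the failure count."""
    failures = 0
    for outer, inner in leese_pattern(num_tracks):
        while True:
            drive.write_track(outer, pattern)
            drive.write_track(inner, pattern)
            got_outer = drive.read_track(outer)
            got_inner = drive.read_track(inner)
            if got_outer == pattern and got_inner == pattern:
                break  # success: move on to the next pair of tracks
            failures += 1  # failure: repeat the same back-and-forth seek
    return failures
```

Note that the retry loop has no upper bound, matching the description: a failing pair is hammered with the same seek distance over and over until it passes.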
In the case of the Bryant Drive (and, accidentally, an innocent Ramac drive at Caltech), the test found a resonant frequency. When the heads overshot their mark, causing an error, the test stayed on the back and forth pattern, reinforcing the resonant motion with each cycle of the test. The drive started walking across the test floor in three-inch hops, but not for very long. In a few seconds, one of the shafts broke and one of the platters, a 500 pound disk rotating at 2400 rpm, broke through the front of the unit and flew across the building until it was stopped, explosively, by one of the steel columns supporting the roof of the building. Miraculously, no one was hurt.
We gave up on Bryant for that application. Not long after that, CDC introduced its 200MB drives, and they passed the Leese Bomb with flying colours. Ten of them didn't take up any more room, or cost more, than the big Bryant, so our client was happy to go with that solution.
In any case the lesson is that, if it has moving parts, resonance is an issue.
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.