SSD Failure Temporarily Halts Linux 3.12 Kernel Work
jones_supa writes "The sudden death of a solid-state drive in Linus Torvalds' main workstation has led to the work on the 3.12 Linux kernel being temporarily suspended. Torvalds has not been able to recover anything from the drive. Subsystem maintainers who have outstanding pull requests may need to re-submit their requests in the coming days. If the SSD isn't recoverable he will finish out the Linux 3.12 merge window from a laptop."
No backup?
"If any question why we died, Tell them because our fathers lied."
He should listen to Steve Gibson and run Spin Rite.
News at 11.
Pity that this excuse never helped me in college.
obviously a professional like Torvalds had redundancy and backups, right?
That's all that Ballmer needs to stop Linux? Just find Torvalds' SSD?
"No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
Maybe Linus needs to create a backup program like he did when he wanted a better version control system and created git? Also, why is the only copy of the changes on his local workstation and not a server with redundancy? This seems rather amateurish.
today is spelling optional day.
How convenient Linus! Feeling the heat, huh?
Linus said "So I don't want to necessarily blame the harddisk, since it's just ten days since I upgraded the rest of my machine, after it worked years in the previous one. That just makes me go "hmm". As far as I know, all the fans etc were working fine, but.."
There's his problem: "after it worked years in the previous [machine]."
His SSD died a natural death of old age.
IMarv
Trusting software vendors is no smarter than trus
Bet you regret knocking those platters of spinning rust now, don't you Mr. Torvalds?
There's no -1 for "I don't get it."
omg lel upboated! whuts ur reddit usrname?
Surely he's not working on a single drive system?
I don't feel anything but shame for someone losing data in a hard drive crash who has or should have network backups available to them. If this happened to anyone but Linus the majority of the comments would be calling the coder a n00b. If it was Ballmer there would be an absolute riot of anti-MS venom....
I guess the great Linus has fallen into shadow.
Someone get this man some drives and an admin. STAT!
Was he too busy treating people horribly to audit his DR procedures?
Fuck Ajit Pai
....I hear rsync is lovely this time of year....
Couldn't have happened to a nicer guy
Linus is a millionaire
No RAID
No software mirroring
No RSYNC
No backup
No excuses
Linus, you are losing it.
So Linus got bitten by the same Intel SSD bug that bit me and many others?
I use a Mac at work and Linux at home (because I like the power, and I'm cheap) but I haven't found anything as convenient as Time Machine for Linux. On my MacBook I just plug in my firewire drive every morning when I arrive at work, and have incremental backups of everything at something like 30 minute intervals. Torvalds, obviously, does not have a similar setup.
Why is this news... is this our version of People magazine, where instead of hearing about all the details of the Kardashians' lives, we hear about every email or event that happens to Linus?
Did it have an 8mb partition by any chance?
I find it amazing to consider that he is not working on a redundant and well backed up machine. Where's last hour's backup? Yesterday's backup? Even pig farmers know to back up their data.
Ghost of WfW
I'm no kernel maintainer but...
If his workstation is so important why doesn't he mirror the disks?
Back them up regularly?
Run a remote desktop to a server with the above conditions
It is not necessary to publish every single thing Linus does, or is involved in, on Slashdot. I remember a time when Linux did not get any airtime from the press, but if things progress the way they have for the last decade, in a couple of years' time there will be a news title 'Linus farted'.
I use md RAID 1 on my build machine. That would have been sufficient to preclude this silly mess.
Shutup about backups. This isn't about backups. Kernel.org and many other places back up the kernel tree.
This is about availability, of which there is currently none, due to Linus's obvious lameness.
Is to send a profanity-laced e-mail to the hard drive. Perhaps then it will see the error of its ways and begin working properly.
BACKUPS, BACKUPS, BACKUPS
What kind of an idiot uses a year-old SSD for critical work? Oh, yeah, the kind of idiot who routinely flames other people for trying to get him to make code changes he doesn't like, rather than explaining his position in a calm, rational fashion.
Karma's a bitch, no matter how famous you are.
...for over an hour when Torvalds had to make an emergency run to Albertson's for some toilet paper and hostility medication.
I've owned several hundred hard drives over the last 30 years. I've never had an active hard drive just blank out. I have had drives that had not been powered for a couple of years refuse to ever come back. But if I did not feel the need to even power the thing on for years, you can imagine how little I cared for what was on it.
In the last four years, I've owned around 20 SSDs. I've had five failures. Every single one was the drive just instantly lost everything. Amazingly, in four of the five cases, the drive still worked fine! It had simply lost all the data on it and believed itself to be a blank drive.
That said, the speed of SSDs makes them worth the risk to me. But I take backups far more seriously than I used to. I need them far more often.
It's so fun to watch all of the Linux Army out there in a tizzy to defend Linus. Watch the lame excuses getting modded up. If anyone else had had the same happen, these same apologists and moderators would be throwing dungballs at the crash victim instead of turning themselves into human shields for their messiah.
"Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
- Torvalds, Linus (1996-07-20). Post. linux.dev.kernel newsgroup.
Well without any pushing to a remote repo it's as mortal as any other source control.
In debates about Christianity, there are two groups: those looking for answers, and those looking to just ask questions.
Now there's a new meaning for Kernel Panic!
I could write something witty for my sig, but instead wrote this...
(A)bort, (R)etry, (F)ucking Fail
And an oblig Ballmer style quote
BACKUPS BACKUPS BACKUPS BY DEVELOPERS DEVELOPERS DEVELOPERS
Having a single point of failure disrupt something as essential as Linux kernel development doesn't instill confidence in the business world. Why are those pull requests not filtered through a separate system running GitHub, and one with some redundancy?
I learned long ago after some close calls to back everything up. In my case, for my desktop I store my data on an XFS partition stored on a RAID 5 hard drive array. I am also using Crashplan to back up all of my data, both to a removable hard drive and to the cloud, with over 3TB of data backed up. The nice thing about Crashplan is that it continually backs up, taking periodic snapshots so I can restore a previous version of a file if I wish. The main drawbacks of Crashplan are that it runs on Java and can be a memory pig. I pay $6/month for unlimited backup of up to 10 machines and have several computers backed up with them now. With the proper settings on my router I don't even notice all the backup traffic running in the background.
Since I have had sudden SSD failures in the past I also dump my root XFS filesystem weekly onto my RAID array (it takes under a minute to run xfsdump) and incremental backups nightly and those dumps get backed up on the cloud as well.
I have found the XFS tools to be quite good at recovery when things go really bad. When running software RAID 1 I had problems where drives would drop out of the array for apparently no reason and I have had several occasions where while rebuilding the other drive would pop out of the array. Switching to an Areca hardware raid controller with battery backed DRAM ended those problems (besides seeing a big performance improvement).
I have found the RAID controller to work well when drive failure occurs and it even recovered after human error (I accidentally disconnected one of the active drives while it was rebuilding and reconnected it).
I won't use btrfs yet. The last time I tried it, about 6 months ago, it was quite slow, and I have a lot of concerns about the storage filling up due to COW that have not been adequately addressed as far as I could tell. I tried setting it up for a Cyrus IMAP server on an Intel SSD and it was unusably slow just untarring all the files, so I ended up going back to XFS.
SSDs are still relatively new. I have had issues with some firmware versions and had one fail catastrophically after only 2 weeks of use. I have also had compact flash and SD devices suddenly fail. My experience is that usually mechanical hard drives give some warning (i.e. SMART) and they tend to last years. I have a server I just retired where the hard drive had 10 years on the clock according to SMART.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
I think I speak for everyone here when I say... this
I'm a good cook. I'm a fantastic eater. - Steven Brust
my pr0n collection is mirrored across four external hdds and three laptops. shame Linus doesn't value Linux as much as I value my pr0n.
And he is back at work. Jesus H. Christ, everyone on the planet has a copy.
Does this mean all Linux Development will stop until they come up with a way to prevent or recover quickly from SSD failures?
I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
I'm not nearly as much of a believer in RAID for the home environment. If you (accidentally) delete something on one drive it's gone from both. Better to buy two drives and do a daily rsync. That way you have a window of opportunity to recover data. Personally, I use rsync without --delete until the 2nd drive starts getting full, then I use the --delete flag to clean up.
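The two-phase routine described above (copy without `--delete` until the second drive fills up, then prune) could be sketched roughly like this. The paths, the 90% threshold, and the function names are assumptions for illustration, not the poster's actual script.

```python
import shutil
import subprocess


def build_rsync_command(source, dest, dest_used_fraction, prune_threshold=0.90):
    """Return the rsync argv for one daily run.

    --delete is added only once the destination drive is nearly full,
    so accidentally deleted files stay recoverable until then.
    """
    command = ["rsync", "-a"]
    if dest_used_fraction >= prune_threshold:
        command.append("--delete")
    # Trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", dest]
    return command


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def run_daily_backup(source="/home/", dest="/mnt/backup/home/"):
    # Hypothetical paths; point these at the real data and second drive.
    command = build_rsync_command(source, dest, used_fraction(dest))
    subprocess.run(command, check=True)
```

Running `run_daily_backup()` from cron once a day would give the recovery window the poster describes; the window closes whenever a `--delete` pass runs.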
Competition Good, Monopoly Bad.
... to protect him from this kind of calamity, for example by using mirrored disks (even with SSDs) whenever possible and by ensuring that regular backups are made of all of his important data. Of course, privacy-wise it would be better if he did it himself, but apparently it's just not a priority. I've long found it curious that so many excellent programmers are comparatively inept when it comes to looking after their own machines and data.
I have a mirrored set of SSD's on all my important machines, and RAID 6 for bulk storage.
Unlike Linus, I can't afford to lose work.
"To those who are overly cautious, everything is impossible. "
http://www.wired.com/wiredenterprise/2012/10/linus-torvalds-hard-disks/
"Let us raise a standard to which the wise and honest can repair" - George Washington
So buy a new drive with the same rev boards and swap them out. Problem solved.
Only the State obtains its revenue by coercion. - Murray Rothbard
According to a speech of his, that's how Linux got started. He accidentally wiped his MINIX partition.
I run smartctl regularly to check on my disks (SSD or spinning) but I find the info difficult to interpret. Is there a service where I can upload the result and it distills it to: fine OR dying?
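Not a service, but a small script can do a crude version of that distillation. The sketch below assumes the text output of `smartctl -H -A` for an ATA drive; the shortlist of "warning" attributes and the rule that any non-zero raw value means trouble are my own assumptions, not an official threshold.

```python
import re

# Attributes whose non-zero raw value is commonly treated as a warning sign.
# This shortlist is an assumption, not an official rule.
WARNING_ATTRIBUTES = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def distill_smart_report(report_text):
    """Reduce `smartctl -H -A` text output to 'fine', 'dying', or 'unknown'."""
    health = re.search(
        r"overall-health self-assessment test result:\s*(\S+)", report_text
    )
    if health is None:
        return "unknown"  # no health line found (e.g. unsupported device)
    if health.group(1) != "PASSED":
        return "dying"
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[1] in WARNING_ATTRIBUTES:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                return "dying"
    return "fine"
```

A "fine" verdict here only means nothing on the shortlist has tripped; as several posters in this thread report, SSDs can die with no SMART warning at all.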
Non-Linux Penguins ?
I don't care what the mods say - this is funny as hell.
"Ignorance more frequently begets confidence than does knowledge"
- Charles Darwin
I can't believe that all of nerddom is in armageddon mode over Linus' failed drive. This kind of stuff happens all the time; people are reacting like this is cause to go to war with Syria...
This might be [electrolytic] capacitor or some other component-level magic-smoke release. There is also the dreaded, much-discussed "wear" from re-writing flash memory -- worse than you think because blocks of 64 KB [typically] have to be erased and re-written to change any byte therein.
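To put a number on the "worse than you think" part, here is a back-of-the-envelope sketch. It assumes the naive case where the whole erase block is rewritten for every change; real controllers remap and batch writes, so treat this as an upper bound rather than measured behavior.

```python
def worst_case_write_amplification(erase_block_bytes, changed_bytes):
    """Bytes physically rewritten per byte logically changed, assuming the
    whole erase block is rewritten for every change (no controller remapping)."""
    if changed_bytes <= 0 or changed_bytes > erase_block_bytes:
        raise ValueError("changed_bytes must be between 1 and the erase block size")
    return erase_block_bytes / changed_bytes


# The 64 KB block size is the "typical" figure from the post above.
one_byte = worst_case_write_amplification(64 * 1024, 1)
one_page = worst_case_write_amplification(64 * 1024, 4096)
print(one_byte, one_page)
```

Changing a single byte costs a full 64 KB rewrite in this model, while changing a 4 KiB chunk costs sixteen times the data actually modified.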
Linus, of all people, ought to know his kernel has options to minimize the re-writes, many of them developed to optimize laptops (like delaying writes). Another thing is to mount partitions (/etc/fstab anyone?) with `noatime` as an option (maybe `nodiratime` too). Un*x and other Linux-like systems by default will re-write the access time for any disk inode read. Turning it off reduces disk write load (and seeks on slow disks). I've had it off for over ten years and not noticed any malperformance, although there are rumored to be some, somewhere.
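For anyone who wants to try the `noatime` suggestion, the change is a one-word edit to the options column of each /etc/fstab entry. A minimal sketch of that edit follows; the function name and the example device are hypothetical, and it rewrites a single line of text rather than touching the real file.

```python
def add_mount_option(fstab_line, option="noatime"):
    """Return an /etc/fstab entry with `option` appended to its options field."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line  # blank lines and comments pass through unchanged
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line  # not a complete entry; leave it alone
    options = fields[3].split(",")
    if option not in options:
        # "defaults" stays in place; extra options are simply appended.
        options.append(option)
    fields[3] = ",".join(options)
    # Note: column whitespace is normalized to single spaces.
    return " ".join(fields)


# Hypothetical entry for illustration only.
print(add_mount_option("/dev/sda1 / ext4 defaults 0 1"))
```

Back up the original file before applying anything like this, and remount (or reboot) for the option to take effect.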
'Nuff said.
So an SSD in a desktop computer died. So what? Just run the array in degraded mode until the damaged drive is replaced.
Who uses hard drives (SSD or otherwise) in desktops in anything other than a RAID configuration? In Linux where software RAID-1 is trivial there's even less of an excuse.
In a laptop I can understand it, since there's often only space for one drive and even then you expect everything on it to be volatile.
I don't get it.
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
Linus is already back on task and the pull request I sent yesterday (after the SSD crash) has been pulled. The worst is a bit of inconvenience for maintainers who needed to resend their pull request.
Yeah, don't you know that it's pointless to have your code in a central repository that's backed up on a regular basis? It's much better to just keep weeks of work on your local hard drive since hard drives don't fail. People using central repositories for their code should be put in some kind of mental institution.
trying to desolder 100 pins spaced 0.01" apart then resoldering them, unless you have a 0.1 mill precision soldering robot it is impossible, you can't even buy wire thin enough to do it by hand.
SMT rework by hand isn't rocket science, but takes more tools than the average garage has.
Desoldering: you use a custom tip for that socket/package type (one tip per package, and they're not cheap). It's essentially a metal ring that heats the solder on all the pins at once. In the center of the assembly is a vacuum probe. You heat all the pins, melting all the solder, and hit the button on the handpiece to suction the chip up off the board. Then clean up the pads on the board. Be careful with the heat because you don't want to lift pads off the board; if you do, then you have to either fix them or make new pads. And if you manage to trash a via (a conductive path to a different board layer), then you've got to drill out a new one, and you have to use an ESD-safe conductive drill with a resistance cutoff. You put a clip from the drill in contact with the layer you're trying to get to, drill down, and when the drill tip makes contact with that layer the drill turns off because the circuit is complete. But it still sucks, and if you don't know how all the board layers are put together you may end up trashing a trace a couple of layers into the board and wrecking the whole thing.
Soldering it down you do this. Align all the chip legs on the pads. Then you can either run a small bead of solder paste across all the pins or use a wave soldering tip (small cup, uses surface tension to hold the solder in place) and drag the tip over all the pins. Heat on the pin & pad draws the solder down into the joints. If you put too much solder you might have to vac it back up and redo it if you've made bridges etc. Alignment is key, and keeping the part in position is key. I used to try and avoid using glue underneath because that made it difficult to get it back off if you needed to down the road.
Doing hand rework on that kind of stuff, the hardest thing for me was dealing with SMT chip caps. The little bastards will crack if you heat 'em too fast, so you have to get a temp-regulated hot plate, heat 'em up slow, then pick and place 'em quick with tweezers/needlenose and solder 'em down quick.
01:36AM up 426 days, 2:46, 1 user, load average: 0.14, 0.11, 0.05
You have a software feature in a server OS that supports certain client OSes to do backups to the server. RAID may be a software feature, but even if it's "software raid", you often have BIOS bootable raids that even work with one of the drives missing. This essentially means that you can work OS agnostic on a lower level than "I have a backup system that works". For Linux, you can have a backup system too that will restore from a LiveCD/USB stick and stores on a remote server. The same amount of time roughly will be needed to backup and restore, differential, incremental, full backups, the works. The solution you are providing is really nothing comparable to RAID. It's fundamentally different because it works on a totally different layer, doesn't prevent downtime and it's not OS agnostic. RAID should prevent downtime, making working backups should prevent data loss. Maybe WHS is the shizniz, you rock for making actual backups, but other than that, your post is totally offtopic in this context and doesn't even begin to solve a problem that Linus was facing with his desktop.
I'm not modding you down, even though I have mod-points, but I'm telling you exactly why I think you shouldn't have posted this. I hope you learned something from it and in the future will implement both backups and RAID when unscheduled downtime is important. Maybe you would even implement a system that works for all relevant OSes in the environment you have to do it for, without relying on a single vendor that offers a closed source product. It's a risk that means you'll have to support their product and licencing and other requirements until the data isn't relevant anymore, even after you have migrated to a competing product.
I was promised a flying car. Where is my flying car?
Modern drives for the last five years at least, have calibration factors for platter/head packs on the EEPROM on the controller board. If you swap boards, the board most likely won't be able to read the data on the disk, since it's not calibrated to the head/platter kit.
Probably a patched one...
Where's all the blather about disks of spinning rust now? I understand that they aren't as fast, and that they are more prone to vibration, and also shock, and use more power, but when the magic smoke erupts from semiconductors, you know the hermetically sealed packaging has been compromised, and semiconductors never first 'blow open' like a fuse, they 'blow short' like a wire, then attempt to carry hundreds of amps for a very short, time, then 'blow open' like a fuse, but not before every transistor / diode / semiconductor junction in the package has first become a short circuit, followed in rapid succession by an open one (the magic smoke is a dead giveaway). A disk of spinning rust might have bad sectors or a dead motor, or a broken head actuator, but usually at least some of it is recoverable. Not so with an SSD.
Linus needs a personal sysadmin to protect him from this kind of calamity ...
All newbs do. And newbs don't usually learn to appreciate backups until they suffer a catastrophic loss. This just proves, yet again, that brilliance in one domain does not imply brilliance in any other domain. Of course the clueless don't recognize this and think a very successful person must have all the answers, when in fact they are equivalent to newbs in many domains.
This is retarded. In this day and age to have somebody the "master of the universe" for doing merges on one machine is not only retarded but it's bad practice. Can you imagine a large company shipping a release and suddenly having to stop because a system or a drive failed? People would be answering up the food chain in a hurry. Shit, forget all of the best practices in terms of backup and redundancy to protect your business in the "real" world that would apply here but we have to account for Linus' imaginary world where he rails on and on over crap that really doesn't matter and pontificates about how "dumb" other people are.
What we have here is a Nelson moment... To all that have been belittled by his rude, arrogant nature in the past you have the right to say "Ha! Ha!"
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Maybe Linus needs to create a backup program like he did when he wanted a better version control system and created git? Also, why is the only copy of the changes on his local workstation and not a server with redundancy? This seems rather amateurish.
Brilliance in one domain does not imply brilliance in any other domain. Only the clueless think very successful people have all the answers. Only the delusionally arrogant think they have all the answers. Only the foolish think it can't happen to them.
Yep, Linus was acting as a true newb if anything more than a small number of hours worth of work was lost. Especially so given the known characteristics of SSD drives. Not very far removed from the person who types in a word processor for hours without saving.
The only really reliable Solid State device is Stone. You can use clay, even baked clay, but it'll chafe and break. You can use trees, blocks, bark, paper, but it'll burn. You can use metal disks, you can use oxide on mylar tape, but bearings will go out and carelessly tossed refrigerator magnets will erase it. You use silicon, you might as well write on sand...
Proof that he needs to begin work on Linux '95.
He could, however, make a chinese only release of Linux 3.12.
Perhaps he should code TRIM support into the filesystem modules before continuing ;)
Err... nearly no one does contact rework professionally anymore, and even half-serious hobbyists nowadays at least have an Aoyue and a high temp vac pickup.
Then just clean the site, apply mini stencil, squeegee paste, remove stencil, place new part, reflow.
Why doesn't that stupid fucking mother fucker just fucking use a fucking real fucking filesystem that has fucking redundancy? How fucking stupid do you have to be to NOT have some level of redundancy in your PRIMARY workstation. Stupid FUCK. (**)
(**) Yes, this is a satirical sendup based on his history of lambasting individuals.
I hope that all of these resubmitted patches are exactly the same as they were before. I would hate to see that this was used as a vector to add a backdoor into the kernel.
Blah Blah Blah.
RdRand really should be deleted :)
Who's going to be the one to rant at *him* on the mailing list for this?
Err... nearly no one does contact rework professionally anymore, and even half-serious hobbyists nowadays at least have an Aoyue and a high temp vac pickup.
Then just clean the site, apply mini stencil, squeegee paste, remove stencil, place new part, reflow.
Yeah, personally I haven't done SMT rework in about 15 years, Aoyue sure has brought the price down on rework stations, that's less than I paid for my Metcal, and that's just a basic iron. I don't want to remember how much we paid for some of the larger Metcal & Pace hand rework systems back in the day.
I don't maintain the Linux kernel, but the data on my HOME PC is important enough that it has a hardware RAID controller (3ware) and I use Idera's CDP free backup advanced software (Linux and Windows). I cycle my backup drives (3 of them), with one going offsite once a month.
At work I have a real disaster recovery plan because my work matters.
Dr. Tanenbaum would have had RAID and a DR plan. Linus is bad.
http://zfsonlinux.org/
You can create a pool with 1 disk, or mirrors, or RAIDz1, RAIDz2, or RAIDz3. Optionally you can add a SLOG (ZIL on a SSD partition/disk) (make sure to create a mirror of two SLOG partitions/disks) and/or extra performance by adding L2ARC cache on an SSD. Creating backups of a pool/dataset is easy by using zfs send, ssh, and zfs receive.
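The `zfs send`, ssh, and `zfs receive` backup mentioned above boils down to a snapshot followed by a pipeline. Here is a hedged sketch that only builds the command strings, so they can be reviewed before anything runs; the pool, dataset, snapshot, and host names are placeholders I made up.

```python
import shlex


def zfs_backup_pipeline(dataset, snapshot_name, remote_host, remote_dataset):
    """Return shell command strings that snapshot `dataset` and stream it
    to `remote_dataset` on `remote_host` over ssh."""
    snapshot = f"{dataset}@{snapshot_name}"
    take_snapshot = f"zfs snapshot {shlex.quote(snapshot)}"
    # The receive command runs on the remote side, so it is quoted as a
    # single argument to ssh.
    remote_receive = f"zfs receive {shlex.quote(remote_dataset)}"
    stream = (
        f"zfs send {shlex.quote(snapshot)} | "
        f"ssh {shlex.quote(remote_host)} {shlex.quote(remote_receive)}"
    )
    return [take_snapshot, stream]


# Placeholder names for illustration only.
for command in zfs_backup_pipeline("tank/home", "daily", "backuphost", "backup/home"):
    print(command)
```

This sends a full stream every time; incremental sends between two snapshots are possible as well but are left out of the sketch.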
This is always the story with SSDs. They suddenly die, even though all the literature from the manufacturers says this is not a mode of failure that ever happens. Supposedly, the drives enter a read-only state where they can only be read, because they are out of space to wear-level. Has anyone ever had a drive actually do this? I want Linus to send this drive back, and get an answer from the SSD manufacturer as to what happened.
Doing hand rework on that kind of stuff, the hardest thing for me was dealing with SMT chip caps. The little bastards will crack if you heat 'em too fast, so you have to get a temp-regulated hot plate, heat 'em up slow, then pick and place 'em quick with tweezers/needlenose and solder 'em down quick.
Maybe your iron doesn't have good enough thermal control, and/or is at too high a temperature? I used to swear by Weller irons, until I was coerced into trying a Metcal. I was amazed at what a difference a really good iron makes. I routinely hand-solder down to 0402 size components without any problems at all. I often have to rework 0201 sized ones, and those are harder. But with a good iron, appropriate use of flux, good minimum-size tips that you don't use for anything larger than 0201, treat carefully, and definitely don't let anybody else borrow, it's not that bad. When I need to tune up an RF path at work, I'll end up changing the same few 0402 or 0201 passives a dozen times or more, without ever cracking a component or lifting a pad.
Oh, and one other thing: If the board was built with lead-free solder, wick that crap off and do your rework with proper 63/37 tin/lead solder! It'll make better joints, and it melts at a lower temperature.
Oh Dear!
In our shop over a few years I have seen, on other machines, SSD and RAID (hardware and software) Failures galore.
I have NOT moved to SSD on any of my machines, nor have I RAIDed any of them! I run machines with up to 8 3TB Seagate HDD drives.
What have I seen in 3 years on my machine racks: No HDD Failures! Still running original hardware and initial build.
'Nough said. :D
I know I'm less than careful about backups myself, and only push an incremental backup to cloud storage about once a week, but then again my laptop is not a single point of failure for the development of the largest operating system in the world.
(As for desktop machines, it's pretty trivial to set up a RAID-1 and not have to rely on periodic backups at all.)
Why are you using an iron? Use hot air and an oven, it makes SMT a completely different game. Just make sure you use lots of flux.
Help I am stuck in a signature factory!
Old habits die hard, I guess. I rework existing boards a lot more often than I do assembly. For changing out discrete resistors/caps/inductors, a pair of good irons works very well. An iron is also preferable for tacking wires onto test points.
FreeBSD is looking pretty interesting these days.
Since the thread is about backup, I use rsnapshot to back up to a remote encrypted location every day. And every hour a BTRFS snapshot is taken and automatically rotated...
I've never encountered SMT components so sensitive. Just get a cheap hot air soldering station and remove the chip. If you don't want to do the investment you can take a small piece of connection wire and push it in between the component and its legs and fixate the end anywhere on the PCB. Now you only have to heat the component legs while you pull the wire outwards and it will lift the legs without pulling at the pads.
You can't count on the component being reusable if you use that method but if it is damaged anyway you might just as well go nuts.
The only time I've had a SMT ceramic capacitor fall apart under the soldering iron was when it was already cracked before I started soldering. I think it cracked because of some transient or whatever.
"Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" - Linus Torvalds[1]
Pfff... That's soooo last century!
Let me fix that for you, Mr. Torvalds
"Only wimps use tape backup: real men just upload their important stuff on git, and let the rest of the world clone it"
Now that sounds more typical for the current decade.
Oh, and for the MasterCard-Ads like finish:
"For everyone else, there's the NSA."
----
The funniest part is that he is the actual author of the git scm system which served him as backup this time.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Until the day when an obscure bug* triggers a cascade of events flagging you as a "Pedo-Terrorist Pirate Pronographer".
*: Or a fly lands in a typing machine
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
If only he'd been using a centralised version control system... ;-)
There's this great feature in Linux called `mdadm`. My mum's SSD failed just last month. Now she has an SSD and a traditional disc in RAID1 with the `--write-mostly` flag:
md0 : active raid1 sda1[0] sdb1[2](W)
She has SSD read performance with magnetic reliability. The `--write-behind` flag would allow SSD write performance too at the risk of the file system thinking that blocks are on the media before they're really on the magnetic media. GRUB is on both MBRs. I've had md devices remove AoE members upon error and continue magically too, just as it's supposed to.
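If you want to check from a script that the write-mostly member is still flagged (and that nothing has gone faulty), the status line quoted above is easy to parse. This sketch assumes the active-array line format shown in this post; the function and field names are my own, and inactive arrays or other layouts are not handled.

```python
import re

# Matches member tokens such as "sda1[0]" or "sdb1[2](W)".
MEMBER_PATTERN = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_mdstat_line(line):
    """Parse one array line from /proc/mdstat into a small dictionary."""
    name, _, rest = line.partition(":")
    tokens = rest.split()
    if len(tokens) < 2:
        raise ValueError("expected '<name> : <state> <level> <members...>'")
    state, level = tokens[0], tokens[1]
    members = []
    for token in tokens[2:]:
        match = MEMBER_PATTERN.fullmatch(token)
        if match:
            device, slot, flags = match.groups()
            members.append({
                "device": device,
                "slot": int(slot),
                "write_mostly": "(W)" in flags,  # reads avoid this member
                "faulty": "(F)" in flags,
            })
    return {"array": name.strip(), "state": state, "level": level, "members": members}


info = parse_mdstat_line("md0 : active raid1 sda1[0] sdb1[2](W)")
print([m["device"] for m in info["members"] if m["write_mostly"]])
```

In the setup described above, the magnetic disc (`sdb1`) carries the `(W)` flag, so reads are served from the SSD whenever possible.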
Not just Linus himself, but also his sole hard drive!
This is what concerns me about Linux: a single egg-basket so threadbare as to be nearly see-through.
Just so this is totally clear - SSDs die too!
Now we can all see the point of RAID-1 (inclusive) OR frequent backups in any critical system.
No, RAID is *not* a backup, RAID's only purpose is to improve reliability/uptime by letting you ride through hardware failures, but it does nothing to protect you from all of the rest of the things that can destroy your data, like file corruption, fat fingering a "rm -rf / home/someuser", a virus, a website hack attack, etc. That's what your backups are for, but you can call them archives if you like, but don't call RAID a "backup" because it's not. Depending on what the problem is and when you discover it, you may need to go back through several archives before you find the data you're looking for.
And yet... If your drive fails just before your scheduled "Backup" starts then if it was part of a redundant RAID then guess what? Your RAID just saved that data yes? It acted as a back-up for at least an entire day's work where-as your official "Backup" did nothing for you in regards to that data.
So yes, RAID "by itself" is not a reliable back-up system in every case. But then, neither is back-up software a 100% reliable back-up system in every case. Clearly both together are actually required in order to have a truly effective back-up system, not just back-up software by itself.
Considering the importance of what he does, why didn't he use a multi-drive RAID or ZFS system?
WTF? this is news?
Rack Mounted Server. I just gotta know, why is all this mission-critical operational stuff taking place on a workstation with workstation grade hardware and no backups or raids? Everyone's talking about oh raid at home isn't good, just use backup drives. Look: This is LINUX. If there's need for additional hardware and compile farms, people will probably donate. To have a single SSD failure cause so much calamity for any project, least of all *THE* open source project, is just embarrassing. Worse than swearing at your devs on a mailing list read by the whole world.
Sadly, a Libertarian cannot force his views on another, and freedom cannot spread as does the cancer known as religion.
It's the best way. If you want extra reliability, backup to more than one.
I have multiple backups in separate locations and online redundancy (saving 50 past versions of the file, in case of damage, deletes or errors). One offsite backup is a passive one (in case of a systemwide hack) that cannot be accessed from the outside through the net. And this is just for my personal stuff... I'd expect an important person like Linus Torvalds to have a minimum of a basic backup or at least a NAS or online storage.
Was he doing building on a tmpfs to avoid wearing out the SSD? If so, I'll let him off.
I don't want to know the tirade that drive is receiving right now.
FYI, in a study Google did of thousands of drives, they found that while certain models of drive were good and some bad, all manufacturers had similar failure rates. Western Digital makes some good models and some bad ones, as do all of the other manufacturers.
That said, I run 40 drives at a time. In my environment, at least, HGST (formerly Hitachi) has had the best track record for me.
That is a ridiculous statement. Work is lost every time a drive fails unless it happens to fail immediately after a backup. Full backups take lots of time. If you understood git better [SNIP]
Full backups? LOL, son.
It's not a ridiculous statement. Our backup system backs up 350+ machines every 15 minutes by default, as long as they have a working network connection, anywhere in the world that can reach our server; the client works by watching what files are changed, and periodically (every 2-3 days) doing a full scan in case it missed something. We dialed it back to once an hour based on user feedback - people felt an hour was more than acceptable in terms of lost productivity. We retain those revisions for about a week, and they're progressively pared down. Restores take seconds and are self-service, as is adding another machine to your account.
Furthermore, we use IMAP for email, so even if your workstation or laptop dies in a big puff of smoke, your email isn't lost. 1995 called, wants to know why the fuck Linus is apparently using POP3.
If I had a dollar for every prick developer who thinks they know how to do IT, I'd be rich (and a lot saner.) Programmers are the worst to support by far because they have absolutely zero humility. Everyone else generally either asks how something should be done, or at least has the graciousness to ask if what they have in mind will work. Programmers just charge ahead and assume they know what they're doing because they've got a Mythbuntu box and a Linux NAS box at home...
Please help metamoderate.
I find it ironic that the biggest and loudest hater of ignorance was himself a victim. If anyone else in that circle had that happen, it would be message after message of "what a stupid cocksucker you are". What a moron/asshole.
short hint: search for 'chip quick' (mouser and similar places sell it).
its like solder that stays liquid for a long time and lets you remove chips without special equip.
(you're welcome).
--
"It is now safe to switch off your computer."
Nowadays I actually prefer SMD over thru-hole... with one exception, BGA.
Hand reworking of them is seldom rewarding, especially the now more common FBGA.
And, It's likely that most of the chips on a device like an SSD are FBGA.
The pitch really isn't within the reach of human precision, so you *need* a pick & place machine.
You also need an IR heater, and X-ray to check the connections.
We can all hear the frustration in your message for having lost data, and for that I am sorry you are experiencing that. You made some interesting statements that I have not seen myself. All the current SandForce SSDs out there have been clear of firmware bugs like this for more than a year from what I see. Mine have all been fine. Are you running with some old FW maybe? The problems I see are more often physical issues that are unrelated to the controller. It is important to select an SSD manufacturer with high quality manufacturing, and don't just go for the cheapest solution. Intel uses SandForce almost exclusively now, so I find it hard to believe what you are describing is 100% true.