Man Deletes His Entire Company With One Line of Bad Code (independent.co.uk)
Reader JustAnotherOldGuy writes: Marco Marsala appears to have deleted his entire company with one mistaken piece of code. By accidentally telling his computer to delete everything in his servers, the hosting provider has seemingly removed all trace of his company and the websites that he looks after for his customers. Marsala wrote on a CentOS help forum, "I run a small hosting provider with more or less 1535 customers and I use Ansible to automate some operations to be run on all servers. Last night I accidentally ran, on all servers, a Bash script with a rm -rf {foo}/{bar} with those variables undefined due to a bug in the code above this line. All servers got deleted and the offsite backups too because the remote storage was mounted just before by the same script (that is a backup maintenance script)." The terse "rm -rf" is so famously destructive that it has become a joke within some computing circles, but not to this guy. Can this example finally serve as a textbook example of why you need to make offsite backups that are physically removed from the systems you're archiving?
"rm -rf" would mark the blocks as empty, and if the programmer hasn't written anything new, he should be able to recover nearly all of the data. Something about the story feels weird.
Offsite, offline BACKUPS
Does he use --no-preserve-root by default? I think the root-protection safeguard has been there for many years. Of course, if his servers are running on something from 2004, then his rm might be without this safeguard...
I saw the post on ServerFault, and while the original scenario could have happened, the OP's follow-up blunder to reverse the input and output parameters of dd when trying to preserve the disk seemed just a wee bit too unlikely. I looked at the article to see if there was any additional data to suggest this was real, but it seems entirely based on the SF thread. Until corroborated, I'm going to call bs.
I have that cold feeling in my stomach just reading this summary. ick.
I did something similar (though not quite so destructive) nearly 20 years ago when I was first learning Linux.
In my case I was trying to get rid of all the hidden files in root's (/root) home dir using 'rm -rf .*'
Guess what that did?
Yeah, that wasn't a highlight of my career...
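For anyone who hasn't guessed: in many shells of that era the pattern .* also matched "." and "..", so the recursive delete presumably climbed into the parent directory, which for /root is /. Here is a minimal sketch of the matching rule, using Python's fnmatch as a stand-in for the shell's globbing (it approximates the rule, it is not the shell itself, and the entry list is made up for illustration). Newer shells and rm versions may refuse to act on "." and "..", so treat this as an explanation of the old behaviour rather than a prediction for your system.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing for /root. "." is the directory itself
# and ".." is its parent, which for /root is the filesystem root.
entries = [".", "..", ".bashrc", ".ssh", "notes.txt"]


def expand(pattern: str) -> list:
    """Return the entries that match a shell-style pattern."""
    return [name for name in entries if fnmatchcase(name, pattern)]


print(expand(".*"))      # also matches "." and ".."
print(expand(".[!.]*"))  # dot followed by a non-dot: hidden entries only
```

The second pattern is the usual workaround, with the caveat that it skips names that start with two dots.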
My eyes reflect the stars and a smile lights up my face.
While this guy was most likely using traditional HDDs where block level recovery is a possibility, for those of you using SSDs that have TRIM properly enabled, don't expect to be able to recover deleted files from the same drive unless you are really really fast.
TRIM tells the drive that the blocks of deleted files are free, and the drive can then erase them on its own schedule. After that they are GONE aside from vague sci-fi and probably nonexistent NSA-type forensics.
AntiFA: An abbreviation for Anti First Amendment.
Manishs, you seem to actually critically read articles before posting them, and you actually provide insight after the summary. What is up with that?
I collect these stories for people who I mentor. Even if they're trolls, they work as cautionary tales, because lots of people have had similar smaller scale disasters (as evidenced by posts in this thread) and it's healthy for mentees to get a taste of what can happen when you (for example) forget to error check your script parameters.
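As a rough illustration of what "error check your script parameters" can look like, here is a minimal sketch in Python rather than Bash. The function name and the allowed_root parameter are my own invention for the example, not anything from the original script. The idea is to refuse to build a delete path from an empty component, and to refuse any path that is not strictly inside a directory the script is supposed to own. In Bash itself, set -u or the ${foo:?} expansion serve a similar purpose by aborting on an unset variable.

```python
from pathlib import Path


def resolve_delete_target(base: str, name: str, allowed_root: str) -> Path:
    """Return the path to delete, or raise ValueError if it looks unsafe."""
    # An unset shell variable expands to "", so "{foo}/{bar}" collapses to "/".
    # Treat empty or whitespace-only components as a hard error instead.
    if not base.strip() or not name.strip():
        raise ValueError("refusing to build a delete path from an empty parameter")

    root = Path(allowed_root).resolve()
    target = (Path(base) / name).resolve()

    # Only allow paths strictly inside the directory this script owns.
    if target == root or root not in target.parents:
        raise ValueError(f"{target} is outside {root}")
    return target
```

A caller would pass the returned path to the delete step only after this function has returned without raising.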
In a big way it doesn't matter if it's true or not; it could be true, which makes it a teachable moment. I'm sure everyone who reads the story will run a mental checklist to see if they have a script somewhere that could EVER do it. Do they have their backups mounted when they should be rsyncing, etc.
Min
On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
Rsnap is a very popular backup system which uses a network-mounted drive as its default/most common configuration. I constantly remind people on the rsnap mailing list about the existence of cryptolocker-type malware.
A much safer way to do it is to have the backup system PULL backups using a read-only account. That way no command on the live system can touch the backups, and the backup system can't change anything on the live system - either accidentally or maliciously.
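To make the direction of travel concrete, here is a minimal sketch of a pull run from the backup machine. It assumes rsync over ssh is available, and the host name, paths and the backup_ro account are hypothetical placeholders. The point is only that the live server appears as the remote source and the destination is local to the backup box, so nothing on the live server ever has the backup storage mounted.

```python
import subprocess


def build_pull_command(source_host: str, source_dir: str, dest_dir: str,
                       user: str = "backup_ro") -> list:
    """Build an rsync command that copies FROM the live host TO local disk."""
    remote = f"{user}@{source_host}:{source_dir.rstrip('/')}/"
    # -a preserves permissions, times and symlinks; -e ssh selects the transport.
    return ["rsync", "-a", "-e", "ssh", remote, dest_dir.rstrip("/") + "/"]


def pull_backup(source_host: str, source_dir: str, dest_dir: str) -> None:
    """Run the pull on the backup machine; raises CalledProcessError on failure."""
    subprocess.run(build_pull_command(source_host, source_dir, dest_dir), check=True)
```

The read-only part is enforced on the live server (account permissions), not by this script.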
One solid backup / hot spare system that does it safely by default is Clonebox.
"To err is human. To really fuck things up, you need a computer."
I prefer that any bulk or query-based "delete" command ask for confirmation along with basic feedback. Example pseudo-code:
> delete *:*.*
You are about to delete 832 folders and 28,435 files.
Your choices are:
1 - Proceed with deletion
2 - List path details about the above folders and files
3 - Cancel deletion
Your Choice: __
(end of example)
It may be slower and/or more resource intensive, but that's better than mass boo-boos.
An optional command parameter could switch off verification, but verification should be the default. This is something Unix/Linux gets backward in my opinion: the default should be confirmation mode, not the other way around. In other words, a command switch should be required to switch off confirmation rather than requiring a command switch to turn confirmation on.
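A minimal sketch of that confirm-by-default behaviour, in Python. The function names and the force parameter are hypothetical, and the prompt text is condensed from the pseudo-code above. Confirmation is the default, and the caller has to opt out explicitly.

```python
import os
import shutil


def count_tree(root: str) -> tuple:
    """Count the folders and files below root (root itself is not counted)."""
    folders = files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
    return folders, files


def confirmed_delete(root: str, ask=input, force: bool = False) -> bool:
    """Delete root recursively; return True only if something was deleted."""
    folders, files = count_tree(root)
    if not force:
        print(f"You are about to delete {folders:,} folders and {files:,} files.")
        while True:
            choice = ask("1 - Proceed, 2 - List paths, 3 - Cancel. Your choice: ").strip()
            if choice == "1":
                break
            if choice == "2":
                for dirpath, dirnames, filenames in os.walk(root):
                    for name in dirnames + filenames:
                        print(os.path.join(dirpath, name))
            else:
                return False  # anything else, including "3", cancels
    shutil.rmtree(root)
    return True
```

Passing ask as a parameter is only there so the prompt can be scripted or tested; interactively it falls back to input.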
Typical SQL doesn't have a confirmation mode, so I usually do a verification query on the WHERE clause before running the actual:
-- check
SELECT count(*) FROM myTable
WHERE x > 7 AND foo='BAR';
-- actual, keeping same where-clause
DELETE FROM myTable
WHERE x > 7 AND foo='BAR';
I also often inspect at least some of the actual rows, not just the count. Thus, as a rule of thumb, do random spot-checks of actual data, and a total count before final command execution.
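The count-then-delete habit can also be scripted. Below is a minimal sketch using Python's built-in sqlite3 module. The checked_delete helper and its expected parameter are hypothetical, and the table and WHERE text are assumed to be trusted, hard-coded strings, not user input. It runs the same WHERE clause for the check and for the delete, and rolls back if the number of deleted rows is not the number you expected.

```python
import sqlite3


def checked_delete(conn: sqlite3.Connection, table: str, where: str,
                   params: tuple, expected: int) -> int:
    """Delete rows matching where, but only if exactly `expected` rows match."""
    # table and where are trusted, hard-coded strings here; values go in params.
    count = conn.execute(
        f"SELECT count(*) FROM {table} WHERE {where}", params
    ).fetchone()[0]
    if count != expected:
        raise ValueError(f"expected {expected} rows, WHERE clause matches {count}")

    cur = conn.execute(f"DELETE FROM {table} WHERE {where}", params)
    if cur.rowcount != expected:
        conn.rollback()  # data changed between the check and the delete
        raise ValueError(f"deleted {cur.rowcount} rows, expected {expected}")
    conn.commit()
    return cur.rowcount
```

For the example above, the call would look like checked_delete(conn, "myTable", "x > ? AND foo = ?", (7, "BAR"), expected=n), where n is the count you verified by eye.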
Table-ized A.I.
I make it a point to lump people into the category of "everyone". Then I can despise them all equally without picking and choosing favorites.