Slashdot Mirror


Man Deletes His Entire Company With One Line of Bad Code (independent.co.uk)

Reader JustAnotherOldGuy writes: Marco Marsala appears to have deleted his entire company with one mistaken piece of code. By accidentally telling his computer to delete everything on his servers, the hosting-provider owner seems to have removed all trace of his company and of the websites he looks after for his customers. Marsala wrote on a CentOS help forum, "I run a small hosting provider with more or less 1535 customers and I use Ansible to automate some operations to be run on all servers. Last night I accidentally ran, on all servers, a Bash script with a rm -rf {foo}/{bar} with those variables undefined due to a bug in the code above this line. All servers got deleted and the offsite backups too because the remote storage was mounted just before by the same script (that is a backup maintenance script)." The terse "rm -rf" is so famously destructive that it has become a joke within some computing circles, but not to this guy. Can this finally serve as a textbook example of why you need to make offsite backups that are physically disconnected from the systems you're archiving?

"rm -rf" only marks the deleted blocks as free, so if nothing new has been written to the disks, he should be able to recover nearly all of the data. Something about the story feels weird.
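The failure mode in the quote is easy to reproduce on paper: if both variables expand to empty strings, the path handed to rm -rf collapses to /. A minimal Python sketch of the trap and of a fail-fast guard; foo, bar, naive_target and safe_target are illustrative names, not code from the actual script:

```python
from typing import Optional


def naive_target(foo: Optional[str], bar: Optional[str]) -> str:
    # Like unquoted shell expansion: an unset or empty variable silently becomes "".
    return f"{foo or ''}/{bar or ''}"


def safe_target(foo: Optional[str], bar: Optional[str]) -> str:
    # Fail fast, in the spirit of Bash's `set -u` or "${foo:?}", rather than
    # handing a truncated path to a recursive delete.
    for name, value in (("foo", foo), ("bar", bar)):
        if value is None or not value.strip():
            raise ValueError(f"refusing to build delete path: {name} is empty or unset")
    return f"{foo}/{bar}"


print(naive_target(None, None))  # prints / : the root of the filesystem
print(safe_target("/srv/backups", "2016-04-14"))
```

The guard costs one line per variable; the unguarded version costs a company.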

46 of 460 comments

  1. Three words by MPAB · · Score: 4, Insightful

    Offsite, offline BACKUPS

    1. Re:Three words by Aighearach · · Score: 4, Insightful

      That's all great, but even a less complete, sloppy backup system would be an improvement here.

      Another thing people don't understand about cloud hosting... you should still have your own self-managed, non-cloud server that holds your images and ideally runs your service during the low-traffic hours. In most cases, your six lowest-traffic hours of the day should be served from traditional hosting. Cloud is super-duper-awesome-webscale for the peak traffic, no way around that if you have peak traffic hours.

      Personally, I can re-deploy (including the latest database backup) from my dev workstation using a simple rake task.

      Another problem is relying on your hosting company for backups. Never do that. The same fire/earthquake/bash script/volcano that makes the backup necessary would destroy it! Expect the hosting company to have insurance; don't expect them to care if your data gets lost, especially if it's "user error."

      This has nothing to do with "PC/internet mentality" and everything to do with the latest anti-waterfall, anti-planning, 80% is all that matters mindset. Traditionally, this was easily solved because there was an engineering mindset.

    2. Re:Three words by lgw · · Score: 2

      I have to disagree here a bit. Not with the idea of doing backups -- everyone should -- but that's looking at half the problem the wrong way. It's the right solution for customer data, but not for all the code and other materials that make your web site happen.

      I've seen this problem a lot: all the work product that makes a web presence happen gets done on the hosted server. That's beyond stupid - that's failing to even understand your job.

      All the work that goes into your hosted web site -- your store, your code that aggregates or helps the customer in whatever way makes you valuable, all that stuff -- needs to live in a version control system you control locally. Ideally GitHub, so backups are free, but not everyone can do that. Your entire web presence other than customer data should be pushed from where the real work is done, and of course there should be a way to revert as well.

      When you look at it that way, it's obvious that a key place to replicate your customer data to is close to the machines you do your build/push work from (not the same machines, unless you have strong read-only protection, but close). That way, if your hosting provider takes your site down on a whim, a couple of scripts you already have give you the same web site with the same data at a new hosting provider. That also makes you safe against physical server failure, rm -rf, and anything else that happens in the cloud.

      This isn't rocket science; it's the minimum standard that separates amateur from professional.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    3. Re: Three words by GameboyRMH · · Score: 5, Informative

      Addendum - just checked a CentOS server, and rm --help says that --preserve-root is enabled by default, and has to be overridden with --no-preserve-root.
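      What that default does and doesn't cover: GNU rm's --preserve-root refuses to operate recursively on / itself, but a glob such as /* is expanded by the shell before rm runs, so each top-level directory arrives as an ordinary argument and the check never fires. A rough Python sketch of the same idea, assuming a hypothetical wrapper (vet_targets is an illustrative name, not part of rm or any other tool) that vets paths before anything deletes them:

```python
import glob
import os
from typing import Iterable, List


def vet_targets(paths: Iterable[str]) -> List[str]:
    """Reject any argument that resolves to '/', in the spirit of --preserve-root."""
    vetted = []
    for raw in paths:
        if not raw:
            raise ValueError("empty path argument")
        # realpath normalizes spellings such as '/./' or '/usr/..' down to '/'.
        resolved = os.path.realpath(raw)
        if resolved == "/":
            raise PermissionError(f"refusing to operate recursively on {raw!r}")
        vetted.append(resolved)
    return vetted


# The caveat: the shell turns '/*' into '/bin', '/etc', ... before any check
# runs, and none of those entries is '/' itself.
expanded = glob.glob("/*")
print("/" in expanded)  # False: nothing here for a root check to catch
```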

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    4. Re:Three words by flopsquad · · Score: 5, Funny

      Offsite, offline BACKUPS

      Would not have helped in this situation. His typo resulted in this command:

      "rm -rf --no-preserve-root --write-zeroes --shred-mbr --exec-all-ssh-hosts --douse-hydrofluoric --high-velocity-eject-removable-media --carpet-bomb-offsite-backup --salt-earth"

      Which, I mean, who hasn't accidentally done that? The keys are like right next to each other.

      --
      Nothing posted to /. has ever been legal advice, including this.
    5. Re:Three words by ShanghaiBill · · Score: 2

      Traditionally, this was easily solved because there was an engineering mindset.

      You seem to be implying that data loss was less common in the "Good ole' days", when all sys admins were highly trained engineers. That is almost certainly untrue, and based on false nostalgia. Backups are much easier today, with reliable high-capacity storage, journaling file systems, ubiquitous connectivity, and plenty of off-the-shelf software solutions.

    6. Re:Three words by Megane · · Score: 4, Informative

      Because he is a retard.

      All servers got deleted and the offsite backups too because the remote storage was mounted just before by the same script

      Clearly a case of a fool thinking that a sync (copying data to another place regularly) is a backup. It's not a backup if you can easily copy corrupted data to your only copy. Or, in this case, if you can easily delete the data from your "backup" copy.
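      The difference shows up even in a toy model. A Python sketch, with directories modelled as plain dicts and mirror_sync/versioned_backup as hypothetical names: a mirror faithfully copies an accidental wipe, while point-in-time copies keep what existed before it. The history only helps if it is also out of the live system's reach; mount it from the same script and one rm -rf removes it too.

```python
from typing import Dict, List

Files = Dict[str, bytes]  # path -> contents, a stand-in for a directory tree


def mirror_sync(source: Files, replica: Files) -> None:
    """One-way mirror: make the replica identical to the source, deletions included."""
    replica.clear()
    replica.update(source)


def versioned_backup(source: Files, history: List[Files]) -> None:
    """Append an independent point-in-time copy; earlier copies are never modified."""
    history.append(dict(source))


live: Files = {"site/index.html": b"<h1>hello</h1>"}
replica: Files = {}
history: List[Files] = []

mirror_sync(live, replica)       # nightly run 1
versioned_backup(live, history)
live.clear()                     # the accidental rm -rf
mirror_sync(live, replica)       # nightly run 2
versioned_backup(live, history)

print(replica)     # {} : the "backup" now mirrors the wipe
print(history[0])  # {'site/index.html': b'<h1>hello</h1>'}
```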

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    7. Re:Three words by Aighearach · · Score: 2

      Some projects I worked on in the 90s still have tape archives of that data.

      You can easily have a situation where the backup tools have improved, and there is less overall data loss now, but that the mindset now is sloppy and leads to a lot of errors of types that were less common in the past.

      In the past when you did it sloppy, you'd get called out on it; and sometimes it still sucked, because PHB. But when that was the case, it was at least known and accepted that it was technically inferior to not have correct engineering. These days, the average shop believes that 80% is enough, and that 95% completion is too much and a waste of money. In the old days, there was technical consensus that 100% of the desired functionality... was desired.

    8. Re:Three words by Aighearach · · Score: 2

      In my experience, most of the customers of small hosting companies are paying for fully managed servers, which includes the backups. Most of the customers won't have any backup other than the code they started with. And they wouldn't know how to make a backup any more than they would know how to shoot a fireball spell out of a chopstick.

      This is compounded by human nature applying "trust" based on the quality of the personal relationship you have. If you have a nice conversation, by the end they really really want to give you their root password, have you move all their stuff over for them, and just tell them when it is finished. And then their DNS hasn't propagated yet, and they get really upset and become unsure if they should "trust" you, and get indigestion, and start calling every hour.

      The "mounted backup" part is just a bridge too far. Later in the comments he says he swapped of and if on a dd command while trying to image the disk for recovery, which seems to confirm the troll.

    9. Re:Three words by billyoc903 · · Score: 5, Funny

      I have this aliased to 'sl'. Keeps me on my toes.

    10. Re:Three words by geekmux · · Score: 2

      Offsite, offline BACKUPS

      Would not have helped in this situation. His typo resulted in this command: "rm -rf --no-preserve-root --write-zeroes --shred-mbr --exec-all-ssh-hosts --douse-hydrofluoric --high-velocity-eject-removable-media --carpet-bomb-offsite-backup --salt-earth" Which, I mean, who hasn't accidentally done that? The keys are like right next to each other.

      Man, I haven't laughed out loud like that in a long time. Thank you for that.

    11. Re:Three words by Triklyn · · Score: 4, Interesting

      ... are you suggesting that there's someone out there that knows how to shoot a fireball out of a chopstick?

      please elaborate on that

    12. Re:Three words by Archangel+Michael · · Score: 2

      Minimums:

      3 Copies
      2 Locations
      2 Formats
      2 Mediums

      Copies: two local, one remote
      Locations: geographically distinct
      Formats: natural, raw, compressed, etc.
      Mediums: SATA, USB, tape, different SAN manufacturers, etc.

      By minimum I mean bare minimum. The reality is, there should be cascading copies being made, and long-term archiving able to restore to a set point in time. For copies you'll need at least three, more likely more versions (date-specific). You should separate your copies geographically so that when California gets the big one, or Hurricane Global Warming washes the Eastern Seaboard clean, or a tsunami wipes out the Pacific Rim, you can resume business relatively quickly somewhere else. Different formats so that you can get the data you need in a way that makes it easy. You'll want the important parts of the SQL database in a non-database (XLS) format. And you'll want to isolate yourself from medium failures: a date bug in a SAN, or trying to find a floppy drive (old school) to put the floppy in.

      The problem with this guy was that he was too cocky and didn't have proper backups. IMHO, if he had done "Live > Local Backup > Offsite Backup", he would have been fine. You back up your live data locally, and then make a copy of that backup to remote/offsite. Three copies, two locations, two media, two formats. Done.
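      The minimums above are simple enough to check mechanically. A Python sketch; BackupCopy and meets_minimums are hypothetical names and the labels are free-form strings. Note what it cannot see: a remote share that is mounted on the live servers still counts as a second location here, which is exactly the gap in this story.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    location: str  # e.g. "office", "offsite"
    fmt: str       # e.g. "native", "raw", "compressed"
    medium: str    # e.g. "SATA", "USB", "tape"


def meets_minimums(copies: List[BackupCopy]) -> bool:
    """3 copies, 2 distinct locations, 2 formats, 2 media: the bare minimum above."""
    return (
        len(copies) >= 3
        and len({c.location for c in copies}) >= 2
        and len({c.fmt for c in copies}) >= 2
        and len({c.medium for c in copies}) >= 2
    )


plan = [
    BackupCopy("office", "native", "SATA"),       # live data
    BackupCopy("office", "compressed", "USB"),    # local backup
    BackupCopy("offsite", "compressed", "tape"),  # offsite copy of the local backup
]
print(meets_minimums(plan))      # True
print(meets_minimums(plan[:2]))  # False: two copies, one location
```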

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    13. Re:Three words by budgenator · · Score: 2

      You'll only get the chopstick wand fireball spell when you achieve level 5 Sys-Admin.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    14. Re:Three words by well_in_theory · · Score: 2

      Suicide Linux, where any typo (one resulting in "command not found") triggers a full 'sudo rm -rf /'. Available as a Debian package. https://qntm.org/suicide

    15. Re:Three words by Fetko · · Score: 2

      SysAdmin hardcore mode.

  2. --no-preserve-root by zopper · · Score: 5, Informative

    Does he use --no-preserve-root by default? I think the safeguard has been there for many years. Of course, if his servers are running on something from 2004, then his rm might be without it...

  3. Wasn't he trolling? by anlag · · Score: 5, Insightful

    I saw the post on ServerFault, and while the original scenario could have happened, the OP's follow-up blunder to reverse the input and output parameters of dd when trying to preserve the disk seemed just a wee bit too unlikely. I looked at the article to see if there was any additional data to suggest this was real, but it seems entirely based on the SF thread. Until corroborated, I'm going to call bs.

    1. Re:Wasn't he trolling? by crunchygranola · · Score: 4, Interesting

      My operating theory is that the guy is constructing an alibi. Perhaps he has gotten wind of an investigation and wants to look like a hapless idiot and not someone engaged in destroying evidence.

      --
      Second class citizen of the New Gilded Age
  4. Meh by Anrego · · Score: 2

    This is borderline bait at this point.

    Can this example finally serve as a textbook example of why you need to make offsite backups that are physically removed from the systems you're archiving?

    There are plenty of examples already and keeping a set of backups physically disconnected from running infrastructure is pretty well established practice, with random software bugs and screw ups being just one of many reasons. That said people will continue to have all their backups fully accessible (and destroyable) or just not back things up at all and things like this will continue to happen.

    Guy can possibly recover the data, but the company is probably still screwed reputation wise.

  5. Empathy by The-Ixian · · Score: 4, Funny

    I have that cold feeling in my stomach just reading this summary. ick.

    I did something similar (though not quite so destructive) nearly 20 years ago when I was first learning Linux.

    In my case I was trying to get rid of all the hidden files in root's home dir (/root) using 'rm -rf .*'

    Guess what that did?

    Yeah, that wasn't a highlight of my career...
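    The catch is that the pattern .* matches . and .. as well as real dotfiles, and on many systems of that era rm happily recursed into .., climbing out of /root. Current GNU rm refuses to remove . or .. (POSIX requires that), but the pattern itself still matches them. A small Python illustration using the standard fnmatch module; hidden_entries is a hypothetical helper:

```python
import fnmatch
from typing import List


def hidden_entries(names: List[str]) -> List[str]:
    """Dotfiles matching '.*', with '.' and '..' explicitly excluded."""
    return [n for n in fnmatch.filter(names, ".*") if n not in (".", "..")]


listing = [".", "..", ".bashrc", ".cache", "notes.txt"]
print(fnmatch.filter(listing, ".*"))  # ['.', '..', '.bashrc', '.cache']
print(hidden_entries(listing))        # ['.bashrc', '.cache']
```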

    --
    My eyes reflect the stars and a smile lights up my face.
  6. Fun thing about TRIM by CajunArson · · Score: 5, Informative

    While this guy was most likely using traditional HDDs where block level recovery is a possibility, for those of you using SSDs that have TRIM properly enabled, don't expect to be able to recover deleted files from the same drive unless you are really really fast.

    TRIM automatically zeros the blocks of deleted files and they are GONE aside from vague sci-fi and probably nonexistent NSA-type forensics.

    --
    AntiFA: An abbreviation for Anti First Amendment.
    1. Re:Fun thing about TRIM by Rockoon · · Score: 4, Informative

      When the OS sends a trim command, with it is information about what the logical sector should look like if an attempt is made to read it again. IIRC the options are zeros, ones, and random.

      Without trim the SSD has to preserve the entire logical block device it's emulating, i.e., if you have a 64GB drive, then even if it only has 4KB of "files" on it, the device still has to preserve all 64GB because it doesn't even know what a file is, let alone that you deleted one.

      With trim the SSD only has to preserve what the OS told it was important to preserve. So instead of preserving 64GB of data it only has to preserve your 4KB of data. Trim marks logical sectors as don't-preserve.

      What the SSD will not do is overwrite trimmed physical sectors just because they were trimmed. In fact, that data could linger there for years even with a high amount of read/write activity, because SSDs only erase entire physical blocks, not just the subsectors within blocks that were trimmed.

      So recovering is not sci-fi. Recovery is a fact. What can't be done is recovering the data via commands that target the logical rather than physical device.
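      A toy model makes the logical/physical distinction concrete. This is a sketch under heavy simplifying assumptions (ToySSD is hypothetical; 4-byte pages, no wear levelling, no bounds checks), and what a real drive returns for a trimmed sector depends on the drive. Here TRIM only drops the logical-to-physical mapping, reads through the logical interface come back as zeros, and the old bytes stay in flash until garbage collection erases the block.

```python
from typing import Dict, List, Optional

PAGE = 4  # toy page size in bytes


class ToySSD:
    """Toy flash translation layer (FTL): logical sectors map to physical pages."""

    def __init__(self, pages: int) -> None:
        self.flash: List[Optional[bytes]] = [None] * pages  # physical pages
        self.mapping: Dict[int, int] = {}                   # logical -> physical
        self.next_free = 0

    def write(self, lba: int, data: bytes) -> None:
        # Flash pages are not rewritten in place; every write lands on a fresh page.
        self.flash[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def trim(self, lba: int) -> None:
        # TRIM only tells the drive this sector no longer needs preserving.
        self.mapping.pop(lba, None)

    def read_logical(self, lba: int) -> bytes:
        # What the OS sees; in this model an unmapped sector reads as zeros.
        phys = self.mapping.get(lba)
        data = self.flash[phys] if phys is not None else None
        return data if data is not None else bytes(PAGE)

    def erase_block(self, start: int, length: int) -> None:
        # Garbage collection erases whole blocks (a real drive first relocates
        # still-mapped pages); only now is the old data physically gone.
        for page in range(start, start + length):
            self.flash[page] = None


ssd = ToySSD(pages=8)
ssd.write(0, b"DATA")
ssd.trim(0)
print(ssd.read_logical(0))   # b'\x00\x00\x00\x00' through the logical interface
print(b"DATA" in ssd.flash)  # True: still physically present
ssd.erase_block(0, 4)
print(b"DATA" in ssd.flash)  # False: gone after the block erase
```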

      --
      "His name was James Damore."
  7. insurance fraud at best by Anonymous Coward · · Score: 2, Interesting

    This has such a smell of BS around it, given the fact that backups are indeed offsite and that a company has more than 1 server, etc. Even my own simple setup, consisting of a PC, laptop, tablet, QNAP and some external HDDs and sticks, is impossible to delete with 1 script. Total bollocks.

    I wonder if he found incriminating material or has gambling debts; far more plausible.

  8. manishs by Verdatum · · Score: 4, Insightful

    Manishs, you seem to actually critically read articles before posting them, and you actually provide insight after the summary. What is up with that?

    1. Re:manishs by msmash · · Score: 3, Funny

      I hope you're not being sarcastic.

    2. Re:manishs by Verdatum · · Score: 4, Informative

      I mean that I really do appreciate it. Keep up the good work!

    3. Re:manishs by msmash · · Score: 2

      Thanks :)

  9. What happened to NEWS for Nerds? by Jack9 · · Score: 2, Insightful

    This was a blatant troll on a forum and now because some idiot millennial wrote an op-ed piece, some idiot (manishs) put it on the /. frontpage?
    Are the admins now supporting the things the moderation system fights on their own site?

    This story is more of an embarrassment than the political vomit I've had to endure because _this_ story doesn't even qualify as news. E.g., what company did he destroy, exactly? You would think the incredibly obvious lack of facts would be a tipoff to someone.

    --

    Often wrong but never in doubt.
    I am Jack9.
    Everyone knows me.
    1. Re: What happened to NEWS for Nerds? by Darinbob · · Score: 5, Insightful

      I make it a point to lump people into the category of "everyone". Then I can despise them all equally without picking and choosing favorites.

    2. Re: What happened to NEWS for Nerds? by david_thornley · · Score: 2

      It's not millennials who have excessive smugness, outrageous senses of entitlement, unjustifiable arrogance, and penchants for causing lots of problems for others. It's young people. Since millennials are currently young, they get all the blame. In twenty or thirty years, the millennials are going to be saying this about the currently young generation.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  10. Re:Why is everything "trolling" to people like you by s.petry · · Score: 2

    You missed one.

    -- Some anonymous coward complains about people calling out trolling, trolling.

    The funny part is the person you responded to did not claim "trolling"; they expressed a healthy skepticism. That last part is something more people should have. There are plenty of liars out there. Quite often they work for mainstream media outlets and hold public offices.

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

  11. Still value as a troll by Minupla · · Score: 4, Insightful

    I collect these stories for people who I mentor. Even if they're trolls, they work as cautionary tales, because lots of people have had similar smaller scale disasters (as evidenced by posts in this thread) and it's healthy for mentees to get a taste of what can happen when you (for example) forget to error check your script parameters.

    In a big way it doesn't matter if it's true or not, it could be true which makes it a teachable moment. I'm sure everyone who reads the story will run a mental checklist to see if they have a script somewhere that could EVER do it. Do they have their backups mounted when they should be rsyncing, etc.

    Min

    --
    On the whole, I find that I prefer Slashdot posts to twitter ones because I don't get limited to 140 chars before
  12. Corrections by ledow · · Score: 3, Insightful

    Man ALLOWS his entire company to be wiped out in one command.

    Man DESIGNS his entire company to be wiped out in one command.

    Man SETS UP his entire company to be wiped out in one command.

    Hint: I work in schools. Once I had a teacher delete their entire planning folder. Then (and DO NOT ask me why, because I don't understand it either), they emptied that folder from the Recycle Bin. They rang up in the most embarrassed panic.

    And then it was explained that we still had copies of that folder in:

    a) Shadow Copies of the profile on the client.
    b) Network copies of the profile they were logged in as (which fortunately hadn't been logged off by the time they realised what they had done).
    c) Shadow Copies of the profile folder on the server.
    d) Copies of the profile folder on all the other servers.
    e) Copies of all the servers on replica servers.
    f) Copies of the server VMs and storage in a primary backup location.
    g) Copies of the server VMs and storage in a secondary backup location.
    h) Copies of the server VMs and storage in a tertiary backup location.
    i) Several offline and off-site copies of the server VMs and storage.
    j) Random, casual backups all over the place.

    And that's just for the crap that teachers think is important (i.e. a lesson plan they have to write every two weeks and which they can't re-use anyway).

    Fuck knows what this guy was thinking, but no one command ANYWHERE should be able to do that many actions, let alone dangerous actions that you haven't evaluated properly. Honestly, some of those machines don't even TURN ON until the backup window, and even the backup devices have rollback and shadow-copy-like functionality on top of whatever the backup software gives (incrementals, etc.). And several are DELIBERATELY offline for almost their entire lives and have entirely disparate credentials, so no one command could ever affect them.

    Not being funny, but we're talking a small school of 400 5-14 year olds here. He actually has more customers than I have users. And you just can't fuck about like that, so if he thinks he can, I honestly have zero sympathy and can only laugh.

  13. Extremely timely article! by ErichTheRed · · Score: 2

    I just got put on a project at work as "the systems guy" for a project being built in Azure. This is in support of a reasonably critical system, and the development staff are salivating over the chance to self-deploy code and infrastructure. It sounds like this problem was caused by the first thing I noticed as a risk -- if you don't limit what Azure users can do, it's just like giving them the keys to the data center. And this isn't in an "evil BOFH control freak" sense, this is just the fact that everything in Azure is virtual and easily changed either manually or through automation. So, someone who's having a bad day could easily make a mistake and get rid of things they have permissions on -- it's possible in AWS too.

    It's a really different mindset than even a hosted IaaS service. There, if you do something stupid, at least the physical infrastructure doesn't get rolled up and carried off. Now hopefully you have backups if that happens and can just restore the VMs and storage as needed, but if developers are running the show I would highly doubt it. (In Marco's case, I would imagine this was caused by the classic "run as root, because I'm the boss" issue.)

    So, in summary, all the (good) sysadmins worrying about the cloud taking their jobs need not worry. The rules of designing a safe computing environment have changed, but they haven't gone away entirely! I'd be a little worried if I were a savant-level EMC or Cisco guru right about now, but generalists with good heads on their shoulders are still in demand.

  14. Three steps by frovingslosh · · Score: 2

    Put backup copies in truck.

    Drive them to the backup site.

    Repeat regularly.

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:Three steps by Waffle+Iron · · Score: 2

      ... and don't underestimate the bandwidth.
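      The arithmetic still holds up. A back-of-the-envelope sketch in Python; the 100 tapes, 2.5 TB per tape (LTO-6 native capacity) and 4-hour drive are assumptions picked for illustration:

```python
def sneakernet_gbps(media: int, tb_each: float, hours: float) -> float:
    """Effective throughput, in Gbit/s, of physically carrying storage media."""
    bits = media * tb_each * 1e12 * 8  # decimal terabytes, as tape vendors quote them
    seconds = hours * 3600
    return bits / seconds / 1e9


print(round(sneakernet_gbps(media=100, tb_each=2.5, hours=4), 1))  # 138.9
```

      Roughly 139 Gbit/s of throughput, with four hours of latency.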

    2. Re:Three steps by Locke2005 · · Score: 2

      "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." Andrew S. Tanenbaum

      --
      I've abandoned my search for truth; now I'm just looking for some useful delusions.
  15. ...and...?? by dentar · · Score: 2

    He admitted it publicly?

    --
    -- I am. Therefore, I think!
  16. rsnap is popular. Should pull from read-only accou by raymorris · · Score: 4, Insightful

    Rsnap is a very popular backup system which uses a network-mounted drive as its default/most common configuration. I constantly remind people on the rsnap mailing list about the existence of cryptolocker-type malware.

    A much safer way to do it is to have the backup system PULL backups using a read-only account. That way no command on the live system can touch the backups, and the backup system can't change anything on the live system - either accidentally or maliciously.

    One solid backup / hot spare system that does it safely by default is Clonebox.
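    A minimal sketch of the pull model in Python, assuming the live data is reachable from the backup host read-only (for example over SSH with an account that has no write rights; that part is OS configuration and is not shown). pull_snapshot is a hypothetical name, and this is not how any particular tool is implemented:

```python
import shutil
import time
from pathlib import Path


def pull_snapshot(source: Path, backup_root: Path) -> Path:
    """Run on the backup host: copy `source` into a new timestamped snapshot.

    The live servers are never given a mount or credentials for backup_root,
    so nothing they run (a bad rm -rf, ransomware) can reach old snapshots.
    """
    dest = backup_root / time.strftime("%Y%m%dT%H%M%S")
    # copytree only reads from `source`, creates missing parent directories,
    # and raises FileExistsError rather than overwriting an existing snapshot.
    shutil.copytree(source, dest)
    return dest
```

    Scheduled from the backup host, each run produces a new directory; pruning happens there too, never from the live side.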

  17. Re:Repeat after me... by NatasRevol · · Score: 2

    For most users the cloud is more reliable and more accessible than anything they'll ever be able to do in a SO/HO environment.

    The problem is that medium-sized companies (1000+ employees) seem to think this too.

    And then have no backups, version control or anything else because some PHB said 'put it in the cloud, and stop arguing with me.' to the IT engineers who wanted local & remote backups, version controls, redundancy of hardware, network and power, etc, etc, etc.

    --
    There are two types of people in the world: Those who crave closure
  18. Chain of Mistakes by Greyfox · · Score: 3, Informative

    Recently the USPA was talking about stuff that kills skydivers. It's almost never just one mistake. It's a chain of mistakes where one single good decision anywhere in that chain would break the chain and prevent entirely preventable deaths. In the case of this story, if it had actually happened, which it didn't, the decisions made to violate best practices all along the chain (i.e., running your bash scripts as root or as any user ID that has authority to delete anything on the file system, not pushing your backup data to isolated storage, not having numbered sequential backups, etc.) would be so egregious that the story would simply be an example of Darwin at work. The conversation would go "Oh hey, did you hear about that guy who designed his system so badly that he was able to delete the whole fucking thing with one mistyped command? Yeah, the council of sysadmins voted to kill him. Said it was for the good of the species."

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  19. Old Saying by Tablizer · · Score: 4, Interesting

    "To err is human. To really fuck things up, you need a computer."

    I prefer that any bulk or query-based "delete" command ask for confirmation along with basic feedback. Example pseudo-code:

    > delete *:*.*

    You are about to delete 832 folders and 28,435 files.
    Your choices are:
          1 - Proceed with deletion
          2 - List path details about the above folders and files
          3 - Cancel deletion
    Your Choice: __

    (end of example)

    It may be slower and/or more resource-intensive, but that's better than mass boo-boos.

    An optional command parameter could switch off verification, but verification should be the default. This is something Unix/Linux gets backward in my opinion: the default should be confirmation mode, not the other way around. In other words, a command switch should be required to switch off confirmation rather than requiring a command switch to turn confirmation on.
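    The confirm-by-default idea is easy to sketch. A Python version, assuming a simple yes/no prompt instead of the three-option menu above; count_tree and delete_tree are hypothetical names, the force flag plays the role of the opt-out switch, and the confirm callable is injectable so scripts and tests don't block on input():

```python
import os
import shutil
from typing import Callable, Tuple


def count_tree(root: str) -> Tuple[int, int]:
    """Count folders and files below root (root itself not included)."""
    folders = files = 0
    for _, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
    return folders, files


def delete_tree(root: str, force: bool = False,
                confirm: Callable[[str], str] = input) -> bool:
    """Delete root recursively; confirmation is the default, force=True skips it."""
    folders, files = count_tree(root)
    if not force:
        prompt = (f"You are about to delete {folders} folders and {files} files.\n"
                  "Type 'yes' to proceed: ")
        if confirm(prompt).strip().lower() != "yes":
            return False  # anything other than an explicit yes cancels
    shutil.rmtree(root)
    return True
```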

    Typical SQL doesn't have a confirmation mode, so I usually do a verification query on the WHERE clause before running the actual:

    -- check
    SELECT count(*) FROM myTable
    WHERE x > 7 AND foo='BAR'

    -- actual, keeping same where-clause
    DELETE FROM myTable
    WHERE x > 7 AND foo='BAR'

    I also often inspect at least some of the actual rows, not just the count. Thus, as a rule of thumb, do random spot-checks of actual data, and a total count before final command execution.
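    The same check-then-delete habit can be automated. A sketch using Python's built-in sqlite3 module and the myTable example above; guarded_delete and its expected parameter are hypothetical. It counts with the identical WHERE clause, then deletes inside a transaction and rolls back if the numbers disagree. The WHERE text is interpolated, so it must be a trusted, hand-written clause, with values passed as parameters:

```python
import sqlite3


def guarded_delete(conn: sqlite3.Connection, where: str, params: tuple,
                   expected: int) -> int:
    """Count with the same WHERE clause, then delete only if the count matches."""
    (count,) = conn.execute(
        f"SELECT count(*) FROM myTable WHERE {where}", params).fetchone()
    if count != expected:
        raise RuntimeError(f"check query matched {count} rows, expected {expected}")
    with conn:  # commits on success, rolls back if an exception escapes
        deleted = conn.execute(f"DELETE FROM myTable WHERE {where}", params).rowcount
        if deleted != count:
            raise RuntimeError("row count changed between check and delete")
    return deleted
```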

    1. Re:Old Saying by hankwang · · Score: 2

      "This is something Unix/Linux gets backward in my opinion: the default should be confirmation mode, not the other way around."

      1. All Ubuntu versions and derivatives (and I think CentOS/RHEL as well) alias rm to "rm -i" out of the box. Drives me crazy; with every install I have to hunt down whether those aliases were defined in .profile, .bash_profile, .bashrc, /etc/profile, /etc/bashrc, or somewhere in /etc/bash/*.

      2. Command-line tools that ask for confirmation suck for scripting. Especially if those prompts only occur under specific conditions (such as confirm overwrite).

  20. Re:rsnap is popular. Should pull from read-only ac by mlts · · Score: 2

    The best of all worlds is pull-based backup software. However, the enterprise-grade programs are extremely pricey, well out of the range of a home user. The cheapest around would probably be Windows Fundamentals, which is a descendant of Windows Home Server.

    What I've wound up doing on a small scale (this won't scale up past a few machines) is using a hardware NAS appliance. It has a Samba share and account for every machine. The Windows boxes use Veeam to dump their data onto the individual shares. Every 15 minutes, the NAS takes a snapshot of each share; several are kept for each hour/day/week/month/year, and the rest get tossed after a while. Every eight hours, the NAS backs itself up to an external HDD. This protects against ransomware in several ways. If ransomware just zaps the share, restoring the snapshot and bare-metal loading the machine isn't too bad. If ransomware takes its time and zeros files over an interval, because I keep weekly, monthly, and yearly snapshots as well as the external-drive backups, there is a good chance that I will still have the file around, either in a snapshot or on the backup drive. Because each machine dumps to a separate share via a separate account, ransomware on one box can't destroy or access another machine's data.

    The ideal would be for the NAS maker to write an agent that sits on Windows and uses SSH or another time-tested protocol to pull backups. That would not just guarantee that backups get done, but also protect them against ransomware.
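    The snapshot thinning described above is a grandfather-father-son style retention policy. A Python sketch that decides which snapshots to keep; keep_set and the bucket counts are hypothetical, not any NAS firmware's actual rules. It walks snapshots newest first and keeps the newest one in each of the most recent N hourly, daily, weekly, monthly or yearly buckets; everything not returned would be pruned:

```python
from datetime import datetime
from typing import Callable, Dict, Hashable, List, Set

# How each retention period groups snapshots into buckets.
BUCKETS: Dict[str, Callable[[datetime], Hashable]] = {
    "hour": lambda t: (t.year, t.month, t.day, t.hour),
    "day": lambda t: (t.year, t.month, t.day),
    "week": lambda t: tuple(t.isocalendar())[:2],  # (ISO year, ISO week)
    "month": lambda t: (t.year, t.month),
    "year": lambda t: t.year,
}


def keep_set(snapshots: List[datetime], keep: Dict[str, int]) -> Set[datetime]:
    """Return the snapshots to retain; anything else can be pruned."""
    newest_first = sorted(snapshots, reverse=True)
    kept: Set[datetime] = set()
    for period, count in keep.items():
        seen: Set[Hashable] = set()
        for t in newest_first:
            key = BUCKETS[period](t)
            if key in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == count:
                break     # one snapshot kept for each of the last `count` buckets
            seen.add(key)
            kept.add(t)
    return kept


# Snapshots every 15 minutes from 00:00 to 02:45 on one day.
snaps = [datetime(2016, 4, 14, h, m) for h in range(3) for m in (0, 15, 30, 45)]
print(sorted(keep_set(snaps, {"hour": 2, "day": 1})))
```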

  21. Re:Why is everything "trolling" to people like you by Dahamma · · Score: 2

    It was a pretty obvious troll if you read the whole thread:

    1. The guy claims to have made the most insanely improbable mistake to kill his entire set of servers. Possible, but unlikely. Most took the bait on this one.
    2. He had no explanation as to why "--preserve-root" didn't save him - basically looked like he didn't know about it, and he was lying.
    3. later on he responded to someone's suggestion to use dd to backup saying he reversed if and of - which is probably the second most joked about UNIX sysadmin error after "rm -rf".

    So, either you are pretty clueless about any of this, or, like another poster suggested, you are also a troll. Based on the specifics of your moronic post, probably the latter.