Slashdot Mirror


Too Perfect a Mirror

Carewolf writes "Jeff Mitchell writes on his blog about what almost became 'The Great KDE Disaster Of 2013.' It all started as a simple update of the root git server and ended up with a corrupt git repository automatically mirrored to every mirror and deleting every copy of most KDE repositories. It ends by discussing what the problem is with git --mirror and how you can avoid similar problems in the future."

192 comments

  1. Lean how your tool works? by gweihir · · Score: 5, Insightful

    Preferably, before using it? This sounds very much like plain old incompetence, possibly coupled with plain old arrogance. Thinking that using a version control system absolves one from making backups is just plain old stupid. Then, with what I have seen from the KDE project, that would be consistent.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Lean how your tool works? by maxwell+demon · · Score: 5, Insightful

      Also, mirrors are not backups. Mirrors are intended to be identical to the original, so mirroring worked as expected. How should the software know that the removal of most repositories was not intentional?

      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:Lean how your tool works? by gweihir · · Score: 0

      Indeed. Mirrors, RAID, version control, all are _not_ backups. Anybody halfway competent knows that. The detailed analysis just shows these people had no clue. Well, maybe they will be a bit more careful and professional now.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    3. Re:Lean how your tool works? by Artifakt · · Score: 1

      You got in quick with a valid point, and completely shot yourself down with unsupported opinions. Why? Why say, in effect, "This is a provably avoidable mistake, and now I'm going to throw around vague hints of some totally unspecified complaint list, full of sound and fury, but signifying nothing in particular.", and so make everyone ignore the part that is both a defensible point and the only point actually pertinent to the article? Why shoot yourself in the foot like that?

      --
      Who is John Cabal?
    4. Re:Lean how your tool works? by Anonymous Coward · · Score: 0

      The file system became corrupted. Is it too much to ask that a mirror doesn't automatically copy damaged files? Shouldn't this be the simplest type of corruption to prevent?

    5. Re:Lean how your tool works? by gweihir · · Score: 0

      Huh? Since when do opinions need support? But I admit that from time to time I like to just check the effects of such statements. I have been at karma-cap for about 10 years now (except the one time where I kept pointing out their obvious reasoning errors to a bunch of religious nuts and lost all 50 karma points in a single thread), so there is really not a lot I will lose.

      I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history. It is also relevant and related, as this incident indicates the opinion I have gotten from the outside was spot-on for this aspect of the project, making the assumption it is pervasive more likely.

      You may also have noticed that some person just saying "make backups" got modded down into oblivion. Sometimes the only thing that /. moderation does is to show that many, many people with moderation points are fundamentally stupid.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:Lean how your tool works? by gweihir · · Score: 4, Insightful

      Yes, it is too much. How would the mirror operation ever know without full checks on everything? Quit asking for nanny-software that treats its users as incompetent and illiterate. Is it too much to ask for the admins to actually have a brief look at the description of the operation they are using as their primary redundancy mechanism? I don't think so. If they had done this very basic step, they would have known to run a repository check before mirroring. If they had any real IT knowledge, they would have known that mirrors are not backups and that you need backups in addition.

      Also, from what I gather from their grossly incomplete "analysis" is that they had a file that read back differently on multiple reads (not sure, they seem not to have checked that), which is not a filesystem corruption (the OS checks for that on access to some degree), but a hardware fault. Filesystems and application software routinely do not check for that. It is one of the reasons to always do a full data compare when making a backup.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    7. Re:Lean how your tool works? by Anonymous Coward · · Score: 0

      We are talking about the mirror. There is more than one screwup here. The failure to actually have a backup is the first. The failure to run an intelligent mirror is the second.

      How would the mirror operation ever know without full checks on everything?

      Is a checksum too much to ask for (after each update)? Oh wait, git already does this. So all a mirror has to do is check it. Why shouldn't they have that functionality?
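The "check before you mirror" idea argued over in this subthread can be sketched roughly as below. This is only an illustration, not what the KDE sysadmins ran: `repo_is_healthy` and `mirror_if_healthy` are hypothetical helper names, and the `run` parameter exists purely so the logic can be exercised without a real repository. The git commands themselves (`git fsck --full`, `git push --mirror`) are standard, but a full fsck is slow on large repositories, and the thread does not settle whether it would have caught this particular corruption.

```python
import subprocess


def repo_is_healthy(repo_path, run=subprocess.run):
    """Return True if `git fsck --full` exits with status 0 for repo_path."""
    result = run(
        ["git", "-C", repo_path, "fsck", "--full"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def mirror_if_healthy(repo_path, remote, run=subprocess.run):
    """Push a mirror only after the source repository passes fsck.

    Returns False (and pushes nothing) when the check fails.
    """
    if not repo_is_healthy(repo_path, run):
        return False
    run(["git", "-C", repo_path, "push", "--mirror", remote], check=True)
    return True
```

The point of the design is only the ordering: the expensive check happens on the source, and a failed check means the mirror is left alone.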

    8. Re:Lean how your tool works? by vurian · · Score: 4, Interesting

      "I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history." Actually, if you would investigate the history of the KDE sysadmin team you would find out that this handful of volunteers are doing a job that many full-time, well-funded sysadmins cannot rival. And.. Anyone who talks about "the KDE team" as if it's a single, monolithic entity doesn't know what they're talking about.

    9. Re:Lean how your tool works? by TheRaven64 · · Score: 2

      In a traditional filesystem, yes. The mirroring happens at the block device level, and so it is completely unaware of the semantics of the filesystem and will duplicate anything, potentially overwriting good data with bad if the filesystem is corrupted. Worse, unless the drive fails catastrophically, you're liable to either duplicate single-block errors or to be unable to tell which copy of a block is the damaged one. ZFS fixes the second of these problems with block-level checksums, so it can tell which disk has errors. It also makes the mirroring infrastructure partially aware of the filesystem layout, so it shouldn't duplicate filesystem corruption, however it will happily copy user errors. For example, if your word processor corrupts a document as it saves it, then there's nothing ZFS can do about that (unless you have an earlier snapshot). And, of course, if there's a bug in the filesystem driver, all bets are off.

      Mirroring, as the grandparent says, is not a substitute for proper backups. One of the most common reasons for restoring from backups is accidental deletion. Even a filesystem with 100% reliability won't protect you against this.

      --
      I am TheRaven on Soylent News
    10. Re:Lean how your tool works? by osu-neko · · Score: 1

      Nothing controversial is obvious. If it was obvious, it wouldn't be controversial. If you think it's obvious, you're overlooking something. That certainty that an answer is obvious is, almost always, a sign of ignorance. Any time you find yourself wanting to use the word "obvious", you should quickly recognize that you don't fully understand the problem, and there are factors you're not considering. Continuing to press on makes you look stupid to the people who are aware of those factors and see you're not taking them into account.

      The main reason 90% of people appear to be "fundamentally stupid" to someone is that they don't understand what others are thinking, and don't see the alternatives. They're certain of why someone thinks something (and usually wrong), and get downvoted frequently and think they know why (and are wrong about that, too). This is why stupid people usually think they're geniuses. Everyone else looks stupid to them, since others can't see the "obvious" that is plain as day to them.

      I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history.

      The opposite is also quite visible if you go looking for that. Any team, being composed of many individuals, is going to display a large variety of traits. There's hardly an assertion you can make about the nature of a team that you can't go dig up plenty of evidence for if you wish to. Like the best pseudo-scientific theories, evidence for them is always plentiful. Their problem lies not at all in a lack of evidence.

      --
      "Convictions are more dangerous enemies of truth than lies."
    11. Re:Lean how your tool works? by gweihir · · Score: 1

      It is a bit more than "one" checksum. In fact a lot more. But you are perfectly welcome to run a full repository check (and that is what "checking the checksum" amounts to) after running the mirror operation. Oh, wait, that does not help in the scenario under discussion as the old mirror state is already gone at this time.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    12. Re:Lean how your tool works? by Anonymous Coward · · Score: 0

      Yeah, lean on that shit.

    13. Re:Lean how your tool works? by Anonymous Coward · · Score: 0

      Oh, wait, that does not help in the scenario under discussion as the old mirror state is already gone at this time.

      If the file system fucked itself (like it did), then the checksums will fail (since they were made when the last revision was added). An intelligent copy mechanism would say to not copy and overwrite the broken files. Copy to /tmp or a ramdisk and then overwrite when the checksums pass. This isn't rocket science.

      It is a bit more than "one" checksum.

      It depends on the revision history. You don't need to run checksums on files that weren't modified.
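The "copy to a staging area, verify, then overwrite" approach described in this comment might look something like the following. It is a sketch under assumptions: `staged_replace` is a hypothetical helper, it handles a single file rather than a whole repository, and it assumes a known-good SHA-256 is available from somewhere trustworthy, which is exactly the part the thread disputes.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def staged_replace(src, dest, expected_sha256):
    """Copy src next to dest, verify it, and only then replace dest.

    Returns True if dest was replaced, False if the staged copy failed
    verification (in which case dest is left untouched).
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=dest_dir)
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        if sha256_of(tmp) != expected_sha256:
            return False
        os.replace(tmp, dest)  # atomic when tmp and dest share a filesystem
        return True
    finally:
        # Clean up the staged copy if it was not moved into place.
        if os.path.exists(tmp):
            os.remove(tmp)
```

Staging in the destination directory rather than /tmp keeps the final `os.replace` on one filesystem, so the old copy is never half-overwritten.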

    14. Re:Lean how your tool works? by socceroos · · Score: 1

      Mirrors should be backups. See Byzantine Fault Tolerance. A really good mirroring system would be properly BFT.

    15. Re:Lean how your tool works? by socceroos · · Score: 2

      Mirrors should be backups ideally. See Byzantine Fault Tolerance. A really good mirroring system would be properly BFT.

    16. Re:Lean how your tool works? by Anonymous Coward · · Score: 0

      Another stupid elitist rallying against computers doing their fucking job... If I wanted to do things manually I'd record the changes on parchment. I don't. So stop making excuses for lazy software writers who'd rather save time skipping testing routines and/or designing sane software that does the right thing by default just to feel superior for learning arcane shit through the magic of reading. I'm all for letting you shoot yourself in the foot mind you, it's just that aiming the gun at the foot with the safety off should be what you have read in the manual, not the other way around.

      TL;DR: Checking shit like this is menial work, the kind computers, not admins, ought to be doing. Kernel hackers ain't doing DVC quite right.

    17. Re:Lean how your tool works? by gweihir · · Score: 1

      Git _can_ check the repository. It just does not by default when mirroring. That is the right behavior. If anybody wants the check (and the massive slowdown that comes with it), they can run it after the mirror operation on the target and before the mirror operation on the source. Nobody is being lazy here. Mirroring with check and without check are just two different operations and the user has to select which they want and when they want the checks.

      If you had read the article, you would have understood that. Instead you are projecting your angst and incompetence and demand nanny-ware that treats the user like a cretin. Pathetic.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    18. Re:Lean how your tool works? by gweihir · · Score: 1

      No, the filesystem did not "fuck itself". That results in a different error pattern. The checksums also did not fail. Have you even read the analysis? Do you actually understand Git checksums? Apparently not.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  2. Not git related by Rob+Kaper · · Score: 5, Insightful

    This is not a problem with git --mirror: rsync or any other mirroring tool would end up in the same situation.

    It's up to the master to deliver the goods and upgrading a master should include performing a test run as well as making a backup prior to the real upgrade. This was a procedural failure, not a software failure. But good to hear disaster was averted.

    1. Re:Not git related by Anonymous Coward · · Score: 0

      If emacs devs had that attitude they would never have implemented autosave.

      It should be made hard to well and truly destory datasets.

    2. Re:Not git related by Carewolf · · Score: 3, Insightful

      True, but git does have a mechanism for checking integrity, and the discussion here is where you should use the fast git --mirror which has no checks, and where the slower mechanism which does fits in.

    3. Re:Not git related by gweihir · · Score: 2

      Indeed. Git is blameless here. Git also is not a backup tool, you need backups in addition, just for cases like this one.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:Not git related by gweihir · · Score: 3, Interesting

      You can --mirror any time. If you actually have backups, not just mirrors and hope.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:Not git related by garyebickford · · Score: 1

      'destory' - I like that. It should be a word. :) Like the opposite of history? The removal of one's history. In fact, it applies rather well in this case. "The repository was destoried."

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
    6. Re:Not git related by Anonymous Coward · · Score: 0

      I pulled a Homer!

    7. Re:Not git related by Anonymous Coward · · Score: 0

      I agree. I'm more familiar with Windows, but my thinking here is screw the mirrors.

      What you really need is to make sure that your primary backups are not corrupted so that a restore is possible. Make a backup copy on a second system and run consistency checks before backing it up. That way you aren't affecting the primary system with a resource intensive process and you make sure that you have a good backup to restore from. You can always rebuild the mirrors after the restore process.

    8. Re:Not git related by Anonymous Coward · · Score: 0

      The main function of version control IS backup, and git has failed that task.

    9. Re:Not git related by sadboyzz · · Score: 1

      Yes. But silent data corruption is obviously a problem of the filesystem, ext4 in this case. Too bad btrfs is still years from stable.

    10. Re:Not git related by qbast · · Score: 2

      No, the main function of version control is ... version control.

    11. Re:Not git related by hobarrera · · Score: 1

      Git *can be used* as a backup tool, but doesn't mean that it *is* a backup tool regardless of how you use it.

    12. Re:Not git related by complete+loony · · Score: 1

      First problem, they have 1500 repo's and they want their mirrors to be able to delete them. But they built a script that deletes repo's if they are not found in a master list. Deleting because something is missing is always dangerous, something could disappear for any number of reasons. They should instead require an explicit delete command to be present.

      Second problem, a mirror doesn't fsck and their master copies don't either. They should fsck each repo periodically, and verify all new patches being pushed before they appear on the master server.

      They could also keep reflogs if they wish to guard against branches being deleted, but that wasn't a problem they actually faced.
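The first point above (delete only on an explicit command, never because something is merely missing from the list) can be sketched as a planning step. `plan_sync` and its three inputs are hypothetical names for illustration; the actual KDE sync script is not shown in the thread.

```python
def plan_sync(master_list, local_repos, explicit_deletes):
    """Decide what a mirror should do, without deleting on mere absence.

    master_list:      repo names the master currently advertises
    local_repos:      repo names present on this mirror
    explicit_deletes: repo names the master has explicitly marked for deletion
    """
    master = set(master_list)
    local = set(local_repos)
    deletes = set(explicit_deletes)
    return {
        # New on the master, not yet mirrored.
        "clone": sorted(master - local),
        # Deleted only when explicitly requested and no longer advertised.
        "delete": sorted((local & deletes) - master),
        # Missing from the master list with no delete request: keep and flag.
        "flag": sorted(local - master - deletes),
    }
```

With this shape, a truncated or empty master list produces a long "flag" list for a human to look at instead of a mass deletion.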

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    13. Re:Not git related by 31eq · · Score: 1

      What is this "git --mirror"? It's in the summary but not the original article. There are different git commands that have the --mirror option and they work differently.

      My simple experiment, with an old version of git, shows that "git clone" succeeds whether or not you specify --mirror, that "git pull" doesn't have a --mirror option (maybe because I'm out of date but http://linux.die.net/man/1/git-pull doesn't show it either) and "git push" fails (hangs, in fact) if it tries to push a corrupt blob.

      A discussion about whether you should use the safe method for backups wouldn't be very interesting because of course you should. The issue here is that they didn't know they weren't doing integrity checks and they may not be the only ones.

      Cloning new repositories rather than pulling to existing ones sounds risky to me. If that's what they were doing I don't know why. A forced pull ("git pull --force") will keep old branches and a reflog and so on. It will also, as it happens, abort if an integrity check fails. So this seems like good practice. But I haven't seen any discussions of good practice in backing up git repositories and I'll excuse anybody not knowing the clone operations didn't do the integrity checks. (Especially if they also saved tarballs, whatever the problems with those may have been.)

    14. Re:Not git related by TCM · · Score: 1

      ZFS exists in more than stable incarnations if you aren't too stubborn to force Linux on yourself but instead, use the right tool for the job. The problem with Linux users is, they think Linux _is_ the right tool for _every_ job until... they reinvent the proper tool, badly. Rinse and repeat.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
  3. The 'K' stands for ... by Anonymous Coward · · Score: 4, Funny

    You know, calling it a disaster really depends on your point of view.

    1. Re:The 'K' stands for ... by gbjbaanb · · Score: 1

      nearly was, if KDE disappeared completely we'd all have to use Gnome... which would be a true definition of the word.

  4. RAID is not a backup by Anonymous Coward · · Score: 0

    Neither are online mirrors.

    1. Re:RAID is not a backup by gweihir · · Score: 1

      Indeed. Online snapshots are a different matter, but mirroring can never replace backups. Quite obvious in fact.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  5. A thousand times. (Unless online mirrors roll back by raymorris · · Score: 1

    A thousand times this. Say it with me - a mirror is not a backup. A RAID mirror is not a backup, a cluster mirror is not a backup, and a git mirror is not a backup.

    Unless of course the mirroring system integrates rollback to earlier mirrors, something like Clonebox for example.

  6. No backups?! by Blymie · · Score: 5, Insightful

    Good grief!

    After all of that, not a single proposed solution is a proper, rotational backup.

    This is what rotational backups are FOR. They let you go back months in time, and even do post-corruption, or post-cracking examination of the machine that went down!

    Backups do *not* need to be done to tape, but a mirror or a raid card is NOT a backup. This is actually simple, simple stuff, and it seems like the admins at KDE are a bit wet behind the ears, in terms of backups.

    They probably think that because backups used to mean tape, that's old tech, and no one does that.

    Not so! Many organizations I admin, and many others I know of, simply do off-site rotational backups using rsync + rotation scripts. This is the key part, copies of the data as it changes over time. You *never* overwrite your backups, EVER.

    And with proper rotational backups, only the changed data is backed up, so the daily backup size is not as large as you might think. I doubt the entire KDE git tree changes by even 0.1% every day.

    Rotational backups -- works like a charm, would completely prevent any concern or issue with a problem like this, and IT IS WHAT YOU NEED TO BE DOING, ALWAYS!
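For readers who have not seen the "rsync + rotation scripts" pattern mentioned above, here is one rough way the daily step is often put together. It is a sketch, not the poster's script: `snapshot_command` is a hypothetical helper, the dated directory layout is an assumption, and it only builds the command line rather than running it. The `--link-dest` option is a real rsync flag that hard-links files unchanged since the previous snapshot, which is what keeps each day's backup small.

```python
import posixpath
from datetime import date


def snapshot_command(source, backup_root, day, previous_day=None):
    """Build an rsync command that writes a new dated snapshot directory.

    Unchanged files are hard-linked against the previous day's snapshot,
    so earlier snapshots are never overwritten and only changes use space.
    backup_root is assumed to be an absolute path on the backup host.
    """
    dest = posixpath.join(backup_root, day.isoformat())
    cmd = ["rsync", "-a"]
    if previous_day is not None:
        link = posixpath.join(backup_root, previous_day.isoformat())
        cmd.append("--link-dest=" + link)
    cmd += [source.rstrip("/") + "/", dest + "/"]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; pruning of old snapshots would be a separate step.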

    1. Re:No backups?! by tangent3 · · Score: 0

      A git repository itself acts as a rotational backup...
      The article itself suggests ZFS snapshots of the git repository, which works just as well.

    2. Re:No backups?! by Carewolf · · Score: 1

      The very first proposed solution is a backup:

      One thing that will be put into place as a first effort is that one anongit will keep a 24-hour-old sync; in the case of recent corruption, this can allow repositories to be recovered with relatively recent revisions. The machine that projects.kde.org is migrating to has a ZFS filesystem; snapshots will be taken after every sync, up to some reasonable number of maximum snapshots, which should allow us to recover the repositories at a period of time with relatively fine granularity.

      So one 24 hour old backup, and a another machine saving backups of every single sync as ZFS snapshots.
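The "up to some reasonable number of maximum snapshots" part of the quoted plan implies a pruning rule. A minimal sketch, assuming snapshot names sort chronologically (for example ISO timestamps); `snapshots_to_delete` is a hypothetical helper and says nothing about how the KDE admins actually name or prune their ZFS snapshots.

```python
def snapshots_to_delete(snapshot_names, max_keep):
    """Return the oldest snapshot names beyond the max_keep newest ones.

    Assumes names sort chronologically, e.g. '2013-03-24T10:00'.
    """
    if max_keep < 0:
        raise ValueError("max_keep must not be negative")
    ordered = sorted(snapshot_names)
    excess = max(0, len(ordered) - max_keep)
    return ordered[:excess]
```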

    3. Re:No backups?! by Blymie · · Score: 3, Informative

      Git has no rotational backup ability in it. You can't do rotational backups of the machine, on the machine for starters!

      ZFS is not a rotational backup as well!

      Failure, 101, backups. Go back to school.

      Both of the above solutions do not prevent slow corruption, and they do not prevent issues where the machine is suspect. (Yes, ZFS can have bugs). They also do not help if the machine has been hacked into. They don't help if there is a fire, flood, or theft of the local box.

      Modern backup methodology has been developed over decades of people suffering JUST THROUGH THIS VERY THING. If you plan to just throw all that away, and pretend everyone doing backups is an idiot -- MAKE SURE YOU KNOW WHAT YOU ARE DOING.

      Because -- this very issue would not have been even a tiny concern, if proper, off machine, rotational backups were being done. And, if you aren't going to follow proper backup methodology, then you'd better sit down in a quiet place for a few hours, and think of every possible disaster scenario, AND issues with the code you're going to be using for those backups.

      Hell, this whole KDE problem started, because the people using it did not even know how git works, 100%! Now, you're suggesting that using another tool, ON THE SAME BOX, is the answer? What will someone miss on ZFS?

      No, please, think about this more carefully.

    4. Re:No backups?! by gweihir · · Score: 1

      No, a git repository is not a backup. It is a version-control tree. Backups are always _independent_ of the working system and for very good reasons. Come on, people, this is beginner's stuff.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:No backups?! by Blymie · · Score: 4, Insightful

      A 24 hour old sync isn't a backup. It's a slightly delayed mirror.

      "Rotational backups" isn't just a single thing. It's a whole ball of wax. Part of that ball of wax is test restores. Another part of that is backups that only sync changes, something exceptionally easy with rotational backups, but not as easy with a filesystem snapshot.

      In 10 seconds, I can run 'find' on a set of rotational backups I have, that go back FIVE YEARS and find every instance of a single file that has changed on a daily basis. How does someone do that with ZFS snapshots? This is something that is key when debugging corruption, or looking for a point to start a restore from (someone hacks in).

      Not to mention that ZFS could be producing corrupt snapshots -- what an annoyance to have to constantly restore those, then do tests on the entire snapshot to verify the data.

      What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.
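The 'find' trick described above (locating every point at which one file changed across years of dated backups) can be approximated like this. It is a sketch assuming one directory per snapshot under a common root, named so that they sort by date; `change_points` is a hypothetical helper, and hashing the file in each snapshot is slower than comparing timestamps with find but does not trust metadata.

```python
import hashlib
import os


def change_points(backup_root, relative_path):
    """List (snapshot, digest) pairs where a file's content changed.

    The first snapshot is always reported. A digest of None means the
    file is absent from that snapshot.
    """
    changes = []
    previous = object()  # sentinel that never equals a real digest or None
    for snapshot in sorted(os.listdir(backup_root)):
        path = os.path.join(backup_root, snapshot, relative_path)
        if os.path.isfile(path):
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
        else:
            digest = None
        if digest != previous:
            changes.append((snapshot, digest))
            previous = digest
    return changes
```

The last snapshot before an unexpected change (or before the file disappears) is the natural candidate to restore from.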

    6. Re:No backups?! by Doc+Hopper · · Score: 3, Informative

      I do storage & backup for a living on an extremely large scale. Your post is correct in the main, except for this:

      You *never* overwrite your backups, EVER.

      You must overwrite tapes if you want to keep media costs reasonable. In our enterprise, we typically use $30,000 T10Kc tape drives with $300 T10K "t2" tapes. Destroyed/broken/worn-out media costs already eat the equivalent of several well-paid sysadmin salaries each year. Adding additional cost for indefinite retention is a huge and unnecessary cost.

      Agreed, though, this KDE experience isn't quite like that. Source code repositories commonly have 7-year-retention backups for SLA reasons with customers; most of my work deals with customer Cloud data, which kind of by definition is more ephemeral and we typically only provide 30, 60, or 90-day backups at most, in addition to typical snapshotting & near-line kinds of storage.

      No reasonable-cost disk-based storage solution in the world today provides a cost-effective way to store over a hundred petabytes of data on site, available within a couple of hours, and consuming just a trickle of electricity. But if you have a million bucks, a Sun SL8500 silo with 13,000+ tape capacity in the silo will do so. All for the cost of a little extra real-estate, and a power bill that's a tiny fraction of disk-based online storage.

      Tape has a vital place in the IT administration world. Ignore this fact to your peril and future financial woes.

    7. Re:No backups?! by Carewolf · · Score: 2

      More accurately the problem is that the hardware resources available to KDE are very limited and the KDE repository is one of the largest git repositories in the world. Back when subversion was the hot new thing, the thing that carried it forward was KDE because it was trying to migrate to SVN for several years before subversion was even capable of handling a repository that large. Git still can't remotely handle a project that large, which is why KDE is now split into a thousand different git projects.

      How often would you do complete backups of KDE? How many would you save? How much hardware would that require? ZFS snapshots sound like an ideal situation to handle the backups, since it can deduplicate. It does give another point of failure, but ZFS is pretty professional and high quality, and this is something it is designed to handle.

    8. Re:No backups?! by gweihir · · Score: 4, Insightful

      What really surprises me is that people still do not understand backup, after it has been solved for decades. Backup _must_ be independent. It _must_not_ be on the same hardware. It _must_not_ even be on the same site, if the data is critical. It must protect against anything happening to the original system. Version control, mirrors, RAID, all do not qualify as backup. They are not independent of the system being backed up.

      However, the amount of incompetence displayed in the original story and the comments here explains a lot. Seems that in this time of "virtual everything" people do not even bother to learn the basics anymore and are then surprised when they make very, very basic mistakes.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    9. Re:No backups?! by drinkypoo · · Score: 2

      No reasonable-cost disk-based storage solution in the world today provides a cost-effective way to store over a hundred petabytes of data on site, available within a couple of hours, and consuming just a trickle of electricity.

      Lots of businesses (and most open source projects) are still dealing with only a couple terabytes of data or far less, and so they not only can but probably should use disk-based backups for reasons of both cost and convenience as nothing else will be cheaper, faster, or easier.

      Tape is now an enterprise-only thing, and good riddance.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:No backups?! by gweihir · · Score: 1

      What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.

      There is psychological research into this: People making stupid decisions often invest considerable effort in convincing themselves that the decisions are not stupid.

      My take-away message is that many, many slashdotters have a data-disaster in their future, as they do not understand what backup is for or how to do it so it actually fulfills its purpose.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:No backups?! by Kjella · · Score: 1

      A git repository itself acts as a rotational backup... The article itself suggests ZFS snapshots of the git repository, which works just as well.

      That still smells like a single point of failure to me, because they didn't say anything about actually backing up those snapshots to another machine. So if you can crack this one root server, you can delete all the snapshots, corrupt all the projects and boom all the good copies are gone.

      --
      Live today, because you never know what tomorrow brings
    12. Re:No backups?! by Blymie · · Score: 1

      Yeah :(

    13. Re:No backups?! by Anonymous Coward · · Score: 0

      That's why tapes are worse in many scenarios. If you overwrite and reuse tapes they don't last as long - wear and tear.

      If you use HDDs you'd have more storage and the media comes with its own drive rather than costing 30k. And the capacity goes up as the tech improves without you having to buy a new super expensive tape drive.

      Also means you can get more bandwidth if you do it right.

    14. Re:No backups?! by TheRealMindChild · · Score: 1

      And with proper rotational backups, only the changed data is backed up

      I hate you. This is why, at a couple of orgs I worked at, when a restore needed to be done, we had to start from two years ago, and then apply the changes from the backups going forward from the first. They're on tape? Even longer wait. A complete backup should be done every time if it can be. If not, it should be done on a regular basis.

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    15. Re:No backups?! by WuphonsReach · · Score: 1

      Tape has a vital place in the IT administration world.

      Tape is expensive, fragile and requires special hardware. Removable or external magnetic hard drives, OTOH, are cheap, sturdy and will work on any system that you can scrounge up.

      Given the costs of tape drives and tape media, it's not surprising that a lot of small / medium businesses just use hard drives for backups. External 2.5" 1TB drives are dirt cheap and you could do weekly off-site backups using them with 13 generations for less than $2000. You can't even buy a large capacity tape drive for $2000. Much less the tapes needed to run a proper backup cycle.

      Unless there are legal reasons to keep 5-10 years of backups, or you are dealing in more than 3-5 TB of storage to be backed up, or taking things off-site daily via courier, tape is just too expensive.

      --
      Wolde you bothe eate your cake, and have your cake?
    16. Re:No backups?! by Blymie · · Score: 1

      Don't hate me. ;) Typically, you do a full backup every $x period of time.

      Trusting that your *only* full backup is good, isn't a great policy either. I tend to do full backups every quarter, but it depends upon the data set, and of course, the size of the data set. If the data set is trivial... then who cares? Do it weekly.

    17. Re:No backups?! by Anonymous Coward · · Score: 0

      A git repository itself acts as a rotational backup...
      The article itself suggests ZFS snapshots of the git repository, which works just as well.

      Uh.. by that logic an Oracle database is its own backup just by keeping log files around.

      You still need to back these up, Oracle, GIT, SVN, etc if only to get the data off the system primarily hosting it to protect against corruption or data loss.

      ZFS has checksums meant to protect against silent corruption in the underlying hardware. You could do point in time snapshots with it also, but you should STILL then take those snapshots offline somewhere else.

      Also, what is this "rotational" backup nonsense, there is a different kind? This forum is full of complete ignorance of proper backup methodology.

    18. Re:No backups?! by Anonymous Coward · · Score: 0

      A 24 hour old sync isn't a backup. It's a slightly delayed mirror.

      "Rotational backups" isn't just a single thing. It's a whole ball of wax. Part of that ball of wax is test restores. Another part is backups that only sync changes, something exceptionally easy with rotational backups, but not as easy with a filesystem snapshot.

      In 10 seconds, I can run 'find' on a set of rotational backups I have, that go back FIVE YEARS and find every instance of a single file that has changed on a daily basis. How does someone do that with ZFS snapshots? This is something that is key when debugging corruption, or looking for a point to start a restore from (someone hacks in).

      Not to mention that ZFS could be producing corrupt snapshots -- what an annoyance to have to constantly restore those, then do tests on the entire snapshot to verify the data.

      What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.

      First, stop calling backups "rotational". Multiple point in time copies of data is not an optional part of performing backups, it's the definition.

      Second, if you just browsed your backup sets with the FIND command, you are in NO POSITION to pick on ZFS snapshots, which, BY THE WAY, are browsed under /.../.zfs/, and which record differences at the BLOCK LEVEL, with a staggering amount of checksums to prevent or alert on data corruption depending on your hardware.

      I can guarantee whatever ducktape & superglue job you are doing with rsync can be accomplished better with ZFS snapshots & ducktape & superglue.

      However, as a real life enterprise backup admin, I would recommend NEITHER.

    19. Re:No backups?! by Doc+Hopper · · Score: 1

      I agree with you, except for this part:

      If you use HDDs you'd have more storage...

      It all depends on the scale. If you're talking a small project with a small budget, I agree with you: tape backups are overkill, too expensive, and kind of pointless. Your average open-source project is usually just a few gigabytes at most. Use a snapshotting, journaling filesystem, always keep each version in at least three different places, create a retention policy that makes sense for you based on the needs of your project, and you're good.

      And you're right. Today's modern tapes are good for about 4,000 read/write cycles. Even if you get the tapes at a substantial discount, a 5TB+ tape is expensive to destroy!

      But when you are talking large enterprise data archiving needs, high-end hard drives do not compete with high-end tape drives in the slightest. And in today's risk-averse corporate climate, a reasonable disaster recovery strategy is a MUST, and providing multiple tiers of storage -- online, near-line, and off-line -- is attractive. 9/11 showed everybody how quickly DR plans can melt.

      I could go into a lot of specific numbers talking about how a few modern tape drives in a modestly-sized tape silo outperform similarly-sized hard drives for near-line storage in just about every category except random seek time by several orders of magnitude, but I'll leave that as an exercise for the reader. :-)

    20. Re:No backups?! by Anonymous Coward · · Score: 0

      Here's the problem with backups: You're still trusting software to not have bugs. If you have a tape library what prevents a bug in the library from overwriting the wrong tapes? If someone manually takes the backup media offsite what ensures that they always follow the correct procedure and that the procedure is not subtly flawed? If you test a restore from backup what ensures that the tests are comprehensive and would catch all possible cases of corruption? Building a resilient system that doesn't lose data is a harder problem than "keep and test backups." I saw a few comments about how IT folks would question the way programmers use their mirroring and backup software. Well, who do you think wrote the backup software and utilities you use to restore and test your backups? Have you audited the source lately to ensure all your tools will actually function the way you expect them to?

    21. Re:No backups?! by Doc+Hopper · · Score: 4, Interesting

      Unless there are legal reasons to keep 5-10 years of backups, or you are dealing in more than 3-5 TB of storage to be backed up, or taking things off-site daily via courier, tape is just too expensive.

      I like your summary of three important reasons for tape archive. I'll restate in different terms.
      1. Mid-term to indefinite data retention.
      2. Large quantities of data, where "large" is a value greater than a single hard drive can reasonably store.
      3. Disaster recovery planning.

      But there are more.

      4. "Oops".

      That's the category of this KDE git issue. Recovering from an "oops". People screw up. How do you recover? I'm a big fan of having multiple layers in that onion: online snapshots, near-line replicas, and off-line tape backups are a basic three-tiered framework for figuring out how to protect the data. I'm amazed as big as KDE is, they don't have storage/backup expertise helping them keep their data secure. Makes me think I may have found my next open-source niche to fill.

      5. Reliability. Contrary to the "fragile, expensive" opinion above, tape failure rates are demonstrably lower than hard drive failure rates despite regular handling. Research left to the reader; hard drives fail at a rate about fifteen times higher than their rated MTBF would suggest, and that rated failure rate was already considerably higher than tape's. Data on tape is far more resilient than data on a hard drive.

      6. Cost. If you have to store data long-term, consider tape. Administrative, electrical, power, cooling, and storage requirements are all cheaper.

      That's what I can think of off the top of my head; I'm sure there are more reasons for tape to be a good choice. The reality for many people that want to store their data "in the cloud" also is this:

      I back up your "cloud" storage onto tape drives. Your cloud storage is only as reliable as my ability to recover it from a disaster.

    22. Re:No backups?! by fikx · · Score: 3, Funny

      Hey, they had their backups setup....just switch some terms around and you can see how they actually DID have backups like they claim. sync happened every 20 minutes....so they kept multiple copies of one backup that was overwritten every 20 minutes. So, their window to detect and fix the issue before overwriting the backup is 20 minutes. no problem, right? What could possibly go wrong?
      :)

      --
      AB HOC POSSUM VIDERE DOMUM TUUM
    23. Re:No backups?! by jimicus · · Score: 1

      Backups have been a solved problem for decades.

      Understanding what does and does not constitute a reliable backup is not.

    24. Re:No backups?! by gweihir · · Score: 1

      I don't agree. Anybody bothering to find out will find what is required of a reliable backup. It does require some level of competence and the will to find out how to do it right. If you do not have one or the other, everything is lost anyways.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    25. Re:No backups?! by akozakie · · Score: 2

      And it's not going to get better. Read the comments at the site. Most of them are surprised that no backup procedure was implemented and most of the answers to those comments are "I'm telling you, there were backups! The mirrors. And if you mean old-school backup, that's not easy for a live git repository".

      They simply Know Better (TM). Discussion is useless, arguments are not even being parsed fully - the token "backup" throws an exception in their minds. They had the closest thing to a lose-it-all lesson you can get without... well... losing it all and they still do not see the problem. Sort of impressive, if you ask me. In a bad way, of course.

    26. Re:No backups?! by Anonymous Coward · · Score: 0

      If you overwrite and reuse tapes they don't last as long - wear and tear.

      If you write to your tapes only once and use new tapes for all your backups, you wear down the drives instead. If you keep feeding your drive with new tapes, you can forget about the MTBF advertised by the vendor. A drive wears down in a few months if it is constantly writing to new tapes.

    27. Re:No backups?! by jimicus · · Score: 1

      Maybe I should have worded that more clearly.

      Backups are a solved problem, but lots of people get hung up on having something that backs up the data with little consideration as to why you might want that.

      This is how you wind up with holes in your backup process.

      In this case, version control covers "oh dear I didn't mean to overwrite/delete that file"; mirroring covers "oh dear the server has just died horribly" and "oh dear the datacentre has been lost to fire". It appears nobody has considered "oh dear an unexpected combination of circumstances has corrupted the version control system and the mirroring has replicated that corruption".

    28. Re:No backups?! by Rich0 · · Score: 1

      Yup, it all starts with defining the failure modes you want to protect against, and appreciating what techniques protect against what modes.

      However, if you don't really appreciate all that stuff or can't be bothered to, then a safe default is rotational backup to offline media stored at a separate location from the device being backed up. There are plenty of services who make this sort of thing easy to do if you can't be bothered to get it right yourself - many are cloud-based and designed for clueless PC owners, and commercial solutions exist for Linux as well. There is no excuse for the maintainers of an FOSS project to not have an adequate backup solution for their source repositories.

      Mirrors and RAID don't involve rotation and that makes them effective against only a small number of failure modes. Those who think they're "good enough" for something like this don't have good imaginations. I live with RAID alone only for things that I can reproduce (such as re-installing, re-ripping, etc), or things I don't mind losing (like hundreds of hours of MythTV recordings). For anything with actual value I do rotational backups uploaded to an offsite location, with periodic testing (in particular anytime my backup software is updated). During testing I carefully peruse my backed up data so I have a pretty good appreciation for what I will and won't get back in the event of a disaster. These days decent backup is pretty cheap - there is no excuse for not having it where it matters.

    29. Re:No backups?! by Rich0 · · Score: 1

      If you can backup a live database you can certainly backup a live git repository.

      Stick it on zfs or btrfs, create snapshots, and then offline backup the snapshots. That's really easy to do, and atomic.

      Or just do a git clone or create a bundle and back that up offline. That is definitely atomic. Oh sure, you won't get every last commit, but this is a BACKUP, not a mirror, and you're not going to rotate them every five minutes. If you have to resort to backups you'll lose some commits, but at least you won't lose your last decade of history or whatever.

      Or you can offline the repository for a few minutes while you umount it, LVM snapshot it, and then remount it. That gives you a more defined recovery point for what it's worth, but at the cost of some downtime.

      Lots of ways to backup a live repository. Any of them are better than not having backups. They can still have mirrors - they're useful as well, but they solve a different set of problems.
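
      The "create a bundle and back that up offline" option above could look roughly like the following. This is only a sketch under assumptions, not anything KDE is known to run: the function names, the timestamped file naming, and the injectable `run` parameter are made up for illustration. The only git commands it relies on are `git bundle create <file> --all` and `git bundle verify <file>`.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def bundle_commands(repo, dest_dir, now=None):
    """Return (bundle_path, [create_cmd, verify_cmd]) for one timestamped bundle.

    dest_dir should be an absolute path, because the commands run with
    `git -C repo` and a relative path would be resolved inside the repo.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical naming scheme: <repo name>-<UTC timestamp>.bundle
    bundle = Path(dest_dir) / f"{Path(repo).name}-{stamp}.bundle"
    create = ["git", "-C", str(repo), "bundle", "create", str(bundle), "--all"]
    verify = ["git", "-C", str(repo), "bundle", "verify", str(bundle)]
    return bundle, [create, verify]


def backup_bundle(repo, dest_dir, run=subprocess.run, now=None):
    """Create a new bundle (never overwriting an older one) and verify it."""
    bundle, commands = bundle_commands(repo, dest_dir, now)
    for cmd in commands:
        # check=True turns a non-zero git exit status into an exception.
        run(cmd, check=True)
    return bundle
```

      Because every run writes a new file name, older bundles stay untouched until a separate rotation step decides to delete them, which is the property a mirror lacks.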

    30. Re:No backups?! by Rich0 · · Score: 1

      More accurately the problem is that the hardware resources available to KDE are very limited and the KDE repository is one of the largest git repositories in the world.

      I doubt it is more than a few GB per backup compressed. If they rotated 20 backups using daily/weekly/monthly rotations they'd have a week of dailies and a year of less frequent backups. That would cost them all of maybe a few bucks a month even with retail-cost solutions like Amazon S3 - WAY less than they must be paying for their servers. They'd only need backups going back a few days to save themselves from a rapidly-detected disaster.

      Oh, and anybody could do a backup of KDE like this, assuming they allow anybody to clone their full repositories. A clone is a point-in-time snapshot, and then if you don't overwrite each snapshot with the next you have backups. It isn't like it takes special hardware to pull this off.
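
      A daily/weekly/monthly rotation of the kind described above can be sketched as a pure function that decides which backup dates to keep. The split into 7 dailies, 4 weeklies and 9 monthlies (20 at most) is an assumption chosen to match the "20 backups" figure, and `backups_to_keep` is a hypothetical name.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=9):
    """Pick which backup dates survive a daily/weekly/monthly rotation.

    Keeps the newest `daily` backups, plus the newest backup in each of the
    most recent `weekly` ISO weeks and `monthly` calendar months.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])

    def newest_per_bucket(bucket_of, limit):
        seen = []
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket not in seen:
                if len(seen) == limit:
                    break  # enough buckets covered; older ones expire
                seen.append(bucket)
                keep.add(d)  # first hit is the newest backup in that bucket

    newest_per_bucket(lambda d: tuple(d.isocalendar()[:2]), weekly)
    newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return sorted(keep)
```

      Anything not in the returned list is a candidate for deletion; the same backup can satisfy several tiers, so the total kept is often below the nominal 20.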

    31. Re:No backups?! by turbidostato · · Score: 1

      "Backups are a solved problem, but lots of people get hung up on having something that backs up the data with little consideration as to why you might want that."

      This and one thousand times this.

      I think the root problem is a perception one: I always tell the juniors: backups are nothing, I don't give a damn about backups. The important thing is restoration.

      It's marvellous how people improve with just that minor hint: that they think around the restoration process instead of the backup one.

      I.e., with regards to this issue: well, my dear PYF, what will happen when you try to restore from a corrupted ball of bits and it happens to be your single ball of bits? Answer: uh! I certainly will need multiple balls of bits that, once settled, are never going to be touched again so, now that I think about it, sync'ing over the same repo again and again doesn't cut the butter.

    32. Re:No backups?! by lennier · · Score: 2

      Here's the problem with backups: You're still trusting software to not have bugs. If you have a tape library what prevents a bug in the library from overwriting the wrong tapes?

      I can see you've worked backup shift operator before. :)

      In my experience, tape backup software is just about the buggiest, crankiest, least resilient piece of software I've had the displeasure of attempting to make half-work. There are so many ways an inventive tape jukebox can decide to fail (trying to backup an open database is a popular one). Pretty much if your backup completes at all, you can be sure it's because it didn't write what it was supposed to. If you're lucky it maybe wrote something to the log before it crashed!

      Oh, and testing by restoring? Onto the live production server which doesn't have an identical hot-swap live backup? Or even the one that does, but is in full-time use by the development team? Good luck getting permission to try that just to see if the backup system is working.

      Admittedly that was ten years ago. But I'm sure things have been fixed completely since then, just like security has.

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
    33. Re:No backups?! by Electricity+Likes+Me · · Score: 1

      If we're going to start assuming ZFS is producing silently corrupted snapshots, we might as well start questioning whether gzip is producing corrupted archives or md5sum is working correctly.

      Which are all valid assumptions, but ZFS is engineered around avoiding exactly these types of problems. It's not going to produce a silently corrupted snapshot because snapshots include their own checksums for the data coming off the disk, which is also checksummed and it's verified at every step of the way. The failure modes of ZFS always begin with a checksum failure of some sort.

      If you can run "find" on your backups, you can do it on ZFS with the .zfs/snapshot directory, which provides exactly that functionality.
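
      As a rough illustration of browsing snapshots like ordinary directories, the sketch below lists every snapshot that contains a given file, in the spirit of running "find" over rotational backups. It assumes the dataset exposes its snapshots under the hidden `.zfs/snapshot` directory at the dataset root; `versions_in_snapshots` is a hypothetical helper name.

```python
from pathlib import Path


def versions_in_snapshots(dataset_root, relative_path):
    """Yield (snapshot_name, size, mtime) for each snapshot holding the file."""
    # Assumption: snapshots appear as subdirectories of <root>/.zfs/snapshot
    snapshot_dir = Path(dataset_root) / ".zfs" / "snapshot"
    for snap in sorted(snapshot_dir.iterdir()):
        candidate = snap / relative_path
        if candidate.is_file():
            st = candidate.stat()
            yield snap.name, st.st_size, st.st_mtime
```

      Comparing the sizes and modification times across snapshot names gives a quick picture of when a file changed or vanished, which is one way to pick a restore point.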

    34. Re:No backups?! by akozakie · · Score: 1

      I couldn't agree more. Hell, if they don't want to change their process, the mirror is also a nice backup tool. If they already have five, why not just set up a sixth? It doesn't have to be a powerful server, it won't handle any load. Its only function would be to be taken down for backups at scheduled intervals between syncs. They don't have to change anything at the main server this way.

      If I feel I absolutely need to do offsite backup for my own private files, which are not crucial even to me (just a pendrive I carry, but still), how can they possibly think mirrors are enough for a large project? Knowledge about proper backup procedures was built on terabytes of lost data (in the times when a megabyte was a lot). They failed to learn from that history. And the shocking thing is, when it finally did repeat, they didn't learn from that either.

      Stupid. Stupid. Stupid.

    35. Re:No backups?! by Anonymous Coward · · Score: 0

      Absolutely agree.

      Git has no rotational backup ability in it.

      And why doesn't it? Because Git is not a backup tool, it is a _collaboration_ tool!

      this whole KDE problem started, because the people using it did not even know how git works

      Or what is it for.

    36. Re:No backups?! by Doc+Hopper · · Score: 1

      A drive wears down in a few months if it is constantly writing to new tapes.

      That's what the cleaning tapes in your silo are for. The heads are typically good for millions upon millions of read/write cycles as long as they are kept clean. The motors driving the reels are typically of the brushless, multi-speed variety, capable of decades of reliable operation with quality bearings. Cleaning tapes, on the other hand, must be replaced regularly as they wear out.

      I frequently see bizarre claims on Slashdot, but the claim that using new tapes "wears down" your tape drive is a new and strange one. Where on earth did you get this idea?

    37. Re:No backups?! by Anonymous Coward · · Score: 0

      That's what the cleaning tapes in your silo are for. The heads are typically good for millions upon millions of read/write cycles as long as they are kept clean.

      What's your definition of a read/write cycle? If you mean millions of full tapes, then that would take centuries to accomplish. Don't expect any piece of hardware to last that long.

      Some libraries do automatically manage cleaning of drives. But that doesn't stop them from wearing down. The vendor also suggested that we try cleaning the drives once the heads were worn down. And we did of course try it. We even went as far as doing it systematically. Of course it never helped, since the drive really was dying. The only reason for doing it was to speed up the support process, which inevitably would involve answering if the drive had been cleaned.

      the claim that using new tapes "wears down" your tape drive is a new and strange one. Where on earth did you get this idea?

      Experience from working at one of the world's largest consumers of backup tapes. With an operation that size it is possible to collect enough statistics about the life of drives and tapes to show beyond doubt that running enough new tapes through a drive will kill the drive.

    38. Re:No backups?! by Doc+Hopper · · Score: 2

      Absolutely fair comments, thanks for the information that new tapes have a higher cost on a tape drive than used tapes. I should have said "millions upon millions of feet of tape", which would have been a correct statement. I stand corrected.

    39. Re:No backups?! by Anonymous Coward · · Score: 0

      I should have said "millions upon millions of feet of tape"

      At a speed of 20 feet per second, it doesn't take long to reach a million.

  7. Sounds like... by eexaa · · Score: 1

    ...someone has been using Internets as a backup machine? :)

    1. Re:Sounds like... by bmo · · Score: 4, Funny

      There is nothing wrong with using the internet as a backup machine - with the caveat that you know what you're doing and you're using the right service/tool properly.

      Personally, I have all my very important documents in an encrypted archive labelled "Area_51_Aliens_Proof.rar" with the note "It is dangerous for me to provide the key, but in the event of my death or imprisonment, a key will be provided EXPOSING EVERYTHING!!!" and uploaded to various paranormal bittorrent trackers and mirrored by various denizens of /x/.

      I expect my documents to be archived in perpetuity.

      --
      BMO

    2. Re:Sounds like... by Anonymous Coward · · Score: 0

      An excellent idea. Get all the paranoid people together to make sure your backup is redundant forever.
      People who are paranoid enough to make sure they have a copy. Who think that hidden truths lie around the corner.
      It's not like they'd ever be able to crack the code. What could possibly go wrong? http://blog.zorinaq.com/?e=42

    3. Re:Sounds like... by bmo · · Score: 1

      You know, it was a joke, but Julian Assange has a file called "insurance" that is mirrored by a lot of people.

      1. Nobody has cracked the archive, even though the payload could be spectacular. It's not like nobody is trying.

      2. It really could just be his automobile insurance contract. Nobody knows.

      3. Sufficient key length and a strong algorithm /can/ stretch brute-forcing time into "end of the universe" length.

      --
      BMO

    4. Re:Sounds like... by CBravo · · Score: 1

      From what I was reading there is not much interesting in it... I already included the torrent in the TBL (torrent blacklist). Noone will be seeding it anymore.

      --
      nosig today
    5. Re:Sounds like... by bmo · · Score: 1

      > Noone will be seeding it anymore.

      This Noone guy really gets around because I hear about him all the time, even though I never see him.

      --
      BMO

    6. Re:Sounds like... by Anonymous Coward · · Score: 0

      Noone has blinded me!

    7. Re:Sounds like... by Electricity+Likes+Me · · Score: 1

      A file full of data from /dev/random will also accomplish this nicely (I have a copy of that file as well).

  8. Apropos by bmo · · Score: 0

    "With great power comes great responsibility" - Spider Man, issue #1.

    --
    BMO

    1. Re:Apropos by Anonymous Coward · · Score: 0

      Kids, come here, there's a guy who signs his posts with his user name, when said user name is already displayed on top of his post!
      That's a rare creature.

    2. Re:Apropos by bmo · · Score: 1

      The fact that this bothers you only strengthens my resolve to never change my signature.

      Have a great day.

      --
      BMO

    3. Re:Apropos by odie_q · · Score: 1

      "With great power comes great responsibility" - Spider Man, issue #1.

      "Who said that? I'll kill them with my power!" - Homer Simpson, S19E03

      --
      ...ceterum censeo Carthaginem esse delendam.
    4. Re:Apropos by bmo · · Score: 1

      Quidquid latine dictum, altum videtur. - unattributed

      --
      BMO

  9. Re:A thousand times. (Unless online mirrors roll b by Anonymous Coward · · Score: 1

    Rollbacks are also not backups.
    Practice what you preach.

  10. Re:A thousand times. (Unless online mirrors roll b by Anonymous Coward · · Score: 0

    Common sense would dictate that git manages its own backup automatically anyways, so you don't need additional ones. Well, that didn't work out that great in this case.

  11. delayed update to servers.. by vanuda · · Score: 1

    Set up servers that get a delayed update.. i.e. a 1 day delayed copy, a 1 week delayed copy and perhaps a 1 month delayed copy.. Hopefully someone will notice and stop the sync between servers before everything is gone.. Even if some part is lost.. Not all is lost..

    1. Re:delayed update to servers.. by gweihir · · Score: 4, Informative

      And another amateur-level solution. Does nobody know how to do backups anymore? O.k., here are the very basics, the mandatory characteristics of a backup:

      - Backup data storage independent of the system being backed up
      - Several generations of backups kept for long enough to be absolutely sure you can recover (yes, that can mean years) and taken frequently enough that loss is acceptable.
      - Expect that one backup generation can be faulty and ensure that even then, recovery is possible and data-losses are acceptable.
      - Full disaster recovery possible, even if your original system is stolen by aliens.
      - Disaster recovery is tested regularly
      - Data is verified (full compare or 2-sided crypto-hash compare) on backup

      This really is "IT operations 101". Forget about all this half-ba(c)ked amateur stuff, IT DOES NOT WORK.
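
      The "data is verified" item in the list above can be illustrated with a hash comparison between the source tree and the backup copy. This is a minimal sketch that assumes both trees are reachable as local directories; it uses SHA-256 as one possible crypto-hash, and `tree_digests` and `verify_backup` are hypothetical names.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_backup(source, backup):
    """Return sorted paths that are missing, extra, or different in the backup."""
    src, dst = tree_digests(source), tree_digests(backup)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))
```

      An empty result means every file hashed identically on both sides; anything else is a reason not to trust that backup generation.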

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:delayed update to servers.. by RyuuzakiTetsuya · · Score: 1

      I haven't administered a git repo before, but, with something like git that has historical commit data, do you need more than say, a month or so of backup data?

      --
      Non impediti ratione cogitationus.
    3. Re:delayed update to servers.. by gweihir · · Score: 1

      It is very simple: Determine how long it will take to notice a problem in the very worst case, then make sure you have at the very least two full backups that cover this time.

      In most cases that makes a backup of only a month gross negligence. If you run a full data consistency check every week, then having backups every week and keeping them for a month may be adequate if the project is not important. But what if, for example, you notice after 3 months that somebody hacked into the repository and changed things? Or hacked into some user's machine where the user has commit permissions? True, retroactively changing the history is not possible in git, but making new changes is quite possible. So, are you confident you can reliably spot any malicious action within less than a month _and_ stop the oldest backup from being overwritten before it is too late?

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:delayed update to servers.. by Anonymous Coward · · Score: 0

      With a vast number of people doing git pull --rebase (which fails in very ugly ways if the upstream repository is manipulated) etc. there is a good chance it would be noticed fairly quickly.
      Plus, it will be hard to corrupt the copies of people who only do a git pull, so you probably have a lot of backups around, even though they will be very inconvenient to get and would rely on trusting those who provide the copies.
      Either way, you will probably be able to get away with a lot rarer backups for active git projects than for arbitrary data - especially if you regularly check the repository via "git fsck" for example. Though that doesn't mean I'd recommend it.

    5. Re:delayed update to servers.. by Todd+Knarr · · Score: 1

      Yes. Think about this: how do you recover the repository when the historical commit data is what's been damaged? Note that it doesn't have to be data corruption, although that's fairly common. One of the worst problems to recover from is human error, eg. an administrator makes a mistake cleaning up obsolete projects and permanently deletes more projects than intended, or makes a mistake on the filesystem itself and deletes the files associated with part of the repository. And yes you need more than a month's worth of backups to recover from that because sometimes the damage may not be apparent for months. I've got a project at work in version control that's incredibly critical, without it several major customers are totally off-line. Changes to it are very rare, measured in years per change, but when we do need changes to it they're high-priority (again the customer is totally off-line until the change goes in). If someone makes a mistake and wipes out that project it might literally be years before someone has a reason to look for the project and notice it's gone missing. If we only have a couple months worth of backups, what are we going to do?

    6. Re:delayed update to servers.. by 31eq · · Score: 1

      I believe a properly administered git mirror will do everything the GP wanted, except for testing the backups (which isn't git's problem) and the "2-sided crypto-hash compare" because, in my amateurishness, I don't know what that is. I do know that git does cryptographic hash checks on pull and push. I don't know exactly what checks are done when. I didn't know that "git clone" didn't do those checks until today and it seems that the KDE folks didn't either - but, hey, we all do now. (From the update, "git clone --mirror" is what failed, but I believe the "--mirror" part is incidental. Should have been "git pull --all --force" as far as I know in my amateurishness.)

      I see other folks are less keen on discussing these issues than making condescending remarks about how to do backups. The update to the article goes into a fair amount of detail about what they didn't do and why. It's almost as if they did think about these things.

    7. Re:delayed update to servers.. by Rich0 · · Score: 1

      I believe a properly administered git mirror will do everything the GP wanted, except for testing the backups (which isn't git's problem) and the "2-sided crypto-hash compare" because, in my amateurishness, I don't know what that is.

      The git mirror will overwrite the last backup with the next - it does not rotate media. Now, you can do a git clone to a new location every time and THAT might accomplish all the backup best-practices (assuming you test).

      You need to rotate your backups because you can't assume that you'll detect a problem with your data before your previous backup gets overwritten. What happens if you find a glitch that goes back a month?

      The further back you want to go the more expensive things are, but in general you want to be able to go back at least a few days. For something as small as a git repository (even a big one) there is no reason not to rotate back at least a month or two.

  12. No backup of the KDE sources! by Anonymous Coward · · Score: 2, Informative

    They had/have no fucking backup! And complain about some git mirror issues. I can't fucking believe it that they can be so stupid.

    The solution: MAKE BACKUPS!

    1. Re:No backup of the KDE sources! by Sulphur · · Score: 1

      They had/have no fucking backup! And complain about some git mirror issues. I can't fucking believe it that they can be so stupid.

      The solution: MAKE BACKUPS!

      "Only two things are infinite, the universe and human stupidity, and I'm not sure about the former." - Albert Einstein

  13. Re:A thousand times. (Unless online mirrors roll b by Anonymous Coward · · Score: 0

    Also, a SCM is not a backup, not even git! Every software can fuck up.

  14. No Git also failed by Anonymous Coward · · Score: 5, Informative

    The files were corrupted, Git didn't report squat about the problems. The sync got different versions each time. Sure there are two layers of failure here, but one of them certainly is Git.

    What he's saying is simple. Torvalds' comment is not completely true:
    "If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums."

    He's saying that if the commits are corrupted:
    "If a commit object is corrupt, you can still make a mirror clone of the repository without any complaints (and with an exit code of zero). Attempting to walk the tree at this point will eventually error out at the corrupt commit. However, there’s an important caveat: it will error out only if you’re walking a path on the tree that contains that commit. "

    So there's a clear room for improvement. Sure the fault was a corrupt file, but the second layer of protection, Git's checking, ALSO FAILED. Denial isn't helpful here, Git should also be fixed.
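
To make the "we check checksums" part concrete, here is a rough sketch of how an object's SHA-1 name relates to its contents, and why corruption is only noticed when something actually re-hashes the object. It assumes the loose-object layout (a zlib-compressed `<type> <size>\0<payload>`); the function names are made up for this example and this is not how Git itself is implemented.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 name of an object: sha1("<type> <size>\\0" + payload)."""
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object_is_intact(compressed: bytes, expected_id: str) -> bool:
    """Re-hash a zlib-compressed loose object and compare with its expected name."""
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        return False  # the bytes on disk are not even valid zlib data
    return hashlib.sha1(raw).hexdigest() == expected_id


# Build a loose object the way it would sit on disk, then damage one byte.
payload = b"hello KDE\n"
object_id = git_object_id("blob", payload)
on_disk = zlib.compress(f"blob {len(payload)}".encode() + b"\x00" + payload)
damaged = bytearray(on_disk)
damaged[-1] ^= 0xFF  # flip the bits of the last byte to simulate disk corruption

print(loose_object_is_intact(on_disk, object_id))
print(loose_object_is_intact(bytes(damaged), object_id))
```

The check is cheap for one object but has to be run over every object to vouch for a whole repository, which is the job `git fsck` does. An operation that copies objects without re-hashing them has no way to notice the damaged copy.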

    1. Re:No Git also failed by gweihir · · Score: 3, Insightful

      Well, so this was _not_ a git failure, as there was an explicit warning that it does not cover this case. Not the fault of git but those that did not bother to find out. That a "mirror" operation does not check the repository is also no surprise at all.

      Incidentally, even if git had failed, that is why you have independent and verified backups. A competently designed and managed system can survive the failure of any one component.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:No Git also failed by jankoh · · Score: 2

      Did you read the whole article?
      Even the part about "git fsck"?
      I just assume, that it was a design choice of Linus, NOT to run fsck each time, when performing let's say, mirror.
      Anyway, you can adjust just your sync scripts to include the fsck and carry on.
      (or better yet, run git fsck after each filesystem fsck???)
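
A hedged sketch of what "include the fsck in your sync scripts" could look like: run `git fsck --full` on the source repository and refuse to start the sync if it reports a problem. The `guarded_sync` helper, the injectable `run` parameter, and the placeholder sync command are assumptions for illustration; the real KDE sync scripts are not shown in the article.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes a command, returns its exit code


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and report its exit status."""
    return subprocess.run(list(cmd)).returncode


def guarded_sync(repo: str, sync_cmd: Sequence[str], run: Runner = run_command) -> bool:
    """Check the repository first and only sync when the check passes."""
    if run(["git", "-C", repo, "fsck", "--full"]) != 0:
        return False  # corrupt source: do not propagate it to the mirrors
    return run(sync_cmd) == 0
```

A call might look like `guarded_sync("/srv/git/project.git", ["./push-to-mirrors.sh"])`, where both the path and the script name are hypothetical. A full fsck on a large repository is slow, which may be why nobody wants it on every mirror run, and it still leaves a window between the check and the sync.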

    3. Re:No Git also failed by garyebickford · · Score: 1

      I recently read an article (sorry, don't recall where) that said that Git was a 'functional data structure' akin to a functional programming language, and that was why it was so reliable.

      --
      It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
    4. Re:No Git also failed by gweihir · · Score: 2

      Indeed. And it is absolutely no surprise that a fast mirror operation does not do a full consistency and data check. The most you can expect is a check whether data was copied correctly, and even for that you should check the documentation to make sure.

      Also, not knowing that backups are both mandatory and not somehow "automagically" done is basic IT operations knowledge. These people did not bother to find out and now blame git, when it is only their own lack of skill they have to blame.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:No Git also failed by osu-neko · · Score: 2

      Indeed. And it is absolutely no surprise that a fast mirror operation does not do a full consistency and data check. The most you can expect is a check whether data was copied correctly, and even for that you should check the documentation to make sure.

      Also, not knowing that backups are both mandatory and not somehow "automagically" done is basic IT operations knowledge. These people did not bother to find out and now blame git, when it is only their own lack of skill they have to blame.

      Knowing that screw-ups happen is basic engineering knowledge. Competent engineers design fault-tolerant systems that don't fail spectacularly even when someone screws up. Yes, we understand, these people screwed up badly and are primarily to blame for the problem. This does not absolve git of any poor engineering decisions made that exacerbated the problem. A bad engineer says, "Ah, that person is to blame for causing this problem" and washes his or her hands of it. A good one says, "Ah, that person screwed up monumentally! Is there some way my tool could be improved to prevent screw-ups like that from resulting in a disaster?" You can't prevent all problems, but you shouldn't even be an engineer, software or otherwise, if you're the kind of person who doesn't even try. "Working as documented" is the poor engineer's excuse...

      --
      "Convictions are more dangerous enemies of truth than lies."
  15. Re:A thousand times. (Unless online mirrors roll b by gweihir · · Score: 3, Insightful

    No. Backup is out of scope for version control. Anybody with actual common sense would not expect it to make backups "magically" by itself and check to make sure. Then they would implement backups. But that does actually require said common sense.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  16. what almost became 'The Great KDE Disaster Of 2013 by Anonymous Coward · · Score: 1

    Isn't that what every major release is called? Except for the "2013" part?

  17. But it is SUPPOSED to by Anonymous Coward · · Score: 2, Insightful

    "Not the fault of git but those that did not bother to find out"

    No, Git has the integrity check, the integrity check didn't work. If the integrity check had worked as claimed then their backups were solid.

    I know people are saying "keep backups", but they're really missing the point. A backup is a copy of something, the more up to date the better, better still if it keeps a historic set of backups. Perhaps with some sort of software to minimize the size, perhaps only keep changes..... you can see where I'm going with this.

    Git sync to a lot of drives IS A BACKUP. It is exactly what an ideal backup should be, historic, up to date, minimizes storage. What is that system if it isn't an automatic backup!

    Except for this bug, which needs to be fixed, and a little less faith in git too would also be a good thing.

    It's really no different than if you use the backup software, and it made careful backups and kept historic copies, and then one day your disk got corrupted, you promptly went to your backups only to find the backup software had been chomping those because it didn't notice the integrity was corrupt and had happily been corrupting the backups it was keeping.

    So I see comments saying they didn't have backups OMG! But no, their problem was they only used ONE TYPE OF BACKUP SOFTWARE Git sync. I bet all of you use only ONE type of backup software and are equally vulnerable to this failure.

    1. Re:But it is SUPPOSED to by gweihir · · Score: 4, Informative

      Git does not have the magic "integrity check" on making mirrors. If they had bothered to look at the documentation they would have known. If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off on a fast mirror operation. If they had even been a bit careful, they would have checked the documentation and known. They failed in every way possible.

      Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly.

      And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up. Stop spreading nonsense.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    2. Re:But it is SUPPOSED to by vurian · · Score: 0

      Stop defending the tool. The tool is shit. Start praising the KDE sysadmins who are volunteers, all of them, and who are doing their job better than any professional sysadmin I've ever seen.

    3. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 2, Interesting

      Git does not have the magic "integrity check" on making mirrors.

      Why on earth not?

      If they had bothered to look at the documentation they would have known.

      There's no mention of this in any of the git-clone, git-push, git-pull or git-fetch man pages on my system, at least not near any instance of the word "mirror".

      If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off on a fast mirror operation.

      Why? The point of the mirror option (at least as far as the documentation mentions) is to propagate all branch additions/deletions/forced updates automatically, not to make it fast. Git is advertised as having strong integrity checking as a feature, so why would you assume that would ever be turned off, except maybe with an explicit --no-check-hashes option?

      If they had even been a bit careful, they would have checked the documentation and known. [...] This is correct and documented behavior.

      Not documented in any of the obvious places to look, at least. Maybe if they'd bothered to read literally the entire Git documentation they might have found a mention of this somewhere, but reading the entire documentation every time you start using a new option just in case there might be some special non-obvious caveat goes way beyond "even a bit careful".

      And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up.

      The whole point of the mirrors is that they're not the same system as the original.

    4. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 0

      A quote from the man page:

      --mirror Set up a mirror of the source repository. This implies --bare. Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository.

      So where're the warnings?

    5. Re:But it is SUPPOSED to by osu-neko · · Score: 2

      Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly.

      This is a false dilemma. One can certainly blame the blameworthy behaviors of the people using the tool, while still pointing out that the tool itself could be improved. Yes, there are reasons why you might want a mirror operation to be as fast as possible, and even reasons why you might want to mirror a corrupted archive. There should be a flag for that, --skip-integrity-check or the like. Making that the default behavior, however, seems ill-advised.

      If they had bothered to look at the documentation they would have known.

      Yes, and they should have, and are to blame for not doing so. That said, documenting poor design doesn't make it good design.

      --
      "Convictions are more dangerous enemies of truth than lies."
    6. Re:But it is SUPPOSED to by gweihir · · Score: 1

      Stop defending the tool. The tool is shit. Start praising the KDE sysadmins who are volunteers, all of them, and who are doing their job better than any professional sysadmin I've ever seen.

      Well, you are certainly welcome to praise whatever you like and without any level of insightfulness. That does not give you any level of credibility though. Git is a (very reliable) specific tool with specific characteristics. It is decidedly and obviously not a tool for the incompetent. It expects its users to understand its data, security and reliability model. If your oh so competent KDE sysadmins cannot be bothered to even find out elementary things about a tool that is mission-critical for them, then I fail to find any level of professionalism there. Being a volunteer is not a valid excuse for a screw-up of this magnitude.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    7. Re:But it is SUPPOSED to by gweihir · · Score: 0

      Most clueless posting so far! I commend you! You do not even get the easy things, quite an accomplishment!

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    8. Re:But it is SUPPOSED to by gweihir · · Score: 1

      I do not agree that it is poor design. It is UNIX-style design where the user is expected to actually understand what they are doing. Sure, you could make it fool-proof, but that is decidedly not the UNIX-way as that would break things and because UNIX takes great care to not offend those that actually understand what they are doing. These people messed up through no fault of the git tool. Quit finding apologies for them. If they do not get it, they should have used some tool more on their level. Understanding how far your skills go is a critical skill for any engineer.

      Also, git is not to blame for some amateurs not realizing that a real backup is non-optional.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    9. Re:But it is SUPPOSED to by BitZtream · · Score: 4, Insightful

      It is UNIX-style design where the user is expected to actually understand what they are doing.

      No, it is not, and never was. It is in fact the opposite of that. Man pages, as one obvious example, are there so people who don't know what they are doing can figure it out. It is designed to be intuitive and provide you with the information needed to get the job done. It was built to have small, simple tools that were easy to understand. They can perform simple tasks on their own or, when working together, perform some complex ones ... hence the powerful unix command line. The original UNIX design considered both new, inexperienced users and how to bring them up to speed as well as how to empower users with more knowledge of the system.

      What you are referring to is a Linux/OSS attribute, not a UNIX attribute. Linux/OSS developers typically expect the user of the software to be a developer as well. This is the result of everyone scratching their own itch only and most code being written by people for themselves without any consideration of others. No one WANTS to write the things that makes it intuitive or easy for someone else who doesn't understand all the quirks. Obviously this isn't true for some of the paid developers, but the majority of them aren't.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    10. Re:But it is SUPPOSED to by vurian · · Score: 0

      Well, you don't have any level of credibility either -- making a bunch of posts doesn't give you that. What have you, actually, done to achieve credibility? Have you run a 1000+ repo, 1000+ developer project for free? Have you, before filling this slashdot article with your "I-know-best-these-people-are-morons" posts acquainted yourself with the actual situation at hand? I am sure you have not, and you have no excuse. You have not bothered to find out things that are elemental before commenting, and there is no excuse. What, exactly, have you done with your life that makes you fit to judge?

    11. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 0

      I'm with you for the most part, but this line in particular is a poor design:

      "UNIX takes great care to not offend those that actually understand what they are doing"

      That's a coded excuse...

    12. Re:But it is SUPPOSED to by 31eq · · Score: 1

      I vouch for the parent. I checked the man pages and I don't see this behaviour documented. Those who think reading the documentation would have made the lack of integrity checks clear can easily point to the place it's documented. Then we can discuss whether a competent admin should have found it.

      (Note: I read the latest article, and they did have backups. So any posts about them trusting one tool and not having backups are irrelevant.)

    13. Re:But it is SUPPOSED to by Eythian · · Score: 1

      Everything he said is correct, however.

    14. Re:But it is SUPPOSED to by TCM · · Score: 1

      Git sync to a lot of drives IS A BACKUP. It is exactly what an ideal backup should be, historic, up to date, minimizes storage. What is that system if it isn't an automatic backup!

      A backup is not a backup if you overwrite it each time. They weren't doing backups at the file level, they were doing backups at the "Git level".

      Mirroring Git repositories is a form of backup, yes. But only if Git itself works flawlessly. A backup is a _disaster_ recovery mechanism. Disaster includes software failing. Using Git to fulfill the "historic" requirement of backups is a problem if Git itself fails.

      Looks like they trusted the Git adverts too much and thought complete integrity means complete integrity _on every access_ when in reality, it's nothing more than a ZFS fs that requires scrub to notice any corruption. In other words, one group of half-asses relied on another group of half-asses doing their job properly.

      _Always_ double-check and make sure you know how your tools work, especially under disastrous circumstances.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    15. Re:But it is SUPPOSED to by TCM · · Score: 1

      Uh, WTF?

      Following a path-of-least-astonishment approach is not automatically abandoning the KISS principle of UNIX or its ability to shoot yourself in the foot. First and foremost, a tool should do its job in a non-surprising, intuitive and (hopefully) properly documented way. Any ability to shoot yourself in the foot is just icing on the cake _if the tool does its main job properly_.

      Example: rm -fr / _should_ remove my whole filesystem tree, no questions asked, because the path of least astonishment requires that there are no edge cases or unforeseen checks coded in. It's the same reason I can edit a directory entry with vi if I want to.

      However, if a tool is touted as fully integrity-checking its tree using SHA-1 checksums, I would assume it does this in a consistent manner on every operation. Otherwise, what's the point?

      Forcefully leaving the POLA to be able to shoot yourself in the foot is not UNIX. That's Linux.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    16. Re:But it is SUPPOSED to by Electricity+Likes+Me · · Score: 1

      I do not agree that it is poor design. It is UNIX-style design where the user is expected to actually understand what they are doing. Sure, you could make it fool-proof, but that is decidedly not the UNIX-way as that would break things and because UNIX takes great care to not offend those that actually understand what they are doing. These people messed up through no fault of the git tool. Quit finding apologies for them. If they do not get it, they should have used some tool more on their level. Understanding how far your skills go is a critical skill for any engineer.

      Also, git is not to blame for some amateurs not realizing that a real backup is non-optional.

      Silently not doing integrity checks when your tool is built around that concept is bad design, plain and simple. Most Linux tools with potentially disaster causing command line switches also have secondary switches that make sure you understand the consequences - you can't tell hdparm to upload HDD firmware without also specifying --yes-please-destroy-my-data (or something like that).

      If git --mirror isn't going to do integrity checks - and I find it exceptionally odd that it doesn't - then that needs to be made very very obvious before you can use it. Man page documentation, and an extra switch saying --do-not-verify-integrity.

      If you had to do that, this problem wouldn't have happened.

    17. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 0

      I'll go with his since all you could muster was pointless snark without even touching on some of the issues you claim to be so simple.

    18. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 0

      There is a fast mirroring tool; in proper unix fashion it is single-minded, highly efficient and well understood. If git's mirroring is a badly documented rsync, it is their fault, and their fault alone, that they pulled such a boneheaded, duplicative, confusing move.

    19. Re:But it is SUPPOSED to by Zero__Kelvin · · Score: 2

      "No, Git has the integrity check, the integrity check didn't work."

      The integrity check worked perfectly. It said, in effect: "Yes, Mr. admin, this version is corrupted in exactly the same way as the original, which is I assume what you wanted since that is what you told me you wanted." Git is not to blame here. How is git supposed to know that you don't want a corrupted file in your repo? Maybe it is in there for testing purposes, for example.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    20. Re:But it is SUPPOSED to by Anonymous Coward · · Score: 0

      I do not agree that it is poor design. It is UNIX-style design where the user is expected to actually understand what they are doing.

      I thought UNIX design was one program did one thing, and preferably did it well. Nothing about simplicity or knowing what one is doing.

  18. Welcome to "rsnapshot" by Anonymous Coward · · Score: 2, Informative

    Rsnapshot provides cheap, userland, hardlinked rotating snapshots that work very well. Simply do the rsnapshots in one location, and there are a dozen ways to make the completed, synchronized content accessible for download or other mirrors when the mirror is complete.

    The only thing I dislike about it is the often requested, always refused feature of using "daily.YYYYMMDD-HHMMSS" or a similar naming scheme, instead of the rotating "daily.0, daily.1, daily.2" names which are quite prone to rotating in mid-download for anyone accessing the snapshots via NFS or a web browser. The only way you can tell the rotations apart is by the timestamp on the top level directory, and that's very confusing when it rotates out from under you in mid-operations.
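
For readers who have not seen the hardlink trick, the following is a simplified sketch of the idea, using the timestamped `daily.YYYYMMDD-HHMMSS` names wished for above. It is not rsnapshot's code or configuration; the `snapshot` function and the demo paths are invented for this example. Files unchanged since the previous snapshot are hardlinked to it, so they occupy disk space only once.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest_root: Path, stamp: str, previous: Optional[Path]) -> Path:
    """Create dest_root/daily.<stamp>, hardlinking files unchanged since `previous`."""
    target = dest_root / f"daily.{stamp}"
    target.mkdir(parents=True)
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        out = target / rel
        out.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous else None
        if old and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, out)  # unchanged: share the inode with the older snapshot
        else:
            shutil.copy2(src_file, out)  # new or modified: store a fresh copy
    return target


# Small demo in a throwaway directory: one file stays the same, one changes.
root = Path(tempfile.mkdtemp())
src = root / "src"
src.mkdir()
(src / "a.txt").write_text("same")
(src / "b.txt").write_text("v1")
first = snapshot(src, root / "snaps", "20130325-010000", None)
(src / "b.txt").write_text("v2")
second = snapshot(src, root / "snaps", "20130326-010000", first)
print(first.name, second.name)
```

Because every snapshot gets a name that never changes, nothing rotates out from under someone who is halfway through a download. The links here are between snapshots, so an unchanged file is shared by all the snapshots that contain it.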

    1. Re:Welcome to "rsnapshot" by TCM · · Score: 1

      Hardlinks. Hardlinks!

      Let's tie our backup to the live data so we never know when either gets damaged and, as a bonus, damages the other as well!

      Hardlinks are nothing. They are not snapshots and certainly not backups.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
  19. Re:what almost became 'The Great KDE Disaster Of 2 by garyebickford · · Score: 1

    Could be worse - Unity, Gnome 3, ...
    I'm playing this on KDE 4, trying it out. All I really want to do is run Compiz and some other stuff in my highly tuned environment - I use the Desktop Cube, with a transparent desktop, and Cairo Dock. I left KDE back about 6-7 years ago, but right now it's closer to what I want and am used to than anything else. I have Bodhi/Enlightenment running on another machine. It's nice too, but right now I'm like a man without a country.

    --
    It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/
  20. duh by sribe · · Score: 1

    Replicated systems need regular backups too. No shit, sherlock...

  21. btrfs by ssam · · Score: 1

    If only Linux had a filesystem that checksummed all your data, and checked the checksum at every read. We could call it better FS, or something like that.

    1. Re:btrfs by TCM · · Score: 1

      Yeah, if only it had that, because it hasn't.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    2. Re:btrfs by ssam · · Score: 1

      WFM YMMV

  22. Moral of the story.... by Lumpy · · Score: 2

    you ALWAYS have incremental backups on MULTIPLE MEDIUMS.

    If you think your Git repositories are your backup, then you need to learn what the word Backup means.

    --
    Do not look at laser with remaining good eye.
    1. Re:Moral of the story.... by CBravo · · Score: 1

      It seems someone just did learn that aspect...

      --
      nosig today
    2. Re:Moral of the story.... by gweihir · · Score: 1

      Indeed. From personal experience I can state that backup seems to require one (near) disaster before people take it seriously. This is called "experience" and there is no substitute for it.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    3. Re:Moral of the story.... by lennier · · Score: 1

      you ALWAYS have incremental backups on MULTIPLE MEDIUMS.

      Preferably one who specialises in Atlantean wizard-kings and another who does Egyptian or Indian priestesses.

      And several dozen offsite Sir Arthur Conan Doyles, if you want a belts-and-braces approach.

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  23. And the other thing by Anonymous Coward · · Score: 0

    "Anyway, you can adjust just your sync scripts to include the fsck and carry on."

    And what if the corruption occurs after the fsck and before the sync?
    The git sync shouldn't return OK if the commit object is corrupt. It's a bug, it needs fixing, no big deal, and no reason to defend a simple bug as though it's a feature! Adding an fsck call is a temp workaround, but for solid faith in git this needs to be fixed.

    But also I think a healthy lack of faith in a backup software (even if git's making the backup) is important. How many of those nightly backups could be silently corrupted by a bug in the backup software! Your disk fails, you try the backups and ...

  24. programming != IT by SuperBanana · · Score: 1

    Most IT people would have said "Where are your backups?" When the programmers say "We're using mirrors", the IT person would say, "Where are your backups?" a second time.

    $50 says that whoever handles IT for KDE said "Hey guys, we need backups" and the programmers all said "Nah, we've got mirroring."

    Seriously: why doesn't an organization as large as KDE have backups? I understand if Save the Fuzzy Wuzzies doesn't have good IT, but a major open source project?

    Always amazes me how I don't tell programmers how to do their job, yet I've had a decade and a half of programmers arguing with me about how to do mine. Which is particularly funny, since if the server under their desk dies, it's magically my fault/responsibility.

    1. Re:programming != IT by fa2k · · Score: 1

      I think the problem is that a code repository is very much a moving target. They didn't say whether they had backups, so they probably didn't and that's stupid, but it would also be a problem if they had a week old backup

    2. Re:programming != IT by vurian · · Score: 1, Informative

      "an organization as large as KDE have backups?" You mean one full-time secretary and a couple of volunteer sysadmins? That's how large KDE's support organization is. How much money do you think KDE has? It is less than 200k euros. That's how large the budget is -- and it has to pay for everything.

    3. Re:programming != IT by jeremyp · · Score: 2

      They should be backing up daily and, even if not, they should certainly have done a backup before doing a software upgrade.

      --
      All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    4. Re:programming != IT by gweihir · · Score: 1

      Very good point. Many, many programmers do not get how to operate IT competently. The really good ones do, but they are rare.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    5. Re:programming != IT by BitZtream · · Score: 1

      If no one on their team is tasked with the responsibility of proper backups then you should read between the lines and take from that a lot more about the project itself.

      If the project doesn't do proper backups, a basic tenet of the computer world ... something they should have learned before they could code ... what exactly do you expect from the rest of the project?

      Your excuse is one typically said by the guy who was responsible but didn't do his job and is looking for petty excuses.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    6. Re:programming != IT by cr_nucleus · · Score: 1

      I think the problem is that a code repository is very much a moving target. They didn't say whether they had backups, so they probably didn't and that's stupid, but it would also be a problem if they had a week old backup

      I'd take old backups over no backups any day...

    7. Re:programming != IT by gweihir · · Score: 1

      Backup is non-optional. The only sane way to save cost by not having a backup is to scrap the whole project. It is one of the things that you _must_ find the resources for, no matter what.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    8. Re:programming != IT by akozakie · · Score: 1

      Most IT people would have said "Where are your backups?" When the programmers say "We're using mirrors", the IT person would say, "Where are your backups?" a second time.

      Read the comments under TFA and answers to them. You're 100% right, this is exactly what happens there.

      Wall, meet Head.

    9. Re:programming != IT by vurian · · Score: 3, Interesting

      Your remark is typically said by the guy who doesn't understand that a project like KDE is not an organization comparable to a Fortune 500 company. It is not a company. There are no employees. There is no significant income. Everything is done by volunteers. Everything. All of it. It is a large open source community, but it is not a company. There is no one responsible for telling anyone what to do. There is no one who said "you have this budget", because there is no budget. This is completely outside your experience. There are no "they" who take care of things -- there is just an "us" -- and if you think your experience can be of use, you can be part of the "us", but you won't be paid, and every bit of hardware and bandwidth you use, you'll have to beg for. And it still works. Isn't that effing amazing?

    10. Re:programming != IT by Rich0 · · Score: 1

      FOSS projects smaller than KDE manage to back up their data. All it requires is that somebody cares.

      They have websites, they have mirrors, they have servers. They can afford backups. They likely have servers donated by companies that could back up those servers for them if asked. With git anybody can do a clone a day and rotate those clones, and those would be backups.

      I'm involved in a much smaller FOSS project with a much smaller budget, and yet we spend thousands of dollars a year on hardware/hosting/etc and we do backups.

    11. Re:programming != IT by lennier · · Score: 2

      Very good point. Many, many programmers do not get how to operate IT competently.

      Yes. And this is a problem.

      It leads to the atrocities that are the Adobe and Apple installers, among other things. Apparently an "application developer" these days doesn't need to trouble himself* with how his priceless treasures actually interact with the operating system they will be installed on. Because that's, like, the IT grunt's job? And anyway isn't some file copies and maybe a few registry hacks just a small matter of scripting, and not really coding at all?

      I'd like to dream that one day IT will be taught in computer science courses, with the same level of theoretical abstraction, and given the same kind of functional-programming toolsets that... well, haven't made it into mainstream "software engineering" either... but at least could get us all talking in the same room again. You know, like some lectures about how just tossing a bunch of files into a filesystem is sorta like coding in raw assembler in the 1960s where we had global variables for everything? And maybe couldn't there be a slightly smarter way of organising our lives so that we didn't....? And maybe how we could apply some of that "object oriented" and "functional" stuff that exists inside a running process, to the OS layer? At a slightly finer level of granularity than "spin up an emulated image of an entire server"? And maybe even the network infrastructure guys could have some kind of version control system for all the text config files for their DHCP servers and routers? Pretty please?

      Well, not next year. But maybe by 2030?

      * Theoretically that could be "herself", except that this level of arrogance/ignorance really does seem to be a uniquely male failure mode. Most females are smarter than to believe that they know everything about subjects they haven't learned.

      --
      You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
    12. Re:programming != IT by Anonymous Coward · · Score: 0

      and if you think your experience can be of use, you can be part of the "us"

      No, you can't. You'd need to be elected to a Council, nominated by a number of peers.

      You or I won't be able to be nominated, because we're from the Outside.

    13. Re:programming != IT by gweihir · · Score: 1

      * Theoretically that could be "herself", except that this level of arrogance/ignorance really does seem to be a uniquely male failure mode. Most females are smarter than to believe that they know everything about subjects they haven't learned.

      Matches my experience. Women in engineering rarely are not aware of their limitations, while men relatively frequently fall prey to this issue. One part is certainly upbringing, and one is that as a minority, women are actually aware that they need to initially work harder to be respected. On the plus side, that work pays off and makes them better engineers. On the minus side, those few women that are bad engineers are often really, really bad, because nobody dares telling them so or they believe criticism is just male chauvinism. I had the opportunity to observe that and it was tragic. Being female is not a valid excuse for being incompetent either.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  25. ZFS by fa2k · · Score: 1

    The article suggests using ZFS because of its protections against bad hardware.

    It implies that ZFS protects against bad RAM but *this is not the case*. The ZFS developers recommend using ECC memory.

    1. Re:ZFS by toby · · Score: 1

      If *ZFS* isn't proof against bad RAM, imagine how poorly conventional filesystems fare. ECC memory is advisable in situations demanding integrity anyway.

      --
      you had me at #!
  26. Re:A thousand times. (Unless online mirrors roll b by Antique+Geekmeister · · Score: 1, Insightful

    May I respectfully disagree? I've often seen such focus on what is "out of scope" used to limit cost and to limit the "turf" on which an employer or contractor needs access. But backup is _certainly_ a critical part of source control, just as security is. The ability to replicate a working source control system to other hardware or environments due to failure or corruption of the primary server is critical to any critical source tree. Calling it "out of scope" is like calling security "out of scope". By ignoring the consequences at the design stages of a source control system, very real risks are often taken without even thinking of the possible consequences, and the resources necessary to provide such critical features later can, and often do, multiply the cost of a project in unexpected ways.

    A nightly mirror on low-cost hardware with snapshot capability, for example, can provide very useful fallback capability. Even hardlink-based software snapshots can work well. It requires thought to configure correctly, to schedule the mirrors and make sure they don't conflict with other high-bandwidth operations such as tape backup, and to handle "churn" diskspace requirements. And I've had some very good success with partners and clients who took such modest backup tools and saved enormous cost on high-speed tape backup systems and high-bandwidth connections for remote mirroring facilities, or who had difficulties meeting very short backup windows and solved them by using the mirror, or the snapshots, to do the tape backups for archival. It does inject a phase delay into the tape backups, and recovery from tape has to be tested, but it's been extremely effective.
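
    For readers who have not seen hardlink-based snapshots, here is a rough Python sketch of the idea (the same approach as `rsync --link-dest`). The `snapshot` function and its arguments are hypothetical names for illustration; it ignores symlinks, permissions and files that change mid-copy, all of which a real tool has to handle.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy `source` into a new directory `dest`.

    Files identical to the copy in `previous` are hard-linked rather than
    copied, so unchanged data is stored only once on disk.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            new = dest / rel / name
            old = previous / rel / name if previous else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)  # unchanged: share storage with the last snapshot
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
```

    Each run writes into a new directory and never modifies an older one, but linked files share a single copy of the data, so the snapshots are not independent of each other if that copy is damaged.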

    Several times, I've found that the problem is a political one. The backup system is often a very expensive, high performance capital cost, or some kind of proprietary "turf" of a manager who is very comfortable with and enamored of it, and they're concerned that adding this layer will make them look foolish for spending the money, or cost them their job as a proprietary owner of critical infrastructure. They already had the political battle purchasing the hardware in the first place and don't care to rehash their previous work. But it's often amazing what staging the backups this way can do for performance and user access to their backed up data. Most restoration cases are due to accidental file deletion or editing, and the users no longer need access to the tape backup system or off-site archival, and only to the snapshots which have read-only access with the same privileges as the original source material.

  27. IT AIN'T CALLED GIT FER NUTHIN !! by Anonymous Coward · · Score: 0

    Because, you know you are a redneck !!

  28. Three Letters: ZFS. by toby · · Score: 1

    Just use it. Write in place filesystems are obsolete from an integrity point of view.

    --
    you had me at #!
    1. Re:Three Letters: ZFS. by Anonymous Coward · · Score: 0

      Still not a replacement for backups. Backing up from ZFS snapshots may be a place to start, but there are tons of FOSS ways to accomplish this task.

    2. Re:Three Letters: ZFS. by BitZtream · · Score: 1

      Contrary to what ignorant people like yourself think, GPL is not the definition of FOSS. FOSS in and of itself is a fucking retarded acronym since it's actually talking about two different things, but that rant aside ... you have to be a completely ignorant moron to not realize that zfs is more open and free than your precious Linux kernel, and far more so than anything infected with GPLv3.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  29. So don't trust git by Anonymous Coward · · Score: 0, Troll

    "Git does not have the magic "integrity check" on making mirrors"

    Right, so, it returns OK (0), yet the commit may be corrupt, it hasn't walked the full tree, and it may corrupt all copies. Good job you warned me about this flaw! I know to stick with p4s!

    "Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly."

    Your backup tool is taking backups of the corrupt archive. Keep it independent or not, it's corrupted when you come back to it.

    The lesson here is not to trust one piece of software.

  30. Re:A thousand times. (Unless online mirrors roll b by Anonymous Coward · · Score: 1

    May I respectfully disagree? I've often seen such focus on what is "out of scope" used to limit cost and to limit the "turf" on which an employer or contractor needs access. But backup is _certainly_ a critical part of source control, just as security is. The ability to replicate a working source control system to other hardware or environments due to failure or corruption of the primary server is critical to any critical source tree. Calling it "out of scope" is like calling security "out of scope". By ignoring the consequences at the design stages of a source control system, very real risks are often taken without even thinking of the possible consequences, and the resources necessary to provide such critical features later can, and often do, multiply the cost of a project in unexpected ways.

    THIS.

    But while we're at it - from TFA: "The root of both bugs was a design flaw: the decision that git.kde.org was always to be considered the trusted, canonical source. The rationale behind this decision is relatively obvious; it's a locked-down, authenticated resource that runs customized hooks to validate the code being pushed to it. It's perfectly reasonable to decide that it should be considered to be correct."

    Several times, I've found that the problem is a political one. The backup system is often a very expensive, high performance capital cost, or some kind of proprietary "turf" of a manager who is very comfortable with and enamored of it, and they're concerned that adding this layer will make them look foolish for spending the money, or cost them their job as a proprietary owner of critical infrastructure. They already had the political battle purchasing the hardware in the first place and don't care to rehash their previous work. But it's often amazing what staging the backups this way can do for performance and user access to their backed up data. Most restoration cases are due to accidental file deletion or editing, and the users no longer need access to the tape backup system or off-site archival, and only to the snapshots which have read-only access with the same privileges as the original source material.

    If, at the end of the day, we do what TFA suggests, and propose that one machine be considered "the" authoritative centralized source, we've just given the backup-dude/sysadmin his job back.

    The elephant in the room here is back in that section of TFA that refers to "the trusted, canonical source."

    Congratulations, now that you've migrated from git, you discover you still need something that functions as the "centralized" part of a centralized version control system. There are many reasons to argue for DVCS over centralized, but eliminating big iron central server and the concept of backups "because the source is on everybody's laptops!" isn't one of them.

  31. Re:A thousand times. (Unless online mirrors roll b by gweihir · · Score: 4, Informative

    I believe you are not talking about backup. A backup allows system recovery after a disaster and cannot ever be stored in the system itself. What you are talking about is availability improvement. That _can_ be part of the primary system. RAID, for example, exclusively serves this purpose (except RAID0). But backups must also protect against user and administrator error, software errors, the data-center burning down, sabotage, etc.

    Replication is not the tool for that. The problem is that any data copy part of the system itself can be corrupted by the system as the system still has access to it. That is why a backup must be both removed from the system so it is independent, and allow full reconstruction, even if the original system is completely destroyed.

    Now, improving uptime and reducing downtimes is important, but it is not what a backup does. A backup makes sure you do not lose your data permanently. What uptime improvement does is to make it less likely that you need to go back to the backup.

    Or to put it differently, backup is for Disaster Recovery. Uptime improvement is for reducing DR cost, by reducing the probability of DR becoming necessary, and for reducing downtime cost.

    I do agree to the political angle though.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  32. Re:A thousand times. (Unless online mirrors roll b by gweihir · · Score: 1

    Oh, and I should say that backup is very much in scope for a version control system installation! (We do nightly full and hourly incremental backups, for example.) It is just not in scope for the version control system software itself, as it solves a different problem.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  33. Rotation backups wouldn't fix it! by Anonymous Coward · · Score: 0

    Except they suspect the corruption was there a long time unnoticed and so your rotation copies have the corruption too! Worse, because it's rotational, sooner or later the oldest one has gone....

    Really, you're putting your faith in MAGICSOFTWAREBACKUP, and saying "well Git mirrors aren't proper mirrors", except they ARE proper mirrors and they do keep historic backups! That's what distributed server versioning software *IS*, it too never overwrites old versions, it too only stores differences, it too only syncs the differences, it too is physically distributed among many machines and locations!

    The problem here, is git has a flaw, and your MAGICSOFTWAREBACKUP could equally have a flaw. Perhaps it's not copying files ending in _fred, who knows, software is software, bugs are bugs! Don't assume your software (whatever it is) that describes itself as backup software is somehow less problematic than a git sync!

    I hate incremental backups (the kind you describe) particularly because I've had a corrupt root file and couldn't recover from a backup. I had 2 months of data back, even if the backup had worked, it would still have been a disaster to lose more than 2 months.

    IMHO this is a simple git bug, the synch'd copies were not only corrupted BUT NOT EVEN IDENTICAL, so there's clearly a problem here. Oh well, software is software, find the bug, fix it, and don't rely on one type of backup, ever again, even your rotational backups.

    A git sync to multiple machines, plus a second type of backup is the way to go. The git mirror counts as one type of backup, you need another type, some other software some other way. It could be rotational backups, it could be as simple as filecopy on a cron job, it could be a second versioning server, (e.g. a Perforce repo mirrored from git ), but some *second* backup strategy.

  34. Re:A thousand times. (Unless online mirrors roll b by Anonymous Coward · · Score: 0

    But backup is _certainly_ a critical part of source control, just as security is.

    Interesting example, given that git also doesn't do security or authentication (hence the need for gitolite)

    It was, shall we say "surprising" to discover that having commit access to a git repository allowed you to delete the history of other peoples' work.

  35. Re:A thousand times. (Unless online mirrors roll b by gweihir · · Score: 1

    There are many reasons to argue for DVCS over centralized, but eliminating big iron central server and the concept of backups "because the source is on everybody's laptops!" isn't one of them.

    Well, sort of. If they had done full repo updates on the "mirrors", this issue would likely not have happened. The core problem was that they did el-cheapo mirroring without understanding what the consequences are. They would still have to do full checkouts and detach them afterwards to make them proper backups. After all, the git software could have flaws. So while it does not need to be a "big iron central server", setting up several systems specifically doing backups is non-optional. In a sense they will be "central" systems then.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  36. Should have gone with windows.. by Anonymous Coward · · Score: 0

    Doesn't everyone know about the file system corruption that happens often on linux ext4 formatted systems?

    Oh well.. I guess the neckbeards have been successful in blaming the victims instead of ext4 devs.

  37. Re:A thousand times. (Unless online mirrors roll b by vurian · · Score: 1

    Especially backup software.

  38. Same problem as real mirrors. by weazzle · · Score: 1

    The real mirrors in my house are also too perfect. Reflecting precisely what I put in front of them, rather than what I want to see. What they need is a copy-on-write file system for their source code servers, not an adaptive mirror.

  39. is this another example of the common mistake? by fikx · · Score: 1

    Do we have yet another case of someone who makes an IT-related product thinking they are IT? The mistake highlighted by the article and a lot of the comments, thinking version control = backup, reminds me of the many times some vendor tried to sell an IT product to a company while, the whole time the developer or consultant was talking, I kept yelling in my mind: "you don't get IT, you are not IT, go talk to YOUR IT back at your company...you know, the guys that pull their hair out every time you trash your PC installing dev tool du jour"
    developer != IT .

    --
    AB HOC POSSUM VIDERE DOMUM TUUM
    1. Re:is this another example of the common mistake? by gweihir · · Score: 1

      I think it is. Very good developers do get it, but they are rare. The others think they get it but have no clue. Which makes them dangerous.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  40. response from a core Git developer by nluv4hs · · Score: 3, Informative
    Jeff King responded on Git's mailing list:

    Jeff King at 2013-03-24 18:31:33 GMT
    propagating repo corruption across clone

    "So I think at the very least we should:
    1. Make sure clone propagates errors from checkout to the final exit code.
    2. Teach clone to run check_everything_connected.

    "

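    Until something like that lands in clone itself, a mirror script can run its own check before trusting the source. A small Python sketch, assuming a hypothetical helper name `repo_is_healthy`; it relies only on the exit status of `git fsck --full`, and treats "git could not be run at all" the same as "repository is broken".

```python
import subprocess


def repo_is_healthy(git_dir: str) -> bool:
    """Return True only if `git fsck --full` exits cleanly for the repository."""
    try:
        result = subprocess.run(
            ["git", "--git-dir", git_dir, "fsck", "--full"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # No git binary available: the repository cannot be verified.
        return False
    return result.returncode == 0
```

    A sync job would call this on the source first and skip the update, leaving the existing mirror untouched, whenever it returns False.
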
  41. Update... by Curupira · · Score: 1

    Jeff Mitchell tried to respond to the criticism in Hacker News (a bit similar to the criticism made in Slashdot) in this post on his blog. I don't think he's successfully answered everything said here, but it is good to read his rationale.

    1. Re:Update... by Todd+Knarr · · Score: 1

      I think the main mistake he made is in thinking his mirrors were backups. He wasn't doing a live mirror, but what he was doing had one thing in common with mirrors: that the operation modifies the target. A proper backup would not have modified the target, it would've created a new target. That way if your source gets corrupted, doing the backup doesn't corrupt the target and you can recover from older versions of the backup.

  42. Re:A thousand times. (Unless online mirrors roll b by Rich0 · · Score: 1

    Now, improving uptime and reducing downtimes is important, but it is not what a backup does.

    Well, a backup does contribute to reducing downtimes, albeit not so much as RAID/etc. Compared to doing a full reinstall/reconfiguration restoring from backup is likely to be much faster. That is why backup can be useful even on systems that do not contain unreproducible data. There are other strategies that have other advantages (like automatic builds/etc) which are also effective if there is no data involved.

    I do agree that the primary purpose of a backup is to prevent the loss of data in as many failure modes as possible/practical. Mirrors are definitely not backups (or at least, not very good ones - there is a continuum when it comes to backups, just as there is a continuum when it comes to disasters).

  43. Re:A thousand times. (Unless online mirrors roll b by turbidostato · · Score: 1

    "backup is _certainly_ a critical part of source control"

    Well, no, it isn't. Backup and version control certainly share some attributes: a history line, the ability to extract snapshots along that history line... but they go apart on other things (or else there wouldn't be specialized version control software: we all would be using backups for that).

    The most obvious thing needed for backups that is not needed for version control (despite the fact that you yourself seem not to understand it) is that on backups the historical snapshots need to be disconnected one from another, while that's not the case for version control. And that's the case for backups because as soon as you have any link among history points you can't guarantee the integrity of any one of them, and so you lose one of the most needed abilities of a proper backup system: the ability to get to that snapshot as it was in the past. Version control expects a properly running system and rightly so; that's called separation of concerns and it is a good thing.

    That means, for instance, that no, hardlinking is not a proper backup policy (by itself only) nor is rsync, nor is filesystem-level snapshotting. Tapes, on the other hand, do allow for proper backups because the contents of one backup set are totally disconnected from the contents of another one, so any failure, tampering, or disaster on one of them doesn't automatically affect others (but certainly tapes are not the only way to achieve that goal).

    At the highest level it is not hard to come up with a proper backup design, no, really:
    * At least two whole copy sets of the data to protect
    * At least one of them totally disconnected from the system to be protected.
    * At least one copy of the recover procedure documentation outside the system to be protected.
    * At least two persons in the know of the procedure and a third about where to find the documentation. Make them not to work together at the same place.
    * Finally, remember that if you didn't try to recover data from it, you don't have a backup.

    Now, go back to that list and tell me if what the KDE people were doing fits the definition.

    OK, I'll answer this for you: No, it doesn't. The mirroring script coupled each and every copy along their whole history path, so it isn't a backup.

    See? Not so difficult.

  44. Several Gb, daily by raymorris · · Score: 1

    How often would you do complete backups of KDE? How many would you save? How much hardware would that require?

    TFA says they have several GBs of data. Something like 89 GB. Since that's a rounding error to us, we volunteered to donate the necessary space. (EACH of our storage units for our backup service is at 14 TB, so donating 89 GB X 4 copies is nothing.)

    You asked how often - most web servers we do daily. For their case, I'd probably do the same as my desktop - daily off site, and a local snapshot four times per day.

  45. Not talking about Walmart rollback pricing by raymorris · · Score: 1

    It sounds like you have in mind rolling back to something very specific, something which is perhaps not a backup. What I'm talking about most certainly is a backup. I'm talking about rolling back to an offsite image made last month, last week, yesterday, or this morning.

  46. Read the documentation by phorm · · Score: 1

    " If they had bothered to look at the documentation they would have known"

    So you read and familiarize yourself with the entire documentation on every piece of software you use?

  47. Learn from Google by GuB-42 · · Score: 1

    Two years ago, Google erased the contents of hundreds of thousands of GMail accounts. It was caused by a bug, and corruption spread through their network even though it is normally highly redundant and fault tolerant.
    The result: a few hours to a few days of downtime for the affected accounts and almost no data loss.
    How did they manage to avert a disaster? They had proper backups, on tapes.

  48. I'm wondering... by Anonymous Coward · · Score: 0

    I'm wondering whether the people commenting here have actually read the article the post links to. It's well explained there why part of the fault definitely lies with git, and why making backups of a repository being changed all the time isn't as simple as just copying stuff with a cronjob. If the mistake with the repo list from the main server would not have been made (and this was the only real mistake they made), and git had actually worked as documented, then the mirrors would have been a perfectly reasonable backup solution in my opinion.

  49. Re:A thousand times. (Unless online mirrors roll b by GizmoToy · · Score: 1

    I don't know, I think I'd respectfully disagree. Those are all backups. If you have an online mirror and a fire destroys your primary data source or it's stolen, you can restore from the online mirror. This, having at least one full copy of the data and being able to restore it after a loss, is the very definition of a backup.

    The problem is that mirrors are not very good backups, and are prone to having the same problems as the original. Using a mirror as a backup is perfectly reasonable. Using a mirror as your only backup is foolish.

  50. The most important... by i · · Score: 2

    From my 34 years of constructing, coding and maintaining applications on computers I learned the hard way the 4 most important points:

    1. Backup.
    2. Backup.
    3. Backup.
    4. The rest.

    --
    Mundus Vult Decipi