Too Perfect a Mirror

Learn how your tool works? by gweihir · 2013-03-24 01:24 · Score: 5, Insightful

Preferably, before using it? This sounds very much like plain old incompetence, possibly coupled with plain old arrogance. Thinking that using a version control system absolves one from making backups is just plain old stupid. Then again, with what I have seen from the KDE project, that would be consistent.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:Learn how your tool works? by maxwell+demon · 2013-03-24 01:36 · Score: 5, Insightful

Also, mirrors are not backups. Mirrors are intended to be identical to the original, so mirroring worked as expected. How should the software know that the removal of most repositories was not intentional?

--
The Tao of math: The numbers you can count are not the real numbers.

Re:Learn how your tool works? by gweihir · 2013-03-24 03:52 · Score: 4, Insightful

Yes, it is too much. How would the mirror operation ever know without full checks on everything? Quit asking for nanny-software that treats its users as incompetent and illiterate. Is it too much to ask for the admins to actually have a brief look at the description of the operation they are using as their primary redundancy mechanism? I don't think so. If they had done this very basic step, they would have known to run a repository check before mirroring. If they had any real IT knowledge, they would have known that mirrors are not backups and that you need backups in addition.
Also, what I gather from their grossly incomplete "analysis" is that they had a file that read back differently on multiple reads (not sure, they seem not to have checked that). That is not filesystem corruption (the OS checks for that on access to some degree), but a hardware fault. Filesystems and application software routinely do not check for that. It is one of the reasons to always do a full data compare when making a backup.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:Learn how your tool works? by vurian · 2013-03-24 05:21 · Score: 4, Interesting

"I would also like to point out that the incompetence and arrogance of the KDE team is quite visible once you investigate a bit of their history." Actually, if you would investigate the history of the KDE sysadmin team you would find out that this handful of volunteers are doing a job that many full-time, well-funded sysadmins cannot rival. And anyone who talks about "the KDE team" as if it's a single, monolithic entity doesn't know what they're talking about.

Not git related by Rob+Kaper · 2013-03-24 01:24 · Score: 5, Insightful

This is not a problem with git --mirror: rsync or any other mirroring tool would end up in the same situation.

It's up to the master to deliver the goods and upgrading a master should include performing a test run as well as making a backup prior to the real upgrade. This was a procedural failure, not a software failure. But good to hear disaster was averted.

The 'K' stands for ... by Anonymous Coward · 2013-03-24 01:25 · Score: 4, Funny

You know, calling it a disaster really depends on your point of view.

No backups?! by Blymie · 2013-03-24 01:45 · Score: 5, Insightful

Good grief!

After all of that, not a single proposed solution is a proper, rotational backup.

This is what rotational backups are FOR. They let you go back months in time, and even do post-corruption, or post-cracking examination of the machine that went down!

Backups do *not* need to be done to tape, but a mirror or a raid card is NOT a backup. This is actually simple, simple stuff, and it seems like the admins at KDE are a bit wet behind the ears, in terms of backups.

They probably think that because backups used to mean tape, that's old tech, and no one does that.

Not so! Many organizations I admin, and many others I know of, simply do off-site rotational backups using rsync + rotation scripts. This is the key part, copies of the data as it changes over time. You *never* overwrite your backups, EVER.

And with proper rotational backups, only the changed data is backed up, so the daily backup size is not as large as you might think. I doubt the entire KDE git tree changes by even 0.1% every day.

Rotational backups -- works like a charm, would completely prevent any concern or issue with a problem like this, and IT IS WHAT YOU NEED TO BE DOING, ALWAYS!
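A minimal sketch of the rsync-plus-rotation idea described above, written in Python so that it is self-contained. The function name, the date-named snapshot directories, and the use of hard links for unchanged files are assumptions made for illustration, not details from the comment. It only shows the principle that each generation is a new directory and that old generations are never overwritten.

```python
import filecmp
import os
import shutil
from pathlib import Path


def make_snapshot(source: Path, backup_root: Path, name: str) -> Path:
    """Create backup_root/name as a new generation of source.

    Files identical to the newest existing generation are hard-linked
    instead of copied, so a generation only costs the space of what changed.
    Assumes generation names sort chronologically (e.g. ISO dates).
    """
    existing = (
        sorted(p for p in backup_root.iterdir() if p.is_dir())
        if backup_root.exists()
        else []
    )
    last = existing[-1] if existing else None

    target = backup_root / name
    # exist_ok=False: refuse to overwrite a generation that already exists.
    target.mkdir(parents=True, exist_ok=False)

    for dirpath, _dirnames, filenames in os.walk(source):
        rel_dir = Path(dirpath).relative_to(source)
        (target / rel_dir).mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            src = Path(dirpath) / filename
            dst = target / rel_dir / filename
            old = last / rel_dir / filename if last is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share storage with the older generation
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return target
```

Note that hard-linked generations share one copy of each unchanged file on one filesystem, so this covers the "oops" case but not a failing disk; the backup root would still need to live on independent storage.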

Re:No backups?! by Blymie · 2013-03-24 02:17 · Score: 4, Insightful

A 24 hour old sync isn't a backup. It's a slightly delayed mirror.
"Rotational backups" isn't just a single thing. It's a whole ball of wax. Part of that ball of wax is test restores. Another part is backups that only sync changes, something exceptionally easy with rotational backups, but not as easy with a filesystem snapshot.
In 10 seconds, I can run 'find' on a set of rotational backups I have, that go back FIVE YEARS and find every instance of a single file that has changed on a daily basis. How does someone do that with ZFS snapshots? This is something that is key when debugging corruption, or looking for a point to start a restore from (someone hacks in).
Not to mention that ZFS could be producing corrupt snapshots -- what an annoyance to have to constantly restore those, then run tests on the entire snapshot to verify the data.
What I see here is a reluctance to do the right thing, and a desire to think that the way people do traditional backups is silly.
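The 'find' across generations mentioned above could look roughly like the following sketch. It assumes (my assumption, not the commenter's layout) that every generation is a directory under one backup root and that directory names sort chronologically. It reports the generations in which a given file's content changed, which is the starting point for picking a restore point.

```python
import hashlib
from pathlib import Path


def file_history(backup_root: Path, relative_path: str) -> list[tuple[str, str]]:
    """Return (generation name, SHA-256) for each generation where the file changed.

    A generation in which a previously present file is gone is reported
    with the marker "missing".
    """
    history = []
    last_digest = None
    for generation in sorted(p for p in backup_root.iterdir() if p.is_dir()):
        candidate = generation / relative_path
        if candidate.is_file():
            digest = hashlib.sha256(candidate.read_bytes()).hexdigest()
        else:
            digest = None
        if digest != last_digest:
            history.append((generation.name, digest or "missing"))
        last_digest = digest
    return history
```

Calling `file_history(Path("/backups"), "etc/passwd")` (both paths hypothetical) would list only the days on which that file differs from the previous day's copy.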

Re:No backups?! by gweihir · 2013-03-24 02:36 · Score: 4, Insightful

What really surprises me is that people still do not understand backup, after it has been solved for decades. Backup _must_ be independent. It _must_not_ be on the same hardware. It _must_not_ even be on the same site, if the data is critical. It must protect against anything happening to the original system. Version control, mirrors, RAID, all do not qualify as backup. They are not independent of the system being backed up.
However, the amount of incompetence displayed in the original story and the comments here explains a lot. Seems that in this time of "virtual everything" people do not even bother to learn the basics anymore and are then surprised when they make very, very basic mistakes.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:No backups?! by Doc+Hopper · 2013-03-24 07:00 · Score: 4, Interesting

Unless there are legal reasons to keep 5-10 years of backups, or you are dealing in more than 3-5 TB of storage to be backed up, or taking things off-site daily via courier tape is just too expensive.
I like your summary of three important reasons for tape archive. I'll restate in different terms.
1. Mid-term to indefinite data retention.
2. Large quantities of data, where "large" is a value greater than a single hard drive can reasonably store.
3. Disaster recovery planning.
But there are more.
4. "Oops".
That's the category of this KDE git issue. Recovering from an "oops". People screw up. How do you recover? I'm a big fan of having multiple layers in that onion: online snapshots, near-line replicas, and off-line tape backups are a basic three-tiered framework for figuring out how to protect the data. I'm amazed that, as big as KDE is, they don't have storage/backup expertise helping them keep their data secure. Makes me think I may have found my next open-source niche to fill.
5. Reliability. Contrary to the "fragile, expensive" opinion above, tape failure rates are demonstrably lower than hard drive failure rates despite regular handling. Research left to the reader; hard drives fail at a rate about fifteen times higher than their rated MTBF implies, and that rated failure rate was already considerably higher than tape's. Data on tape is far more resilient than data on a hard drive.
6. Cost. If you have to store data long-term, consider tape. Administrative, electrical, power, cooling, and storage requirements are all cheaper.
That's what I can think of off the top of my head; I'm sure there are more reasons for tape to be a good choice. The reality for many people that want to store their data "in the cloud" also is this:
I back up your "cloud" storage onto tape drives. Your cloud storage is only as reliable as my ability to recover it from a disaster.

--
Matthew P. Barnson
I learn what I think when I read what I write

Re:Sounds like... by bmo · 2013-03-24 01:55 · Score: 4, Funny

There is nothing wrong with using the internet as a backup machine - with the caveat that you know what you're doing and you're using the right service/tool properly.

Personally, I have all my very important documents in an encrypted archive labelled "Area_51_Aliens_Proof.rar" with the note "It is dangerous for me to provide the key, but in the event of my death or imprisonment, a key will be provided EXPOSING EVERYTHING!!!" and uploaded to various paranormal bittorrent trackers and mirrored by various denizens of /x/.

I expect my documents to be archived in perpetuity.

--
BMO

No Git also failed by Anonymous Coward · 2013-03-24 02:03 · Score: 5, Informative

The files were corrupted, Git didn't report squat about the problems. The sync got different versions each time. Sure there are two layers of failure here, but one of them certainly is Git.

What he's saying is simple: Torvalds' comment is not completely true:
"If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums."

He's saying that if the commits are corrupted:
"If a commit object is corrupt, you can still make a mirror clone of the repository without any complaints (and with an exit code of zero). Attempting to walk the tree at this point will eventually error out at the corrupt commit. However, there’s an important caveat: it will error out only if you’re walking a path on the tree that contains that commit. "

So there's clear room for improvement. Sure the fault was a corrupt file, but the second layer of protection, Git's checking, ALSO FAILED. Denial isn't helpful here, Git should also be fixed.
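For context on the "160-bit SHA-1 name" in the Torvalds quote: in a SHA-1 repository, git names a blob by hashing a short header plus the file content. The sketch below reproduces that calculation; the function names are mine, not git's. It illustrates why corruption is detectable in principle, and also why it is only detected when something actually recomputes the hash and compares.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object name git gives a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, expected_id: str) -> bool:
    """True if the stored content still matches the name it was stored under."""
    return git_blob_id(content) == expected_id


original = b"int main(void) { return 0; }\n"
name = git_blob_id(original)

# Flip a single bit, as a bad disk or bad RAM might.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(original, name))   # the untouched copy verifies
print(is_intact(corrupted, name))  # the damaged copy does not
```

A copy operation that moves the bytes without calling something like `is_intact` will happily propagate the damaged version, which is the gap the second quote describes.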

Re:delayed update to servers.. by gweihir · 2013-03-24 02:22 · Score: 4, Informative

And another amateur-level solution. Does nobody know how to do backups anymore? O.k., here are the very basics of the mandatory characteristics of a backup:

- Backup data storage independent of the system being backed up
- Several generations of backups kept for long enough to be absolutely sure you can recover (yes, that can mean years) and frequently enough that loss is acceptable.
- Expect that one backup generation can be faulty and ensure that even then, recovery is possible and data-losses are acceptable.
- Full disaster recovery possible, even if your original system is stolen by aliens.
- Disaster recovery is tested regularly
- Data is verified (full compare or 2-sided crypto-hash compare) on backup

This really is "IT operations 101". Forget about all this half-ba(c)ked amateur stuff, IT DOES NOT WORK.
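The last item in the list above (verification by crypto-hash compare) can be sketched as follows. This is an illustration under assumptions: the function names are made up, SHA-256 is my choice of hash, and both trees are read from one machine here, whereas in a real "2-sided" compare each side would hash its own copy and only the digests would be exchanged.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its content."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different in the backup."""
    src = tree_digests(source)
    dst = tree_digests(backup)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))
```

An empty list from `verify_backup` means every file read back identically on both sides at the time of the check; anything else names the files to investigate before the backup is trusted.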

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:But it is SUPPOSED to by gweihir · 2013-03-24 02:44 · Score: 4, Informative

Git does not have the magic "integrity check" on making mirrors. If they had bothered to look at the documentation they would have known. If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off on a fast mirror operation. If they had even been a bit careful, they would have checked the documentation and known. They failed in every way possible.

Stop blaming the tool. This is correct and documented behavior. Start blaming the people that messed up badly.

And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up. Stop spreading nonsense.
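As a sketch of the "check before you mirror" step discussed above: `git fsck --full` and `git clone --mirror` are real git commands, but the wrapper functions, their names, and the injectable `run` parameter are assumptions added for illustration. This is not what the KDE admins ran, only one way the check could be wired in.

```python
import subprocess
from pathlib import Path


def repository_is_sound(repo: Path, run=subprocess.run) -> bool:
    """Run `git fsck --full` in repo; git exits non-zero when it finds errors."""
    result = run(["git", "fsck", "--full"], cwd=repo, capture_output=True, text=True)
    return result.returncode == 0


def mirror_if_sound(source: Path, destination: Path, run=subprocess.run) -> bool:
    """Mirror-clone source to destination only if the source passes fsck.

    Returns False without touching the destination when the check fails.
    """
    if not repository_is_sound(source, run=run):
        return False
    run(["git", "clone", "--mirror", str(source), str(destination)], check=True)
    return True
```

The full check is expensive on large repositories, which is presumably why it is not part of a fast mirror by default. Git also has `transfer.fsckObjects` and related settings for checking objects as they are transferred; whether those fit a given setup is something to verify against the documentation.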

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:A thousand times. (Unless online mirrors roll b by gweihir · 2013-03-24 03:18 · Score: 4, Informative

I believe you are not talking about backup. A backup allows system recovery after a disaster and cannot ever be stored in the system itself. What you are talking about is availability improvement. That _can_ be part of the primary system. RAID, for example, exclusively serves this purpose (except RAID0). But backups must also protect against user and administrator error, software errors, the data-center burning down, sabotage, etc.

Replication is not the tool for that. The problem is that any data copy part of the system itself can be corrupted by the system as the system still has access to it. That is why a backup must be both removed from the system so it is independent, and allow full reconstruction, even if the original system is completely destroyed.

Now, improving uptime and reducing downtimes is important, but it is not what a backup does. A backup makes sure you do not lose your data permanently. What uptime improvement does is to make it less likely that you need to go back to the backup.

Or to put it differently, backup is for Disaster Recovery. Uptime improvement is for reducing DR cost by reducing the probability of it becoming necessary and for reducing downtime cost.

I do agree with the political angle though.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.

Re:But it is SUPPOSED to by BitZtream · 2013-03-24 09:31 · Score: 4, Insightful

It is UNIX-style design where the user is expected to actually understand what they are doing.

No, it is not, and never was. It is in fact the opposite of that. man pages, as one obvious example, are there so people who don't know what they are doing can figure it out. It is designed to be intuitive and provide you with the information needed to get the job done. It was built to have small, simple tools that were easy to understand. They can perform simple tasks on their own or, when working together, perform some complex ones ... hence the powerful unix command line. The original UNIX design considered both new, inexperienced users and how to bring them up to speed, as well as how to empower users with more knowledge of the system.

What you are referring to is a Linux/OSS attribute, not a UNIX attribute. Linux/OSS developers typically expect the user of the software to be a developer as well. This is the result of everyone scratching their own itch only and most code being written by people for themselves without any consideration of others. No one WANTS to write the things that make it intuitive or easy for someone else who doesn't understand all the quirks. Obviously this isn't true for some of the paid developers, but the majority of them aren't.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
