GitLab Says It Found Lost Data On a Staging Server (theregister.co.uk)

Posted by msmash on Thursday February 2, 2017 @04:00AM from the reviving-again dept.

GitLab.com, the wannabe GitHub alternative that went down hard earlier this week and reported data loss, has said that some data is gone but that its services are now operational again. From a report on The Register: The incident did not result in Git repos disappearing. Which may be why the company's PR reps characterised the lost data as "peripheral metadata that was written during a 6-hour window". But in a prose account of the incident, GitLab says "issues, merge requests, users, comments, snippets, etc" were lost. The Register imagines many developers may not be entirely happy with those data types being considered peripheral to their efforts. GitLab's PR flaks added that the incident impacted "less than 1% of our user base." But the firm's incident log says 707 users have lost data. The startup, which has raised over $25 million, added that it lost six hours of data and asserted that the loss doesn't include users' code.

101 comments

Infected by Anonymous Coward · 2017-02-02 04:02 · Score: 0

It is.
1. Re:Infected by Anonymous Coward · 2017-02-02 10:02 · Score: 0
  
  Can somebody explain like I'm five: What's a staging server? Inquiring unix neckbeards want to know...
2. Re:Infected by Lisandro · 2017-02-02 23:29 · Score: 2
  
  It's a server (or set of servers) where you stage a new release of your site/software before an actual production release - it provides an environment as similar to prod as possible, and the idea is to help test your release before unleashing it to the world.
3. Re:Infected by Anonymous Coward · 2017-02-02 23:45 · Score: 0
  
  Thank you. In the context of gitlab, my understanding is that they don't really "release" anything and hence have no reason to stage, right? I mean what's lost were issues, pull requests and the likes. This stuff is being staged?
4. Re:Infected by Lisandro · 2017-02-03 00:11 · Score: 1
  
  From what i gathered from the obscurely worded article, it seems that they tried to restore data from their staging server after their five backup systems failed. Staging servers require production-like data so it is common to keep them somehow synchronized with prod data (a database copy, for example), but it is kinda sad that's the only thing they had left by then.
5. Re:Infected by Anonymous Coward · 2017-02-03 07:32 · Score: 0
  
  Staging servers require production-like data so it is common to keep them somehow synchronized with prod data
  Right, that makes sense. Thanks.
Live by the cloud, by Anonymous Coward · 2017-02-02 04:02 · Score: 0

die by the cloud. I don't have too much sympathy for anyone that was relying solely on GitHub to keep their code safe and secure.
1. Re:Live by the cloud, by x_t0ken_407 · 2017-02-02 04:11 · Score: 1
  
  Or GitLab, even.
2. Re:Live by the cloud, by houstonbofh · 2017-02-02 04:11 · Score: 4, Insightful
  
  The hard part is having a backup plan for your "cloud." Some places make it easy, but some make it VERY hard. Never used gitlab so I can not comment... But if YOU do not have a backup, there are no backups. As Codespaces users found out, and now Gitlab, kinda...
3. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 04:16 · Score: 1
  
  You can self host your own gitlab server and handle backup yourself if you want.
4. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 04:27 · Score: 0
  
  Why the hell would you "self-host" a cloud service? Isn't the entire point of the "cloud" being that they take care of that crap for you?
  Or, you could switch to GitHub, which beyond being a service people have heard of before, has never lost data like this.
  Or you could just not use a service like GitHub at all since the entire point to Git is that you don't need a central server at all.
5. Re:Live by the cloud, by Anonymous Coward · 2017-02-02 04:29 · Score: 5, Informative
  
  GitLab is actually quite good at it, really.
  1. You can get all the wiki and code repo data by git cloning into a backup repository.
  2. You can set up a remote mirror that gets automatically updated for the code. I don't think you can do that for the wiki, though.
  3. Project admins can download a metadata dump to import in some other gitlab instance (e.g. a local instance of gitlab CE (floss) or EE (paid):
  The following items will be exported:
  Project and wiki repositories
  Project uploads
  Project configuration including web hooks and services
  Issues with comments, merge requests with diffs and comments, labels, milestones, snippets, and other project entities
  4. The data which is not exported (LFS objects, build traces and artifacts, container registry images) can be downloaded in some other way. E.g. LFS is usually cloned along with the git code repos.
  Note that (3) **includes** the webhooks data that was not fully recovered.
  So, yeah, anyone who lost truly important data in this gitlab.com event was actually just as guilty of not following the "Tao of Backup" properly as gitlab.com's sysadmins.
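A minimal sketch of points 1 and 2 above, assuming a plain `git clone --mirror` into a local backup directory that is later refreshed with `git remote update --prune`. The helper name `mirror_command`, the project URL and the backup path are hypothetical. The function only builds the command, so it can be inspected before being handed to `subprocess.run`.

```python
import os
import tempfile


def mirror_command(repo_url, backup_dir):
    """Return the git command that creates or refreshes a mirror backup.

    First run: `git clone --mirror` copies every ref (branches, tags, notes).
    Later runs: `git remote update --prune` fetches new changes into the mirror.
    """
    if os.path.isdir(backup_dir):
        return ["git", "--git-dir", backup_dir, "remote", "update", "--prune"]
    return ["git", "clone", "--mirror", repo_url, backup_dir]


if __name__ == "__main__":
    # Hypothetical project URL and backup location, for illustration only.
    url = "https://gitlab.com/example-group/example-project.git"
    target = os.path.join(tempfile.gettempdir(), "example-project.git")
    print(" ".join(mirror_command(url, target)))
    # To actually run it: subprocess.run(mirror_command(url, target), check=True)
```

This only covers Git data (code and wiki repositories). Issues, merge requests and the other items in point 3 still need the project export.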
6. Re: Live by the cloud, by __aaclcg7560 · 2017-02-02 04:46 · Score: 1
  
  Why the hell would you "self-host" a cloud service?
  
  I switched from using DropBox in the cloud to a FreeNAS file server at home since I rarely access those files over the Internet. Now I don't have to worry about losing my data via the Internet.
7. Re: Live by the cloud, by twistedcubic · 2017-02-02 04:48 · Score: 3
  
  Why the hell would you "self-host" a cloud service?
  
  Almost any server can be "cloud service". There are several interesting solutions to the problem "I need to access a Git repository over the net" in "the cloud" or otherwise. For example, I self host because my code is so amazing, I can't risk having anyone see it lest they die from heart attack due to the overwhelming splendor.
8. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 04:49 · Score: 1
  
  True. You've moved it to where you have to worry about losing data in a fire. Of course if you do offsite backups you are fine. The point is that moving from local to cloud or cloud to local is just shifting the potential failure point around. The solution of course is redundancy and multiple locations.
9. Re:Live by the cloud, by houstonbofh · 2017-02-02 04:53 · Score: 1
  
  I would mod you +1 Informative for that. Yes, automated backup of the cloud is key. You can not verify what you do not see.
10. Re: Live by the cloud, by hsmith · 2017-02-02 05:04 · Score: 1
  
  While we don't self host, we have two independent git repositories - Bitbucket and Github. the probability of both disappearing over night is pretty slim.
11. Re: Live by the cloud, by Midnight+Thunder · 2017-02-02 05:14 · Score: 1
  
  The worst offender is Apple's iCloud, IMHO. Back up your photos onto your own drive? It offers you hoops and dead ends. I really feel cloud services should provide easy options.
  
  --
  Jumpstart the tartan drive.
12. Re: Live by the cloud, by gwolf · 2017-02-02 05:31 · Score: 3, Informative
  
  Being it a Git repository, you don't have to worry too much about your "centralized" hosting provider – Each developer that has cloned a (non-shallow) repository will locally have everything needed to rebuild history were both providers to disappear. Git is a great backup strategy by itself :-)
13. Re: Live by the cloud, by gwolf · 2017-02-02 05:32 · Score: 2
  
  Of course, I forgot to add: this will *not* include comments, issues, the whole social ecosystem built around your code. But you don't get to back that up by replicating your project over several different Git-hosting providers either.
14. Re:Live by the cloud, by freeze128 · 2017-02-02 05:33 · Score: 1
  
  I read that in the voice of Snagglepuss.
15. Re: Live by the cloud, by tepples · 2017-02-02 05:35 · Score: 1
  
  "Private cloud" means you lease a VPS, such as an AWS EC2 instance, and install an application there. It's useful for keeping personal information within your own country.
16. Re: Live by the cloud, by The-Ixian · 2017-02-02 06:03 · Score: 1
  
  Isn't the entire point of the "cloud" being that they take care of that crap for you?
  It is a selling point, certainly. I wouldn't say it is the entire point.
  Other selling points are:
  1. Access to the data from anywhere
  2. Collaboration with internal and external users
  3. Cross platform availability (device agnostic)
  4. Simplified billing / accounting
  5. Broader spectrum of tools (example: you could buy just Word for, say, $100 and own that one program or you can get an O365 sub and rent SharePoint, Word, Excel, Outlook, Publisher, Access, Skype, Exchange, PowerBI, OneDrive and a raft of other software for $15/mo)
  6. Automatic updates and/or upgrades to new versions
  7. Access to enterprise level infrastructure without having to buy it yourself
  
  --
  My eyes reflect the stars and a smile lights up my face.
17. Re: Live by the cloud, by ncc74656 · 2017-02-02 06:09 · Score: 1
  
  Why the hell would you "self-host" a cloud service?
  
  There is no cloud...it's just someone else's computer. If you're not comfortable with your stuff on someone else's computer, that would be good justification for self-hosting. I have a FreeNAS box at home providing ownCloud, Plex, and some other services, as well as some Git repositories (currently without a web interface). Some of my Git repos (especially my Portage overlay) are at GitLab for public access (used to be at GitHub, but I yanked everything off of there after they became SJW-converged).
  Right now, the Git repos live within their own jail and are accessed over SSH. I tried bringing up GitLab on my server when it was running Gentoo, but didn't get very far...ISTR their packaging and install docs being somewhat Ubuntu- or Debian-specific.
  
  --
  20 January 2017: the End of an Error.
18. Re:Live by the cloud, by Anonymous Coward · 2017-02-02 06:23 · Score: 0
  
  I am confused, I thought all git users would always have a local copy. Couldn't you just push your changes back out to gitlab again?
19. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 07:06 · Score: 0
  
  Why the hell would you "self-host" a cloud service? Isn't the entire point of the "cloud" being that they take care of that crap for you?
  Well, since some of them don't and there is no way for you to find out until you experience data loss there has to be some other point with cloud services.
  OTOH "getting money without doing the job" is a business model that is favored among bean counters.
  Outsourcing is a lot cheaper than doing things in house if you don't mind paying someone else to take your know-how and make you reliant on them.
20. Re: Live by the cloud, by houstonbofh · 2017-02-02 08:16 · Score: 2
  
  Yep. 3 2 1. https://www.backblaze.com/blog...
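For anyone who has not met the "3 2 1" rule: keep at least 3 copies of the data, on at least 2 different kinds of media, with at least 1 copy offsite. A small sketch of that check follows. The `Copy` class, the `satisfies_3_2_1` helper and the example inventory are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str    # e.g. "local-disk", "nas", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media, with >= 1 offsite."""
    media = {c.medium for c in copies}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(media) >= 2 and has_offsite


if __name__ == "__main__":
    # Hypothetical inventory: working copy, home NAS, and a cloud bucket.
    inventory = [
        Copy("laptop working copy", "local-disk", offsite=False),
        Copy("home NAS mirror", "nas", offsite=False),
        Copy("cloud archive", "cloud", offsite=True),
    ]
    print(satisfies_3_2_1(inventory))
```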
21. Re: Live by the cloud, by minstrelmike · 2017-02-02 08:23 · Score: 1
  
  Why the hell would you "self-host" a cloud service?
  Because in today's modern world, it pays to be fully buzzword-compliant.
22. Re: Live by the cloud, by pak9rabid · 2017-02-02 08:37 · Score: 1
  
  I have my personal git repo hosted locally on my LAN, and use Dropbox as a backup source, with a nightly cronjob packing it up and gpg-encrypting it before shipping it off to Dropbox. It's been working great for 5 years now.
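A rough sketch of what such a nightly job could look like, not the poster's actual script. It packs the repository directory with Python's `tarfile` and builds (but does not run) a `gpg --encrypt` command. The function names, paths and recipient address are assumptions. Copying the resulting `.gpg` file into a synced Dropbox folder is left out.

```python
import os
import tarfile
import tempfile


def pack_repo(repo_dir, out_path):
    """Write repo_dir into a gzip-compressed tar archive at out_path."""
    top_level = os.path.basename(os.path.normpath(repo_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(repo_dir, arcname=top_level)
    return out_path


def gpg_encrypt_command(archive_path, recipient):
    """Build the gpg command that encrypts archive_path to archive_path.gpg."""
    return ["gpg", "--batch", "--yes", "--encrypt", "--recipient", recipient,
            "--output", archive_path + ".gpg", archive_path]


if __name__ == "__main__":
    # Stand-in for a bare repository, created in a temp dir for the demo.
    work = tempfile.mkdtemp()
    repo = os.path.join(work, "project.git")
    os.mkdir(repo)
    with open(os.path.join(repo, "HEAD"), "w") as handle:
        handle.write("ref: refs/heads/master\n")

    archive = pack_repo(repo, os.path.join(work, "project.tar.gz"))
    # Hypothetical recipient; run the command with subprocess.run(..., check=True).
    print(" ".join(gpg_encrypt_command(archive, "me@example.com")))
```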
23. Re: Live by the cloud, by networkBoy · 2017-02-02 09:35 · Score: 1
  
  Have you ever had to do a bare metal restore?
  If not I suggest you do it. That's part of what got GitLab, they never verified their backups were restorable. If they had they'd have found there was no data there.
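A restore drill can be automated in a small way. The sketch below assumes the backup is a tar archive. It refuses files that are missing or suspiciously small, restores into a scratch directory, and fails if nothing comes out. The `verify_backup` name and the `min_bytes` threshold are assumptions, and a real drill would also load the restored data into the application.

```python
import os
import tarfile
import tempfile


def verify_backup(archive_path, restore_dir, min_bytes=1):
    """Restore a tar archive into restore_dir and return the restored files.

    Raises ValueError if the archive is missing, too small, or restores nothing.
    """
    if not os.path.isfile(archive_path):
        raise ValueError("backup file is missing: " + archive_path)
    if os.path.getsize(archive_path) < min_bytes:
        raise ValueError("backup file is suspiciously small: " + archive_path)

    os.makedirs(restore_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(restore_dir)

    restored = []
    for root, _dirs, files in os.walk(restore_dir):
        for name in files:
            full_path = os.path.join(root, name)
            restored.append(os.path.relpath(full_path, restore_dir))
    if not restored:
        raise ValueError("backup restored zero files: " + archive_path)
    return sorted(restored)


if __name__ == "__main__":
    # Build a tiny archive in a temp dir, then prove it restores.
    work = tempfile.mkdtemp()
    data_dir = os.path.join(work, "data")
    os.mkdir(data_dir)
    with open(os.path.join(data_dir, "dump.sql"), "w") as handle:
        handle.write("SELECT 1;\n")
    archive = os.path.join(work, "backup.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(data_dir, arcname="data")
    print(verify_backup(archive, os.path.join(work, "restore-test")))
```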
  
  --
  whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
24. Re: Live by the cloud, by phantomfive · 2017-02-02 11:35 · Score: 1
  
  I self host because my code is so amazing, I can't risk having anyone see it lest they die from heart attack due to the overwhelming splendor.
  Best reason ever.
  
  --
  "First they came for the slanderers and i said nothing."
25. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 12:22 · Score: 0
  
  Or, you could switch to GitHub, which beyond being a service people have heard of before, has never lost data like this.
  Yet...
26. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 14:42 · Score: 0
  
  ...for security reasons (ITAR/EAR, etc.), your industry can't allow code to be outside the corporate firewall.
27. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 23:32 · Score: 0
  
  SJW-converged? Really?
28. Re: Live by the cloud, by Anonymous Coward · 2017-02-02 23:50 · Score: 0
  
  I don't know. The chances of the west coast getting nuked are high these days...
29. Re: Live by the cloud, by Anonymous Coward · 2017-02-03 00:58 · Score: 0
  
  Do you even know what a BMR is? Shut up.
30. Re: Live by the cloud, by whoda · 2017-02-03 03:39 · Score: 1
  
  Do you work or possibly, formerly worked, at Gitlab?
31. Re:Live by the cloud, by RockDoctor · 2017-02-04 16:11 · Score: 1
  
  but some make it [backup outside the cloud] VERY hard.
  There is a concept called various things, but most often "vendor lock in" ; it may limit your potential market to the idiots in your industry, but if you can get those idiots to accept it, you're on a road to permanent customers, plus they'll send their first born daughter (or son - your choice) round to service you when you want to empty your balls.
  Did you never see that big cheese-eating grin on Billy Gates face? Nerd paradise through vendor lock-in.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Too Late by Anonymous Coward · 2017-02-02 04:03 · Score: 1

Reputation is ruined forever. Everyone involved will never work in tech again, should kill themselves right now.
1. Re:Too Late by Anonymous Coward · 2017-02-02 04:08 · Score: 0
  
  You should follow your own advice.
2. Re:Too Late by Anonymous Coward · 2017-02-02 04:14 · Score: 0
  
  Oh no, you're not tricking me into killing myself first, because you can't be trusted to kill yourself second.
3. Re:Too Late by AmiMoJo · 2017-02-02 04:15 · Score: 2
  
  On the other hand, having now made this mistake they are probably not going to make it again. Could be more reliable than companies which by chance have not needed to restore from backup yet.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
4. Re:Too Late by Anonymous Coward · 2017-02-02 04:17 · Score: 0
  
  Did you use the same pitiful rationalization after you cheated on your wife?
5. Re:Too Late by bluefoxlucid · 2017-02-02 04:24 · Score: 2
  
  Sounds like they restored from a backup. Backups are generally taken once per 24-hours, although PITR on databases is ... interesting, and complex as hell to pull off in the real world (I don't know why; it should be a simple operation, but no database seems to make it as easy as "look here for alternate binary logs and play forward until $TIME").
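The "play forward until $TIME" idea can be shown with a toy model. This is not how any real database implements PITR, just the concept: start from a base snapshot and replay logged changes in timestamp order, stopping at the requested time. The log format and the `restore_to_point_in_time` name are invented for the example.

```python
from datetime import datetime


def restore_to_point_in_time(base_snapshot, log, stop_at):
    """Replay logged changes onto a copy of base_snapshot, up to stop_at inclusive.

    Each log entry is (timestamp, op, key, value) with op "set" or "delete".
    """
    state = dict(base_snapshot)  # never mutate the backup itself
    for timestamp, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if timestamp > stop_at:
            break
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


if __name__ == "__main__":
    snapshot = {"issue-1": "open"}  # hypothetical nightly backup contents
    changes = [
        (datetime(2017, 2, 2, 4, 30), "set", "issue-2", "open"),
        (datetime(2017, 2, 2, 4, 40), "set", "issue-1", "closed"),
        (datetime(2017, 2, 2, 4, 50), "delete", "issue-1", None),  # the accident
        (datetime(2017, 2, 2, 4, 51), "delete", "issue-2", None),
    ]
    # Stop just before the accidental deletes.
    print(restore_to_point_in_time(snapshot, changes, datetime(2017, 2, 2, 4, 48)))
```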
  Data loss of 6 hours of issues, MRs, comments, and the like is ... data loss of 6 hours. It's a lot in aggregate for something with over 70,000 users and 238,000 projects, but not much for one project unless project members spend all day writing hundreds of issues and comments instead of code.
  
  --
  Support my political activism on Patreon.
6. Re:Too Late by laughingskeptic · 2017-02-02 04:47 · Score: 1
  
  It's pretty easy in Microsoft land, for instance: https://www.sqlservercentral.c... and I have done similar things in bash to restore Oracle and Sybase DBs many years ago. Of course you have to have transaction logs to replay transaction logs and writing transaction logs is optional in MySQL and even if they are written, they are by default placed in the same data directory as the database.
7. Re:Too Late by houstonbofh · 2017-02-02 04:54 · Score: 2
  
  Restored from a manual backup an admin happened to take. All the automated systems failed... That said, nothing promotes fire safety like a good fire.
8. Re:Too Late by SScorpio · 2017-02-02 05:16 · Score: 1
  
  Exactly, I do this all the time in MS SQL to troubleshoot application bugs by creating a DEV copy to a specific point in time.
  
  RESTORE DATABASE SomeDatabase_20170202
    FROM DISK = 'd:\Backups\SomeDatabase_20170202041500.BAK'
    WITH REPLACE, NORECOVERY,
    MOVE 'Data' TO 'D:\SQL\SomeDatabase_20170202.mdf',
    MOVE 'Log' TO 'L:\Log\SomeDatabase_20170202.ldf';
  GO
  RESTORE LOG SomeDatabase_20170202
    FROM DISK = 'd:\Backups\SomeDatabase_20170202050500.TRN'
    WITH NORECOVERY, STOPAT = '02/02/2017 4:48';
  GO
  RESTORE DATABASE SomeDatabase_20170202 WITH RECOVERY;
  GO
  
  Now PiT recovery for some tables, while not overwriting other tables in the database, or even across multiple databases, can make things get interesting really fast.
9. Re: Too Late by Anonymous Coward · 2017-02-02 05:44 · Score: 0
  
  Once a cheater always a cheater.
PFY ... by PPH · 2017-02-02 04:09 · Score: 3, Funny

... couldn't remember the exact database maintenance command sequence. So he called BOFH at home after hours for assistance.

--
Have gnu, will travel.
It's GIT for god sake by Anonymous Coward · 2017-02-02 04:14 · Score: 1

Of course it doesn't include users' code - it's GIT for god sake. Developers have the whole repo on their own machine...
1. Re: It's GIT for god sake by Anonymous Coward · 2017-02-02 05:23 · Score: 0
  
  This didn't affect the git side of things, just the project add-ons, like issue tracking and wiki pages.
so.. by Anonymous Coward · 2017-02-02 04:18 · Score: 0

Staging Server = backup strategy.
nice, i'll have to remember that one.
We lost your data but we're back up and ready by JoeyRox · 2017-02-02 04:19 · Score: 2, Insightful

To lose more of your data.
1. Re:We lost your data but we're back up and ready by Anonymous Coward · 2017-02-02 04:20 · Score: 0
  
  I only fucked your sister once, honest.
2. Re:We lost your data but we're back up and ready by Anonymous Coward · 2017-02-02 06:00 · Score: 0
  
  I only fucked your sister once, honest.
  You, son of a bitch! You think you are going to get off this easy? From now on you'll have to fuck her all the time, day in, day out, night in, night out, day after day, week after week, month after month, year after year, constant non-stop fucking, if you don't, you are really screwed!
"wannabe GitHub alternative" ? by TheDarkener · 2017-02-02 04:25 · Score: 3, Insightful

"GitLab.com, the wannabe GitHub alternative" ... Uhm, is that really accurate?

--
It is pitch black. You are likely to be eaten by a grue.
1. Re:"wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 04:32 · Score: 0
  
  It's pretty much entirely accurate.
  GitHub is a Git service that provides centralized Git repositories, along with additional services like an issue tracker and "merge request" handling. They make their money by selling their services to companies that either can't afford to or don't want to deal with setting up similar services in-house.
  GitLab does literally the exact same thing. It's pretty much feature-for-feature comparable with GitHub. If something GitHub does pisses you off, GitLab will piss you off in the same way. It has nothing beyond what GitHub does. It's a GitHub clone. That occasionally loses six hours of data.
2. Re:"wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 04:49 · Score: 5, Informative
  
  A "github clone" which comes with a CE edition which is FLOSS, and an EE edition, for either zero-cost (CE edition), or just $ (EE edition). And in both cases, you can have your own on-premises. github would be $$$, and I don't think it does on-premises (but even if it does, it is a lot more expensive).
  It is also vastly preferred over github by anyone with small teams. It didn't get into fortune-500 by chance, nor did it get US$ 25M in funding by chance.
  But yes, if you hate github's usability or flows, there is no reason to believe you wouldn't hate gitlab as well. They are *not* the same, but they're close enough.
3. Re: "wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 05:43 · Score: 0
  
  Sounds like an actual GitHub alternative then, not a "wannabe".
4. Re:"wannabe GitHub alternative" ? by rl117 · 2017-02-02 06:29 · Score: 4, Interesting
  
  GitLab does a bunch of stuff which GitHub doesn't. The most significant for me is the integrated CI, and that you can host your own runners and workspaces on your own infrastructure (or some cloud provider). Compared with Travis or some other CI hook on GitHub, this is vastly more flexible and powerful. I also find the ability to assign people for review, milestones and such on issues and merge requests to be very nice features which GitHub lacks. It is a GitHub clone, but they seem to have taken the lead in implementing more advanced functionality. At work, we're currently looking into a trial of GitLab plus our own multi-platform CI runners as an alternative to GitHub+Travis and internal Jenkins with several hundred jobs. It stands to greatly simplify the amount of failures, admin time and developer time keeping that lot going.
5. Re: "wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 08:09 · Score: 0
  
  For the record GitHub also comes with an enterprise edition, though I'm not sure how it compares with GitLab. Does anyone have a feature comparison table?
6. Re: "wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 08:43 · Score: 0
  
  No, it's still a wannabe in that no one uses it, everyone just uses GitHub instead. It offers nothing beyond GitHub so why the hell not just stick with the best, or at least the best supported? Just about every IDE out there has GitHub integration, none of them have GitLab integration. It's easier to just use GitHub.
7. Re:"wannabe GitHub alternative" ? by phantomfive · 2017-02-02 12:03 · Score: 1
  
  The most significant for me is the integrated CI, and that you can host your own runners and workspaces on your own infrastructure (or some cloud provider).
  Can you clarify this? What are runners, and what does it mean to host a workspace? Is that like an Eclipse workspace, or something else?
  
  --
  "First they came for the slanderers and i said nothing."
8. Re:"wannabe GitHub alternative" ? by rl117 · 2017-02-02 12:33 · Score: 1
  
  A runner is a job scheduler running on a remote host, which can be your own machine or hosted wherever you like. When you push a branch or open a merge request etc., you can have it trigger builds on any registered runners (you can have as many as you like). A workspace is a place to store stuff resulting from that build such as libraries, binaries, documentation etc. This means that you can have a CI workflow and deployment hooked directly into the merge request and code review process. This stuff also works at a higher level than github. With github, travis and other CI builds are tied to a project. With gitlab they can also operate at the level of an organisation, so you can use workspaces to test multiple projects in sequence, so that you can do CI and deployment stuff across multiple repositories to test all downstream dependencies or whatever you need to do. It's all documented on the gitlab site. I'm still in the early stages of trying all this stuff out myself.
9. Re:"wannabe GitHub alternative" ? by robmv · 2017-02-02 12:55 · Score: 1
  
  Runners are part of GitLab CI tools. They are daemons that you can host on your own infrastructure in order to run your automated builds or deployments. Details of GitLab CI
10. Re: "wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 14:41 · Score: 0
  
  GitLab offers private repos for free. GitHub does not.
  But, GitHub has a cooler mascot/logo.
11. Re:"wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 15:25 · Score: 0
  
  I use Gitlab CE at work. I've set up and maintained our servers associated with it for about a year now. It's been fantastic, we've migrated about 60 projects from a subversion server to Gitlab, and get a fast, stable interface for managing much of the projects without going outside our network environment. I don't have to go to sleep at night wondering if someone associated with a project is going to accidentally fork it into a public repo. I mean, it's git, they could obviously still do it, but that certainly wouldn't be accidental.
  We already added a line item for EE for next year in our budget. We are watching this event pretty closely but don't see anything that is causing concern. We're also a heavy pgsql shop, and are quite comfortable with the database management for the omnibus project. Not so heavy on the ruby, but hey, can't have everything.
12. Re: "wannabe GitHub alternative" ? by Anonymous Coward · 2017-02-02 21:34 · Score: 0
  
  No, it's still a wannabe in that no one uses it, everyone just uses GitHub instead.
  Oh, so they had no data to lose, and therefore this story never happened. Gotcha.
Not sure if this is reassuring... by nitehawk214 · 2017-02-02 04:31 · Score: 1

So they have found the data randomly on a server somewhere.

--
I'm a good cook. I'm a fantastic eater. - Steven Brust
1. Re:Not sure if this is reassuring... by Anonymous Coward · 2017-02-02 04:36 · Score: 0
  
  They probably didn't have a clear description of their backup strategy, just randomly backing up stuff wherever they can.
Bad incident; great response by Wuhao · 2017-02-02 04:37 · Score: 5, Interesting

Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.
But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.
Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.
1. Re:Bad incident; great response by RPI+Geek · 2017-02-02 05:23 · Score: 1
  
  Thank god for distributed SCM.
  Considering that the particular SCM software in this story is Git, you should probably be thanking Linus Torvalds.
  
  On second thought, he might enjoy being called god. Carry on.
  
  --
  
  - "Nobody came out that night, not one was ever seen. But Old Man Stauf is waiting there, crazy sick and mean!"
2. Re:Bad incident; great response by jittles · 2017-02-02 05:57 · Score: 1
  
  Obviously, data loss is embarrassing. I think we all appreciate the importance of not only having multiple backups, but testing to ensure that your backups work, and are sufficient to fully restore operations. GitLab is just the latest in a long tradition of sites and services that have found themselves facing the consequences of not regularly testing their recovery plans.
  But I do respect their response. They quickly recognized what had happened, and they diagnosed what went wrong with their backups. They did not try to use PR-speak to conceal their mistake -- they publicly copped to it, in plain industry-standard language that their users would understand, and even offered a livestream of their team resolving the issue. I think this has been a masterclass in how to recover from a blunder. I bet you that this is not a mistake GitLab will be repeating anytime soon.
  Also, I think it's very fortunate that they're in the git repo business, and presumably users who had data that was affected by the loss still have a copy in their own local repos. Thank god for distributed SCM.
  They claimed they did not lose any Git data, only database records pertaining to users, issue tracking, tasks, etc. I don't know of anyone who backs up their bug tracking and other databases, so some people probably would have preferred to have lost their git data. It's easier to restore on an active project.
3. Re:Bad incident; great response by Anonymous Coward · 2017-02-02 06:56 · Score: 0
  
  Are you kidding? Of course we backup the issue databases and trackers at work. We'd have to be extremely insane not to.
  I know the same goes for kernel.org, debian.org, canonical's launchpad (which has Ubuntu and a lot more projects inside), etc. They are all backed up, although I couldn't tell you how often. In launchpad's and debian's case, it actually "logs" to mailinglists (which are archived all around the 'net), so you could "play back" the log (reconstruct the bug reports).
  It would be safe to assume that anyone with a large issue tracker does backups. What is *not* safe to assume is that they followed the Tao of Backups, and would not suffer data loss on restore.
  And for a gitlab instance (such as gitlab.com), if you do any project metadata backup _at all_, you *are* backing up the issue tracker. It is contained in the project metadata dump...
  So, do you backup your project's metadata? Or are you "trusting the cloud" to do it for you?
4. Re:Bad incident; great response by Anonymous Coward · 2017-02-02 07:10 · Score: 0
  
  They claimed they did not lose any Git data, only database records pertaining to users, issue tracking, tasks, etc. I don't know of anyone who backs up their bug tracking and other databases, so some people probably would have preferred to have lost their git data. It's easier to restore on an active project.
  I don't setup services we don't need.
  Why wouldn't you back those services up? Anytime I set something up I set up an automatic backup for it at the same time. If no one else has access to it (aka I can just spin up a new instance based on my text file logs of what I did the first time) then I just do a mysql dump to a tgz and email that to myself at an alias. That way backups are integrated with a daily report in my inbox and I can just delete them when I'm confident I don't need a particular date for restores (AKA the end of month delete 29 of 30 of last month's backups).
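The "delete 29 of 30" retention step could be sketched like this. It assumes one backup per day identified by its date, keeps everything from the current month, and keeps only the newest backup of each earlier month. Which backup the poster actually keeps is not stated, so "newest" is a guess, and `backups_to_delete` is a made-up name.

```python
from datetime import date


def backups_to_delete(backup_dates, today):
    """Return the backup dates that can be pruned, oldest first.

    Backups from the current month are all kept. For each earlier month,
    only the newest backup is kept and the rest are returned for deletion.
    """
    current_month = (today.year, today.month)
    newest_per_month = {}
    for backup in backup_dates:
        month = (backup.year, backup.month)
        if month >= current_month:
            continue
        if month not in newest_per_month or backup > newest_per_month[month]:
            newest_per_month[month] = backup

    doomed = []
    for backup in backup_dates:
        month = (backup.year, backup.month)
        if month < current_month and backup != newest_per_month[month]:
            doomed.append(backup)
    return sorted(doomed)


if __name__ == "__main__":
    # Hypothetical inventory: 30 daily backups in January plus two in February.
    january = [date(2017, 1, day) for day in range(1, 31)]
    february = [date(2017, 2, 1), date(2017, 2, 2)]
    prune = backups_to_delete(january + february, today=date(2017, 2, 2))
    print(len(prune), "backups can be deleted")
```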
PHEWF, CRISIS AVERTED BOYS! by Anonymous Coward · 2017-02-02 04:40 · Score: 0

No worries mates,
A part-time intern dev found a week old backup on an SSD he stole and lost in the sofa.
Toodles til next time,
GitLab
Data types? by Anonymous Coward · 2017-02-02 04:49 · Score: 0

> developers may not be entirely happy with those data types being considered peripheral [...]
I use untyped lambda calculus, you insensitive clod!
[Meta: am I the only one who cringes when a technical term becomes buzzspeak? Am I the only one to feel like punching some idiot?]
HAH I called it. by funkymonkjay · 2017-02-02 04:50 · Score: 1

Nice thing about having all these release stages is that they are tested before promoting to the next stage.
Process for updating any stage:
1. copy data from next stage.
2. deploy new code
3. TEST TEST TEST
I Was Impressed by segedunum · 2017-02-02 05:11 · Score: 1

Yes, there are questions about how this happened, how an admin was seemingly under a bit of pressure when that happened, the question about non-existent backups and whether they have people with enough Postgres skills, but I was impressed by the way they admitted it. They didn't butt cover, they admitted upfront and point-blank "Yer, we've deleted the production Postgres data directory, our backups don't work, we're seeing what we can salvage elsewhere."

Yes, if you have copies of your production data in staging as a last resort when all else is not good, they can be used as backups. I would imagine they wouldn't want that stomach dropping feeling again..................
How? by drew_92123 · 2017-02-02 05:19 · Score: 1

How do people still lose data in a time when so many options are available to limit or even prevent it... synchronous or asynchronous replication to off-site storage, snapshots, RAID 6... we have the technology available to make data loss nearly unheard of... it's relatively easy to plan and implement, and it works... and yet morons everywhere STILL manage to lose data...
1. Re:How? by helsinki92 · 2017-02-02 05:47 · Score: 1
  
  You shouldn't be surprised to see where Mr. Murphy plants his foot occasionally.
2. Re:How? by Anonymous Coward · 2017-02-02 07:02 · Score: 0
  
  It is easy. You just listed a form of RAID, and also two forms of online replication. Both are usually quite useless at avoiding data loss when not caused by a hardware fault. You delete the data, the replicated copies get flushed to hell quite soon enough...
  Backups (aka "retention replication") are not trivial. They have never been. We would never need to both perform raw/snapshot+raw and logical backups if these things were not prone to blowing up in flames as easily as NOX.
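A toy model makes the difference concrete. It is only a sketch: plain dictionaries stand in for a primary database, an online replica, and point-in-time snapshots, and none of the names come from GitLab's actual setup.

```python
import copy

primary = {}    # stand-in for the production database
replica = {}    # stand-in for an online replica
snapshots = []  # retained point-in-time copies ("backups")

def write(key, value):
    primary[key] = value
    replica[key] = value  # replication mirrors every write

def delete_all():
    primary.clear()
    replica.clear()       # ...and mirrors every delete just as faithfully

def take_snapshot():
    snapshots.append(copy.deepcopy(primary))

write("issue-1", "login page broken")
take_snapshot()
write("issue-2", "filed after the snapshot")
delete_all()              # the accidental wipe

restored = snapshots[-1]
print("replica:", replica)
print("restored from snapshot:", restored)
```

The replica ends up as empty as the primary, while the snapshot still holds `issue-1`. `issue-2`, written after the snapshot, is gone, which is the same kind of window GitLab reported losing.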
3. Re:How? by Actually,+I+do+RTFA · 2017-02-02 08:13 · Score: 1
  
  The problem is those are hardware fault solutions, not general solutions. Going bit by bit:
  "RAID is not a backup" is a mantra. Given that the data was deleted, not merely a physical disk failure, RAID did not mitigate it. It was successfully deleted across all disks. (unless one failed, but who cares?)
  They had replication to off-site storage. However, the deletion also replicated to the off-site location.
  They did have backups it seems, because they were able to roll back 6 hours to a restorable point in time.
  
  --
  Your ad here. Ask me how!
This could be a thing by backslashdot · 2017-02-02 05:22 · Score: 1

Every X months or years someone can find some of the missing GitLab data on a server somewhere. Just when you thought that was all they would recover, someone finds a few kilobytes of missing GitLab data on an SD card floating in a sewer.
Why the axe to grind? by wbr1 · 2017-02-02 05:33 · Score: 5, Insightful

"wannabe"
"PR flaks"
number doubting: "less than 1% of our user base." But the firm's incident log says 707 users have lost data.
Why the negative tone? I am not a coder. I do not use GitLab or GitHub except for an occasional download. However, competition is generally good. Sure, this company lost data... so do many. The real question is whether this is indicative of a systemic issue or just a one-time occurrence. I just don't see why this level of negativity is being pushed against this company.

--
Silence is a state of mime.
1. Re:Why the axe to grind? by Anonymous Coward · 2017-02-02 07:27 · Score: 0
  
  Why the negative tone?
  Because some people are pricks.
2. Re:Why the axe to grind? by Anonymous Coward · 2017-02-02 08:10 · Score: 0
  
  GitLab doesn't kowtow to the Regressive Left.
3. Re:Why the axe to grind? by Anonymous Coward · 2017-02-05 08:32 · Score: 0
  
  I don't see all this negativity. In fact, quite the opposite. What *I* don't get is all this leniency and praise they're getting. Not backing up your clients' data is grossly irresponsible, and let's not forget they were offline for over 12 hours. They *should* be getting some flak for that, instead of all the praise and support.
  These are not computer geeks working out of their garage, this is a company with over $25M in funding that provides service to Fortune 500 companies. What happened was beyond recklessness, it was gross negligence, and is an indication of how they run the rest of their business, and of the disregard they have for customer data and high availability.
Whatever, it's just a backup by Anonymous Coward · 2017-02-02 05:47 · Score: 0

Shouldn't everyone that pushes to Git still have their local? In the case of group projects, a ton of locals.
if your backup is utilitarian, it's not a big deal by Frankenshteen · 2017-02-02 05:57 · Score: 1

reduce the urgency in a disaster by making the manner in which you would recover part of your daily routine - to whatever extent possible.

--
"It's a doughnut stuffed with M&M's. That way when you finish the doughnut, you don't have to eat any M&M's."
Found Lost Data by adam.voss · 2017-02-02 06:04 · Score: 1

Am I the only one that read the title thinking the data was recovered?
That's my experience as a backup provider by raymorris · 2017-02-02 06:13 · Score: 2

That matches my experience. My company offers an offsite, bootable backup solution so if anything bad happens to your server, you just boot the appropriate clone in our cloud and you're back in business. A LOT of our customers get our service when they find out the hard way why *proper* offsite backups are important. Many weren't too concerned about backup and business continuity until something bad happened to them.
AFTER they have a major loss they get serious about making sure it won't happen again.
1. Re:That's my experience as a backup provider by Anonymous Coward · 2017-02-02 06:26 · Score: 0
  
  AFTER they have a major loss they get serious about making sure it won't happen again.
  For a short while... Then it gets hit with management and other stuff and you're back at it again in 3-5 years.
I guess the good news... by John+Allsup · 2017-02-02 06:40 · Score: 1

It could have been far worse, and I imagine GitLab will make damned sure backups and suchlike work properly in future.

--
John_Chalisque
No Big Deal by Anonymous Coward · 2017-02-02 07:14 · Score: 0

Git repositories are just a public facing version of the local version you should already have a backup for. Even if they lost everything, so F'ing what. Just re-sync your repo from your local and it's all back online again. If each developer isn't making local backups with redundant HDDs or RAID, then that's completely your fault. Git is just a repository; anything lost worth a damn you can always reupload to their server again with no problem. If you can't do it this way and rely solely on Git to be the one and only storage of your code then you're an idiot... idiots and programmers generally are not synonymous so I really don't see any issue here at all.
Get off my lawn by Drunkulus · 2017-02-02 07:48 · Score: 1

A company with 25 million VC bucks and customers like IBM, Red Hat, and NASA doesn't have a working backup system? Let me guess, everybody at GitLab is a developer, and the whole thing runs on node.js in Docker containers.
1. Re:Get off my lawn by Anonymous Coward · 2017-02-02 13:08 · Score: 0
  
  All the big name customers run their own instances and were not affected by the mistake.
nearly worthless data by Anonymous Coward · 2017-02-02 09:02 · Score: 0

Ephemeral data tends to have little value long term, so lost "issues, merge requests, users, comments, snippets, etc" is not so big of an issue.
That any data at all is lost indicates there is a broken process where they are unable to protect even trivial bits of data. Finding a copy is just dumb luck, nobody plans to find copies of missing data.
Remember Y2K goofiness? by Darkness+Of+Course · 2017-02-02 09:16 · Score: 1

Well, I was involved in verifying that we were in compliance. Over 100k products, and some percentage were software, probably under 5%. A few projects were archived in the company archives. Funny coincidence: I was there when the initial procedures were established. I didn't establish them, but I used them to archive a few software projects I was involved with. Bounce back to Y2K and I am requesting source code from several projects, now defunct but quite possibly with existing users. Simple: we will read the source, establish whether there was or was not any time-related silliness, and go back to the main projects and the tens of thousands remaining. Of course you have already guessed it. No files were available. All the storage media provided was blank. I suspect that the dd-equivalent was done backwards: take the blank storage and dd it to the incoming data. Fun. I doubt if it has ever been corrected, as my comments were ignored back then. BTW, I am blissfully retired.
$25 Million and no backups? by Anonymous Coward · 2017-02-02 12:28 · Score: 1

What kind of IT organisation has $25 Million at their disposal, has a core business of looking after developers' data and yet doesn't have snapshots and backups of that data? Seriously? In this day and age? Most modern filesystems have snapshot abilities and the ability to export those snapshots; wouldn't you then do an rsync or tape backup as a belt and braces thing? Also, check your backups, have backup monitoring in place, copy the data somewhere else as a DR plan, undertake test restores, copy important data to two locations (i.e. metadata)?
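The "check your backups" and "copy to two locations" points can be sketched in a few lines. This is only an illustration: `backup_and_verify`, the file name and the two directories are hypothetical, and a checksum comparison is a much weaker check than a real test restore into a scratch database.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def backup_and_verify(source, destinations):
    """Copy source into each destination directory and verify every copy."""
    source = Path(source)
    if source.stat().st_size == 0:
        # An empty dump is the classic silent backup failure.
        raise ValueError(f"refusing to back up empty file: {source}")
    expected = sha256(source)
    copies = []
    for dest_dir in destinations:
        copied = Path(shutil.copy2(source, dest_dir))
        if sha256(copied) != expected:
            raise IOError(f"checksum mismatch for {copied}")
        copies.append(copied)
    return copies

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    dump = tmp / "metadata.sql"  # hypothetical dump file
    dump.write_text("INSERT INTO issues VALUES (1, 'example');\n")
    site_a, site_b = tmp / "site_a", tmp / "site_b"
    site_a.mkdir()
    site_b.mkdir()
    copies = backup_and_verify(dump, [site_a, site_b])
    verified = [c.name for c in copies]

print("verified copies:", verified)
```

Refusing to copy an empty file is a cheap guard against the kind of blank backup that only gets noticed on the day it is needed.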
1. Re: $25 Million and no backups? by Anonymous Coward · 2017-02-02 14:39 · Score: 0
  
  Maybe someone is negotiating to buy them and the price just dropped a bit.