Red Hat Suffers Massive Data Center Network Outage
An anonymous reader writes: According to multiple reports on Twitter, the Fedora Infrastructure Status page, and the #fedora-admin Freenode IRC channel, Red Hat is suffering a massive network outage at their primary data center. Details are sketchy at this point, but it looks to be impacting the Red Hat Customer Portal; as well as all their repositories (including Fedora, EPEL, Copr); their public build system, Koji; and a whole host of other popular services. There is no ETA for restoration of services at this point.
'Nuff said.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
But seriously, we need to find out what happened. I hope it's a hardware issue and not software.
From Bill Gates arsehole
"sso.redhat.com has encountered an error, please try your request again."
$BadThing happened about $company||$person I like, therefore $conspiracy!!
They should have used Ubuntu.
(Okay, maybe only if systemd ran as PID 2 ...)
It must have been something you assimilated. . . .
No SystemD
Seriously, when you don't care about quality, you have more problems. When you have more problems and you swallow log messages, you make it take a lot longer to fix those problems.
if they were running distributed... oh, wait.
guess it was a Beowulf cluster of Clusters.
if this is supposed to be a new economy, how come they still want my old fashioned money?
Uh oh... sounds like the datacenter team accidentally deployed the systemd package we cooked up for the North Korean missile program.
Yes, I agree, it is BIG NEWS when a Linux distro has any kind of problem because it almost never happens. MS outages, not so much.
San Francisco has a massive network outage - I must've missed your post about that.
Come back after you grow up @msmash.
It's a common interface so there's a way to apply cgroups.
They should have been running Oracle Unbreakable Linux and none of this would have happened!
*snicker*
But do these "separate application[s]" break if pid 1 is something other than systemd?
Depends on the applications.
Boot-loader? NTP clients? These aren't deeply interdependent.
You could very much use these and then run openrc if you want.
These are just handled by systemd in the sense that these are programs now developed by people who are on the systemd *project* team.
At most systemd might leverage the boot-loader in the sense that it can more easily send parameters to it for the next boot.
Though other daemons might be much more interlinked with systemd's (the daemon running at PID 1) job of starting/stopping things.
(I'm thinking about daemons managing seats, sessions, and starting/stopping hardware).
These are important because, over time, the Linux kernel is going well beyond standard POSIX behaviour.
The Linux kernel, for example, offers cgroup isolation, namespaces, etc. (the facilities which are leveraged by containers such as LXC, Docker, Anbox...) which are definitely useful (separating sessions into different namespaces, jailing some daemons in containers, automatically configuring the content of a container statelessly).
The older bash-script-based mess of code that predated systemd has absolutely zero ability to leverage them.
systemd is one rather successful way to deal with those things that weren't provided by SysVinit.
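To make the cgroup point concrete: the kernel exposes cgroups as a plain filesystem, so "using" them mostly means creating directories and writing small text files. Here is a minimal sketch in Python, assuming the cgroup v2 unified hierarchy (normally mounted at /sys/fs/cgroup); on the older v1 layout the file names differ. The helper names `create_cgroup` and `attach_pid` are made up for illustration, and the demo points at a temporary directory so it can run without root.

```python
import tempfile
from pathlib import Path
from typing import Optional


def create_cgroup(root: Path, name: str, memory_max: Optional[str] = None) -> Path:
    """Create a child cgroup under `root` and optionally set a memory limit.

    On a real cgroup v2 mount, mkdir is what creates the group. The
    memory.max file only exists there if the memory controller is enabled
    in the parent's cgroup.subtree_control.
    """
    group = root / name
    group.mkdir(parents=True, exist_ok=True)
    if memory_max is not None:
        (group / "memory.max").write_text(memory_max + "\n")
    return group


def attach_pid(group: Path, pid: int) -> None:
    """Move a process into the group by writing its PID to cgroup.procs."""
    (group / "cgroup.procs").write_text(f"{pid}\n")


def demo() -> dict:
    """Run against a scratch directory standing in for /sys/fs/cgroup."""
    with tempfile.TemporaryDirectory() as scratch:
        group = create_cgroup(Path(scratch), "demo", memory_max="268435456")
        attach_pid(group, 1234)  # 1234 is a placeholder PID
        return {p.name: p.read_text().strip() for p in group.iterdir()}
```

Against the real hierarchy you would pass `Path("/sys/fs/cgroup")` as `root` and run as root. The hard part is not these few writes, it's doing them consistently for every service on the box.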
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Wrong RAID (Redundant Array of Inexpensive Datacenters) level, perhaps?
Seems that some big cities are dark and it took some datacenters too... https://www.inverse.com/article/30631-lax-sf-ny-power-outages
It's all the Ubuntu users switching over to Fedora after the latest Canonical debacle.
At about 9:30 PM California time, the Fedora Infrastructure web page came up with every server showing "Everything seems to be working."
About as strange as the news item itself is the absence from Slashdot of postings from a system manager who can make a plausible guess as to what specifically would have set off a "massive outage".
About 18 years ago, I failed the Red Hat Certified Network Engineer one-week class and exam. I felt the exam was heavily loaded with "stupid dorm tricks", which are stupid things you can do to a Red Hat or Linux system to make it do strange things at boot time.
When this Red Hat event is finally reported, I will be looking to see what is the state of the art in tricks to screw up an enterprise cluster.
Looking forward to that reveal. Was it a hack/OS abuse or infra? 14 hours out now? What is going on? Press is quiet too...
Time for a new political party in the US (or two!). One is off the rails; the other can't pony up a leader.
From status.redhat.com:
Update - Network issues are stabilized and we are working bring applications back online. Apr 21, 23:55 EDT
Update - Connectivity between the Internet and the main production data center has been restored, however the network is not fully stable yet. Network team is fully engaged with the vendor and working hard to restore services. Apr 21, 19:25 EDT
Identified - We have identified the issue and are working to resolve the problem. Apr 21, 13:52 EDT
Investigating - Widespread system issues are causing parts of the Customer Portal to become non-responsive. We are working to resolve and will update this incident with relevant details. Apr 21, 11:13 EDT
Both cgroups and containers can be created and manipulated from the command line. Nobody bothered to do this as a PoC before going off on a tear and creating systemd.
There were command-line demos of cgroup/namespace (video of devs launching "make -j 255" while the desktop remains responsive).
What nobody bothered to do is rewrite the massive amount of bash script to support it.
(Especially since nearly every distro seems to write its own script madness to handle starting/stopping jobs *).
You would either need every single distro to rewrite a massive amount of in-house developed shell code in order to leverage such newer kernel functionality.
Or you need a few standard components that leverage the facility for you and simplify using it.
For obvious resource-saving reasons, most people went for the 2nd option.
Systemd was one such facility, and is the one that ended up the most popular. (With Red Hat, who developed it, but also with other distributions who picked it up very early, like SUSE.)
And as usual, Canonical went for their own "NIH-syndrome" solution (upstart), before eventually joining everyone else.
---
* : which is another advantage of systemd.
Most distributions wrote their own rc?.d scripts, usually with tons of boiler-plate code (for starting, checking status, etc.)
systemd relies on much smaller, simpler static configuration files - easier to edit, also easier to share.
They're easier to maintain than the content of rc?.d.
On the other hand, unlike SysVinit, where much of the processing is done by calling bash (i.e.: outside of PID 1), systemd moves a little bit more functionality in there (into PID 1) in order to be able to interpret the conf files.
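To show how small and static those conf files are, here is a rough sketch that reads a unit file with nothing but Python's standard `configparser`. The unit itself is hypothetical (there is no real `exampled` daemon), though the directive names are ordinary systemd ones. This is only an approximation of the format and not how systemd parses units internally: for instance, a directive repeated within a section is accumulated by systemd, while this parser just keeps the last value.

```python
import configparser

# A hypothetical unit file; "exampled" and its path are invented for illustration.
UNIT_TEXT = """\
[Unit]
Description=Example daemon (hypothetical)
After=network.target

[Service]
ExecStart=/usr/local/bin/exampled --foreground
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def parse_unit(text: str) -> dict:
    """Parse INI-style unit text into {section: {key: value}}."""
    parser = configparser.ConfigParser(
        interpolation=None,  # unit files use % specifiers; don't expand them
        strict=False,        # tolerate repeated keys (last one wins here)
        delimiters=("=",),   # unit files only use '=' between key and value
    )
    parser.optionxform = str  # keep key case (ExecStart, not execstart)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


unit = parse_unit(UNIT_TEXT)
print(unit["Service"]["ExecStart"])
```

Compare that with an rc?.d script, where the same information is buried inside start/stop/status shell functions that each distro wrote differently.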
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]