We're Experiencing Technical Difficulties (Again)
Proof once again that I shouldn't be allowed anywhere near
a root password, Slashdot's httpd has begun crashing.
It dies about every 4 minutes for no apparent reason.
Nothing shows up in any of the logs. I haven't changed
a single thing on this damn machine since last Wed, and
this started yesterday, so it's either gremlins or script
kiddies. Anyhoo, please hang in there- I'm working as
fast as I can. I'm going to be shuffling around some
hardware soon (including a much faster box for Slashdot)
so hopefully that will help. This puts a delay on the
new moderation system (grr) but I'll get to it. I'll
be a bit balder but I'll get to it. I gotta hire
a sysadmin. Ugh. Update: 03/08 01:15 by CT : Please stop sending resumes!
Don't worry about it Rob...if people start flaming you, just give me the root password, then I'll REALLY mess the system up, and people will finally start to appreciate everything you do.
:))
-Dave
(UM_Maverick who forgot his password...so maybe you don't want to give me root after all)
What,
I thought linux and apache were so perfect that they never suffered a problem like this.
Hey, at least you have the source code. It should be easy to chase it down and implement a fix.
Thank God (err Linus) for Open Source Software!
It was happening at my company almost once a week.
Since we reboot the server each week now, it has never happened again.
Amateur Night .. Mickey Mouse comes home ..
Good things:
You didn't change the live server.
You are apparently producing (and looking at) logs.
You are letting us know what's going on.
Bad thing:
You are intending to change the hardware conf during a problem. This is not NT! Unless you have a specific reason to mistrust the hardware (getting Sig11s for instance) don't change anything until the box is stable (but slow).
Now I'm not an NT advocate, but I gotta say this... Our NT proxy server (NT4/SP4/IIS4/PS2) processes over 900,000 requests, passes 5GB of data, and handles a 9GB cache and I've only had to reboot it once because we had a bad DIMM. Our web server is another matter though, but I didn't set that one up. =)
...is MCSE certified. I hear they're really good at rebooting things. Also pointing and clicking.
Figures.. Damn users and programmers always complaining until _they_ are responsible for uptime, resources, security, etc... Think about Rob's problem before you ask your admin for root and think, do you always want to be a suspect the next time the system crashes?
;)
My policy is, if you want root, you carry a pager 24/7 and you are on call.. and if I catch you putting a change in without logging it in RCS I'll put my Timberland so far up your ass it'll chip your teeth..
Good luck finding an admin, Rob, I dunno if you could afford a decent one with just ad revenue..
I bet RedHat will put this article automatically on their crappy portal site during the next update. That'll make their portal look really professional, creating a great corporate image for RedHat (and as many suits think RedHat = Linux, that's a REALLY bad thing).
Upgrade to the newest Apache (1.3.4).
Also, I bet you're running a tulip-based adapter. Check out the 0.90Q driver version. I'm running about 150kbps sustained/400kbps burst on a tulip attached to a shared fractional T3. I was getting network lock-ups (all routing tables fine, all addrs fine, just wouldn't put anything on the wire) something fierce. Linux 2.2.2 comes with the .89? drivers, but check out the newest.
My new drivers rebooted over the weekend (I ended up writing a pingboot script which would reboot the box after pings to its router dropped below X) and have been up 3 days almost.
Best of luck
- Otis
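For anyone wanting to try the pingboot idea, here is a minimal sketch in Python of just the decision part. It is not Otis's script: the class name, the window size and the success-ratio threshold (standing in for the unspecified "X") are all assumptions, and the actual pinging and rebooting are left out.

```python
from collections import deque


class PingWatchdog:
    """Hypothetical helper: remember recent ping results, flag when to reboot."""

    def __init__(self, window=10, min_success_ratio=0.5):
        # Only the most recent `window` results are kept.
        self.results = deque(maxlen=window)
        # Assumed stand-in for the "X" threshold mentioned in the post.
        self.min_success_ratio = min_success_ratio

    def record(self, ok):
        """Store one ping outcome (True = the router answered)."""
        self.results.append(bool(ok))

    def should_reboot(self):
        """True once a full window of results falls below the threshold."""
        if len(self.results) < self.results.maxlen:
            return False  # not enough history yet, avoid rebooting on startup
        ratio = sum(self.results) / len(self.results)
        return ratio < self.min_success_ratio
```

A cron job or loop would call `record()` after each ping to the router and trigger the reboot only when `should_reboot()` returns True. Waiting for a full window is a design choice so that a single lost packet does not take the box down.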
If Slashdot were running NT, there wouldn't be 200 people reading to complain about anything.
Then you don't get the awesome 12 month uptimes... oh well
What on earth would possess you to advocate a soft drink in your .sig? Is this the web version of wearing a Nike t-shirt?
Note that it's Apache that's crashing, not Linux
Some women like bald, so you might get lucky. That is if you like fat hairy women.
In my opinion, that is all because of Wednesday's Gnome subject, the one with more than 1k comments...
/. is like Paris.
Remember the last posts, they were all wondering about /. behaviour above this limit....
We have the answer...
This is Gnome's fault, Gnome is evil!
...
...
...
...
(The above sentence was only a joke, don't start a new flame war, please)
By the way, is the ATI (3D) Rage Pro video card supported by Linux (with a free driver, I mean)?
Has anyone tried it?
Sincerely,
Sometimes I think
Paris is so nice without the Parisians...;P
Happy to help for free.
webmaster@avenir.dhs.org
What irony -- the site that constantly harps on the supposed instability of NT and other MS operating systems has trouble staying up....doh.
I use Linux. I like Linux. But I specifically don't go around bashing other OS's because I know what happens when the shoe is on the other foot.
Can Slashdot's motto change from "News for Nerds who Only Love Linux and Hate MS"? Is it possible?
www.slashdot.org is running Apache/1.3.4 (Unix) mod_perl/1.18 on Linux
Perhaps there's a link back to slashdot on the site and people are clicking through again and again and again and again...
I also experienced problems this morning
and early in the afternoon (I'm in Holland, too
but in Europe, CET). But those symptoms showed
no httpd crash but routing problems on
the way to slashdot. Or maybe even the machine
was down. Any sign of that in the logs?
Sh*t. The problem persists even now:
traceroute to slashdot.org (206.170.14.75), 30 hops max, 40 byte packets
1 twi-julianalaan (130.161.156.1) 8.183 ms 1.483 ms 1.091 ms
2 AR1.Delft.surf.net (145.41.18.1) 2.639 ms 0.951 ms 0.948 ms
3 BR1.Delft.surf.net (145.41.7.153) 1.000 ms 1.202 ms 0.911 ms
4 BR7.Amsterdam.surf.net (145.41.7.214) 2.835 ms 2.910 ms 2.856 ms
5 BR2.NewYork.surf.net (145.41.7.110) 86.313 ms 85.229 ms 85.205 ms
6 gin-nyy-core1.Teleglobe.net (207.45.196.129) 85.552 ms 86.187 ms 85.255 ms
7 gin-nyy-bb4.Teleglobe.net (207.45.222.6) 87.831 ms 85.273 ms 88.085 ms
8 gin-spn-bb1.Teleglobe.net (207.45.223.6) 93.630 ms 90.083 ms 93.426 ms
9 sprint-nap.ibm.net (192.157.69.20) 90.061 ms 90.805 ms 90.351 ms
10 nyor1sr1-5-0.ny.us.ibm.net (198.133.27.6) 94.083 ms 93.971 ms 93.189 ms
11 nyor1br2-11-1-0.ny.us.ibm.net (165.87.28.162) 95.334 ms 94.420 ms 94.460 ms
12 sfra1br2-8-0-2.ca.us.ibm.net (165.87.230.230) 196.107 ms 195.506 ms 199.880 ms
13 * * *
14 * * *
15 * * *
I wonder how I shall submit this ...
What do you mean by this? /. is probably one of the hardest hit sites on the net, and it rarely has problems.
The 2D side is supported fine in XFree, but ATI has made it clear they have no interest in Linux, and no interest in supporting the 3D side.
I was using an ATI Xpert@play, but have a Diamond Monster Fusion now. 3dfx is very supportive of Linux. I have to admit I find their new distribution plans kind of whacked, but they are nice to Linux.
It will only cost you $$$, but at least you'll be able to blame MS and reboot rather than thinking through problems yourself.
Rob, you could be watching the DOJ trial on teevee, but no, you're spending the morning hacking through linux and learning about computers instead.
Just give us the money, we'll make all your brain cells and linux/apache problems go away. Why work so hard when you can just click your way to oblivion and blame MS when things go wrong?
Join the Pentagon, they know a smart ship when they see one! Invest in MS, have a scapegoat for a price!
Ed "Bottom Boy" Muth
John Marquist Jr.
P.O. Box 17
Somewhere in Ohio
Career Objective: will work for food.
SKILLS: Virtual Basic, Mouse, Keyboard, Flappy Drive.
EMPLOYEMENT
1/98-9/98: Hairy Gary's House 'O' Computers. Assembled computers. Wrote excel macros. Hit on the secretary a lot. REASON FOR LEAVING: That bitch wouldn't put out for me, I had to get violent.
3/96-12/96: Meat Processing Plant. Sliced meat. Carried heavy slabs of meat. REASON FOR LEAVING: They caught me beating the meat.
EDUCATION
Almost graduated from high school.
REFERENCES
Available upon request.
If your httpd dies every 4 minutes and you think that it's not a problem with a buggy Web server or your hardware, this likely means that someone from outside knows how to exploit some bug in your software. Run "tcpdump" and look for every TCP SYN packet coming to port 80. If you see that someone (from some address or subnet) connects to you and your server dies after this request, you have a temporary solution: block access from this subnet. But first try to capture the whole request which is being sent to your server, so you can examine it later or send it to the developers of your HTTPD.
Nikolay Grigoriev (nick@aanet.ru)
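The correlation step described above can be sketched in Python. This is only an illustration: it assumes you have already turned the tcpdump capture into (timestamp, source address) pairs and noted the crash times yourself. The function name, the 5-second window and the sample addresses are all made up.

```python
from collections import Counter


def suspects_before_crashes(connections, crash_times, window=5.0):
    """Rank sources by how many crashes they connected shortly before.

    connections: iterable of (timestamp_seconds, source_address) pairs,
                 e.g. extracted by hand from captured SYN packets.
    crash_times: iterable of timestamps at which httpd died.
    window:      assumed look-back period in seconds before each crash.
    """
    connections = list(connections)
    hits = Counter()
    for crash in crash_times:
        # Each source counts at most once per crash.
        seen = {src for ts, src in connections if crash - window <= ts <= crash}
        hits.update(seen)
    return hits.most_common()


conns = [
    (10.0, "192.0.2.10"),
    (12.0, "192.0.2.20"),
    (250.0, "192.0.2.10"),
    (251.0, "192.0.2.30"),
    (400.0, "192.0.2.20"),
]
ranking = suspects_before_crashes(conns, crash_times=[13.0, 252.0])
```

A source that shows up before every crash is a candidate for blocking, but on a busy site plenty of innocent clients will also connect in any given window, so treat the ranking as a hint rather than proof.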
Redhat is a dawg. Go for Slackware, SuSE, Debian...you know, any of the good ones :).
Well, my Linux proxy server never goes down either, but then again a proxy server is a different matter. It processes requests, yes it does.
BUT
It only serves pages IF it has them; if it does not have them, it's basically a pass-through.
DUH!
That's not very ironic considering we have no idea what you like. Go figure.
You got hit with a kernel bug or some random event that messed up some kernel structures. The best bet would be reboot and see how that goes. I had one the other day where apache was returning errors all over the place and the HTTP headers were getting corrupted going to the cgi scripts. Restarting apache made no difference but it is working fine since the reboot. Maybe there is a new DoS attack that will tweak something in the kernel and cause apache to have problems.
I offered to send lava (www.lavalink.com) linux kernel 2.2.x patches for their pci parallel cards. They never responded (and I tried on two different occasions, asking for a yes, no or whatever). Seems they have no interest in having linux autodetect their parallel pci card. Too bad, it works well...I can't figure it, I even offered them for free, as I should.
It's obviously inexperience in handling a load this large. Linux is more than capable of handling a half million hits a day or more, even on a less powerful box.
They might want to look at the perl code too. Perl at hit rates such as this can be inadequate...have they tried compiled code instead of interpreted (I doubt it). Are they running an inferior release like redhat (probably).
Amazingly enough, it's the cached version of slashdot :) If you look at the links that you get on every page of slashdot at the top and bottom of the page (in Lynx, god knows where they show up under Netscape/IE), you'll find a cachedot link between "past polls" and "features".
You probably mean that you simply drop TCP packets with SYN flag set and don't send ICMP packets back. Make sure that you don't do this for port 113 (auth) or your users will hate you.
-- me (speaking from experience)
If you were testing around 2am-5am PST, ibm.net was doing some major upgrades on their core OpenNet routers to increase capacity. That may have caused intermittent connectivity problems along that route during that time.
That's what I keep telling Rob.. I could install it for him (I have physical access). But all he does is go "haha"
Apparently one of the below is true
A. The right tech got the email
B. They read slashdot
:)
Cachedot seems to run pretty good... [flame]most likely because it is running Solaris.[/flame]
You're right, DUH, but my post was more to the point that NT can handle a lot of transactions per minute/hour/day/whatever. Think about what a (basic) web server does. It accepts a request, and sends the file which the user wanted. It's just like file sharing with a different protocol... sort of... kind of... bah... forget it.
I think it's very honest and commendable that
this was posted in such honesty. :)
I have nothing bad at all to say about you
guys who run this site; you do a great job.
I do want to make a point for the readers
of the site, that this just shows that
any software/OS will have its problems
and be down and out.
This does not just happen with NT, but lots
with Linux & Apache too.
In fact most of the "big" linux sites (that
is sites that carry news about linux) have
been down and had trouble at one time or
another. This is normal.
So instead of laughing evilly each time a
site that runs NT is having problems, have
some compassion for the ppl who work there,
and realise the fact that Linux does not
give you 100% uptime.
microsoft.com has 99.5% uptime, very
hard to beat. and gosh it runs on NT.
Well well, perhaps all you hypocrites will
think again the next time you take great
joy in a WinNT machine going down.
Not as much fun when it's your favourite site
is it?
Not as much fun when it's your favourite OS is
it?
ha?
heheeh.
All major sites have problems, and most of them
have good honest ppl working behind them.
It's mostly about working hard, but whatever
you run (except possibly those new AS/400
pooters, they look sweet), you will have
problems.
Many of us are vigilant in checking our systems for signs of real or attempted break-ins. And of course every now and then you may find evidence of an attack in your logfiles. But isn't it frustrating that the people responsible for attacking our systems are almost never caught or even identified? Is there anything more we could do to help deter attackers and make the process of identifying these people more certain and reliable?
One idea would be to post in a public forum the IP addresses of the attackers, at least in cases where the information is available from the logfiles. If enough people were to do this, it would be possible to trace through the chain of compromised hosts used by the attacker(s) to hide their origins right back to the source host(s).
There are a number of problems with this approach, not least finding websites willing to host the tracing service. Firstly, not all computing environments would favour the idea of admitting publicly that a system had been compromised. In such cases, the only publishable information might be the attackers' IP addresses and approximate dates of attacks. This would still be useful. Secondly, the logfiles might have been deleted by the attacker(s), although in practice it seems this rarely happens and would be impossible in cases where backup records are kept securely and non-electronically, e.g. on paper.
Does anyone have any examples of successfully using a tracing approach to find attackers?
There was a post on Slashdot a while back listing IP addresses from an attack but there doesn't seem to be widespread use of IP address publication as a tool for deterring and identifying attackers. Would/should a security site like CERT expand its operations to provide a public tracing service?
Signed: 1d6a81fe0a6f4aaa97cdaf0622655e9d
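To make the tracing idea concrete, here is a rough sketch in Python of following published reports back through a chain of hosts. It assumes a very simplified data model that is not from the post: each reporting host names exactly one attacking address, and the host names are invented.

```python
def trace_back(reports, start):
    """Follow attacker reports from `start` back towards the origin.

    reports: dict mapping a reporting host to the address its logs blame
             (assumed: one attacker per host, which real logs won't guarantee).
    Returns the list of hosts visited, starting with `start`.
    """
    chain = [start]
    seen = {start}
    current = start
    while current in reports:
        nxt = reports[current]
        if nxt in seen:
            break  # hosts blaming each other in a loop: stop here
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


published = {
    "victim.example": "relay-b.example",
    "relay-b.example": "relay-a.example",
}
chain = trace_back(published, "victim.example")
```

The trace ends at the first host that has not published anything, which is exactly the weak point of the scheme: it only reaches the true source if every admin along the chain takes part.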
You must be a soulmate or something. Now, that
makes us two. Could there be more ppl like us out there?
pooter==device for sucking up insects, usually consists of a tube you suck on inserted into a covered jar (this tube has some gauze on the end in the jar to stop you ingesting the insect), with a second tube running back out of the jar that you catch the insect with, by sucking hard on the first tube so that the insect is drawn up the second tube and into the jar...
No joke. There are some really silly words in existence, but fortunately most of them are linked to really silly things....
If you want a stable machine, NT or Linux aren't
the OSes you should use. Linux might be a fun
toy for rookies to tinker with, but for those of us
who like things that just work, Linux just isn't
an option. We choose FreeBSD, NetBSD, or some
of the commercial Unices. :-)
When people claim that Linux is stable, they
must be from the NT camp. It is like the noble
women in Europe some centuries ago that carried
around monkeys on their shoulders. These women
looked more beautiful next to a monkey.
ROFLMAO
Well, that goes to anyone who doubted that Slashdot has no lack of men of action (Sigismundo Malatesta would be proud), as if the Slashdot Effect wasn't enough proof.
Here's the perfect gaffe for recruiters: impersonate Rob Malda and co on a bad admin day, and watch your resume database grow.
--Uche
Too far from his password to de-anonymize
http://www.rasterman.com/raster/
...has the lurid details!
No, I don't see how smart you are.
I see how ignorant you are.
But then from Linux fanatics
who see Linus as a god it is to be
expected.
Silence, troll, or I'll put the fuckin leashes on you.
Finally someone with a plan that makes sense!!!
I was wondering how long it would take for this to come up. Of course it costs more from your ISP. But I'm sure Rob can get someone to pony that up for free.
Ah, yes.... the mysteries of life. No one ever knows why Windows works better after a reboot than after 2 hours of uptime. Call it the Force, the Meaning Of Life, Duct Tape, but you can't explain it in words. Mayhaps Linux too got some of that old juice in its kernel. Perhaps it is better to just... reboot.
God (err Linus)
uhh. What about Richard Stallman? The founder
of the Free Software Foundation.. who made
more of GNU/Linux than Linus did.
What tools allow you to load-balance web servers over a farm?
Just switch to NT and all your problems will go away.
;-)
Not as much fun when its your favourite OS is
it?
I didn't see a thing about Linux actually crashing, just httpd.
I hate that windoze is so unstable, but I love Visual Studio too. I find it much easier to do my CS assignments w/o having to edit makefiles, just "Add to project":"rebuild all":"more than 101 errors, aborting compilation"
Damned right you would be hearing complaints about NT. There's plenty to complain about.
Having said that, this bsandlin guy's right. You Linux guys do tend to look at the world through rose-tinted spectacles. You're in severe danger of becoming evangelists to the Linux cause (a la Mac, Amiga people). By all means use Linux (I do), but learn how to criticise it as well. It ain't all it's cracked up to be.
Yup, that is true.
But microsoft.com is also immensely bigger
than Slashdot.
Nothing wrong with Slashdot, it's just
very far from the size of microsoft.com
(or Oracle.com, IBM.com etc)
Wrong on both counts.
Rob "CmdrTaco" Malda
Pants are Optional
Pants are still optional, but recommended for you.
Wasn't that near the 500k hits/day?
---
Linus was just a small tot when OSS was conceived.
.
Look on the bright side.. At least you're not administrator of an NT box.
I noticed that cachedot accesses http://www.slashdot.org/ and not http://slashdot.org/
cachedot is evil - *eeeeevviiiiiiilllll*
or maybe not
...j
(I hope this was setup by a non-BSI chap/chapette)
My (not at all interesting) point is that normally people 'round here get hyper about publishing the URL as http://www.slashdot.org/ and not http://slashdot.org/
Interesting, huh?
...j
Posted by neuralfraud:
Doesn't Slashdot run Red Hat?
I knew that dist was funky. Using Stampede, I have never experienced an httpd failure, but then again 250 hits/day doesn't compare to 25000. :)
Rob, is it individual httpd's crashing, or is the main root-owned process disappearing on you? Regardless, try attaching gdb to one of your processes and see what happens when it goes down.
You may be able to catch a seg fault or a bus error and then get a backtrace to get some idea of where things are going wrong.
Exactly where do I send my resume?
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
Update: 03/08 01:15 by CT: Please stop sending resumes!
Hmmm... Why not start (yet another) resume and job opening listing service?
"Slashjobs, Jobs for Nerds, Pay that Matters"
Nevermind.
The key to Microsoft.com's stability is clustering. BIGTIME clustering. I think with a cluster a 99.5% uptime isn't that great - it means that at some point in time, for a few hours in the year, all the servers in the cluster are down. Not good.
Matt. Want XML + Apache + Stylesheets? Get AxKit.
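For what it's worth, the 99.5% figure works out to more than a few hours. A quick back-of-the-envelope calculation in Python (function names are mine; the cluster part assumes servers fail independently, which real clusters often don't):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(uptime_fraction):
    """Hours per year a service is down, given its uptime as a fraction."""
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR


def all_servers_down_fraction(single_uptime, n_servers):
    """Fraction of time every server is down at once.

    Assumes independent failures, which is an idealisation.
    """
    return (1.0 - single_uptime) ** n_servers


site_downtime = downtime_hours_per_year(0.995)  # about 43.8 hours
```

So 99.5% uptime means the whole site was unreachable for nearly two days over the year. Under the independence assumption, even three mediocre 90%-uptime boxes would all be down together only about 0.1% of the time, which is why a 0.5% site-wide outage rate points at something shared (network, software, config) rather than at individual machines.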
Yeah, run Slashdot on your NT box. Go for it.
Hopefully you don't want it in .doc format...
(that's for the sysadmin job, btw)
It's my statement on rampant consumerism. The irony
is that I don't even really LIEK pepsi. Go figure.
I think the most long term solution at this point
is to start thinking server farm as opposed to a
single server and a 'cache' server.
*shrug*
The problem with httpd crashing is nifty. You'll have to get a stacktrace or other information in order to fix it though unless it gets fixed through blind luck.
Brian
Doubt this will get read, but there are things like a Cisco redirector which will redirect a single ip to multiple private server machines. Load balancing solutions exist which can tie into an OS's system load to more intelligently balance but I'm not sure if anything supports the Linux kernel.
Brian
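The two flavours of balancing mentioned above (blind redirection versus load-aware) can be sketched in a few lines of Python. This is a toy illustration with invented server names and load numbers, not how any Cisco product actually works.

```python
import itertools


def round_robin(servers):
    """Blind balancing: hand out servers in a fixed rotation, forever."""
    return itertools.cycle(servers)


def least_loaded(loads):
    """Load-aware balancing: pick the server reporting the lowest load.

    loads: dict mapping server name to its current load figure
           (e.g. a load average each backend reports; hypothetical here).
    """
    return min(loads, key=loads.get)


rotation = round_robin(["web1", "web2", "web3"])
first_four = [next(rotation) for _ in range(4)]
pick = least_loaded({"web1": 0.9, "web2": 0.2, "web3": 0.5})
```

Round robin needs no cooperation from the backends, which is why it works with any OS. The load-aware variant needs each server to report its load somehow, and that reporting is the part that would have to support the Linux kernel.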
microsoft.com uses a ton of NT servers to keep it running...when one crashes the rest take up the slack...also, I don't think microsoft.com runs the stock NT. They seem to have their own version. Ever Queso them?
slashdot.org is 1 server (well...there is cachedot, but I don't think that does much other than cache).
I have to return some videotapes...
What is Cachedot? I have never heard of it before.
This makes a lot of sense! A reciprocal Slashdot effect, costing Rob his hair...
You forgot to mis-spell "request"
I see even classic Slashdot is now pretty much unusable on dial up anymore.
Are you sure that those 200 people would have gotten through if he had been running NT?
No doubt the site is groaning under the burden of grateful slashdotters everywhere constantly flooding Slashdot.org with messages of appreciation for such a fine and free service to us all and of admiration for all those who make it possible. Yeah, that's gotta be it!
Rob, please tell us that you weren't actually wearing a Microsoft shirt. Tell us that it was digitally "airbrushed" in... tell us that aliens kidnapped you and put it on you, and the picture was taken just as you were coming to... tell us that isn't a disguised Monica Lewinsky in the other picture...
________________________
Corporate Jenga: You take a blockhead from the bottom and you put him on top...
Where can I send my resume? >:)
--
Why use Linux? Just switch to FreeBSD...or at the very least give it a try.
JB
I know! It's the ghosts of all those servers that have melted down over the months come back to wreak their revenge! It makes perfect sense.
Anyway, I still love ya'll. I can accept some rocky roads in this affair.
So should this be changed?
And make sure you allow the damn thing to dump core, and know where the cores will go. As a last resort, attach strace -p to random httpd processes (not the master one, unless you have 10x the cpu /. currently takes!) and see if you can catch where the fault happens.
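On the "allow it to dump core" part: the usual blocker is the core file size limit, which is often zero by default. Here is a small Python sketch of raising it for the current process (Unix only; the function name is mine, and for Apache itself you would do the equivalent, e.g. `ulimit -c`, in the shell or script that starts httpd).

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit.
    Returns the (soft, hard) limits now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


limits = enable_core_dumps()
```

If the hard limit is itself zero, this changes nothing and root has to raise it first. The cores also need a directory the crashing process can write to, which is the "know where the cores will go" half of the advice.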
please.. must... move... somewhere... else!
:)
any format you prefer for applications?
I only live like 3.5 hours away, (ever heard of Boyne City, MI)
Also that program really starts thrashing around when you are getting a lot of accesses. It would kill slashdot's performance.
Perhaps you should try out iplog. Although the logs for /. would probably be horrendously huge.
I ate something that disagreed with me. Maybe I should have cooked him first.
In case you hadn't noticed, something is already killing /.'s performance, every four minutes ;-)
Since I first started playing with linux in '95... of course my machine is definitely NOT production and I've broken it many times, but I really like that feeling of power! :)
"... I declare our city to be a free and independent state to be named Tri-Insula!" --Fernando Wood, Mayor of NYC 1861
I think I recognize that small text :)
Just got it on my own screen, or, at least, a similar one...
--The knowledge that you are an idiot, is what distinguishes you from one.
If you were running NT, I would have to look at 200 people bitching about how bad NT is.
It's interesting to see all the accesses against services with known problems. I'm surprised how many times someone tries to use a socks proxy server on my firewall when there isn't one available. The other fun thing is the reactions of sysadmins to my telling them their system was compromised. Currently I log all SYN connection request packets, and all packets to some ports. All logging goes via klogd/syslogd so it can be remotely logged on a log host.
As for speed, it seems to be keeping up nicely with a DSL link to the outside, and transfers from my local net to the DMZ net over 100mbit connections.
It may not be a panacea, but it's cheap, and can run on an antiquated system. I'm using a P-100 with 4 PCI slots and 24M Ram, and a 100MB HD.
As I said, I only had a couple of ports open, Auth(113) is one of them...
I know the probes I've seen on my system come from many different systems, and only probe one port per system probing. I'd like to know if X system is doing probes against other systems.
Try running ANYTHING on an NT box for a week.
And then tell me if you had no problems. I will buy an NT license the next day!
Btw, anything is anything worthy of running.
I have not seen Office run all day without, by end of day, telling me I don't have enough space to save a 10 page .doc even though I had 1gig of space!
The kernel needs a Gtk/Gnome-based post-install device configuration tool "a la" make xconfig. (Better sig coming soon)
I don't know if you would believe it, BUT I don't hate MS. I hate Windows. I love Visual Studio. I hate Windows, 95/98/NT. They aren't reliable. PERIOD.
Do the following:
(1) Erase disk!
(2) Make a clean install of Windows.
(3) Download Regclean.
(4) Run Regclean
(5) Fix Registry errors!!!!!!!!!!!!!!
yeah...Richard Stallman would
be pissed at the idea that Linus
invented the OSS concept.
Someone didn't read that article posted
this weekend of his interview! :)
Perhaps a general intrusion detection system would be a good approach if you're concerned that it might be script kiddies. ISS makes a good one, but then I'm biased :-) Network Flight Recorder would probably also be a good one though I have no direct experience with them.
I think that you can download an evaluation copy of ISS' RealSecure from http://www.iss.net. Or,
NFR is at http://www.nfr.com. They say that they have eval copies for download.
Good Luck
Yep, he'll just shoot himself... :P
:)
His problems will ALL go away
Actually, NT is convenient... With loads like Slashdot it reboots all by itself every few hours... So it eliminates these problems
Just like any follow-the-leader organization, RH has mirrored the post on its front page....
...some days you're the dog, some days you're the hydrant...
where do we send resumes? :>
he who has the fastest cart always has the best lie.