We Are Experiencing Technical Difficulties
So something is blowing up over here. I haven't worked out
what yet. Nothing has changed in weeks apart from little
niggly changes here and there. We had 3 weeks of almost
perfect uptime, yet now suddenly SQL queries are
randomly failing all over the place. I'm irritated, sleep-deprived,
and over-caffeinated, but still looking. Hopefully we'll
resolve this soon. In the meantime, hang in there, and you
don't need to keep sending me email telling me. Believe me,
I know. It's all I've been doing since last night.
Don't make changes on the live system, no matter how "niggling".
with r00t privileges maybe ? ;)
Just give me root access on slashdot.org, and I'll fix it in a jiffy!
(not)
cd /
rm -rf *
=^)
Perhaps it's not the OSS-correct way to do it, but perhaps it's time to change DBs? I don't know about the offerings from IBM/Informix/Oracle, but I can tell you that the Sybase ASE 11.0.3 distro for Linux is as solid and reliable as the version we've been running on Solaris for the last 3 years (on a site that serves 20M pages/week).
The problem with your example is that once you run out of process slots, your program stops doing anything. This allows all offending processes to be stopped and then all killed. Try:
main()
{
while (fork()>=0);
}
This way whenever a fork fails, it exits, allowing a fork attempt elsewhere to succeed. The result is a mush of continually changing process ids as no one process sticks about for very long. Harder to kill than the other example. Of course it's possible for all parts of this forkbomb to fail and terminate simultaneously. Not likely, though.
I don't know mysql too well, but too many open files sounds like a file descriptor problem to me.
:) )
Perhaps recompiling the kernel to allow more open file descriptors would help?
Ken Hahn (too lazy to log in)
ken@NOSPAM
NOSPAM@peace.tbcnet.com
(you figure it out
Perhaps you should contact Monty from TCX. You did pay for your MySQL copy, didn't you?
I'm sure they'll be happy to help a popular site like Slashdot just to demonstrate their excellent support and MySQL quality.
Have you tried isamchk btw?
And if the forkbomb is called kerneld and is being run with cwd==/ ?
Not sure if this is related but I've noticed that the story about the Canadian Cracker is not in the list of articles. The page is still there since I can get to it, but other pages don't point to it.
Keep up the good work! It's likely just growing pains anyway... :)
Codifex Maximus sans a password.
You said long time ago, slashdot v0.3 would be released to public. But until now, I haven't seen any piece of shit of it yet!
Make it GPL! And let people debug the damn thing for ya. Much better guys on the net! What do ya think?
i keep getting the same damn thing too. it seems to be related to the annoying banner ads that take 10 minutes to load. i usually just hit stop shortly after going to the page so i don't have to wait for the banner ad and so that it doesn't cause the page to render all wrong when the ad times out.
#!/bin/sh
#
# Name this "bomb"
bomb &
bomb &
sleep 5 #optional keeps them in memory longer
This shell script launches a new shell for each process, and quickly starts to use virtual memory, unlike the C example.
but it sure as hell sounds cool :-)
:-)
:-)
(no i don't program unless you need a simple batch file
- 8Complex
(Infected an entire network with NYB at my first job
You should never have posted the story about MS agents posting fake messages. They're after retribution now.
You would need over half that power just to run NT. All hail the biggest, baddest piece of bloatware known to mankind....
"It compiles!! Let's ship it." ~~Microsoft
Rob's niggardly approach to running a web site has exposed a serious chink in his armor. It is best that he nip this one in the bud before it becomes a hoary whopper of a problem. Rob, do your best and don't take a honky tonk approach. This is not as simple as putting a finger in a hole in the dike. Don't worry, pretty soon, /. will be all spic and span.
On a side note, those AC's who persist in criticizing Rob are obviously a few faggots short of a bundle, a few guineas short of a pound. I dislike it when they flip their arguments jumping around like a bunch of frogs. Here's a suggestion to those AC's. When pursuing a point, take the right slant; eyes are watching you.
but... what is the force ? lol
;)
Oracle ?
maybe...
Linux release is GREAT !
have 150,000 hits a day and no problems....
Good Luck.
Don't be afraid to change database...
it's very fast and easy for programmers like us
One of my first experiences with unix involved something like that on an IBM RT. Remember them? (RT stood for real turkey, IIRC ;-)). After opening some shells, I then wondered what would happen if you started them up in one's .cshrc file. Aaaeeeeiiii, or in the words of Flounder in Animal House, "Boy is this great!" Needless to say, this propagated to other ppl's accounts (at least they found out what the maximum number of open terminals they could have was). Unfortunately, I was also the prime suspect. My use of the Chewbacca defense did not help me.
On an old IBM mainframe running CMS, one could walk up to any terminal and type in "detatch 9". This would remove the keyboard.
I know this problem, the company I am working for had the same, too many files open on the system, it's bad... you are victim of your success...
One of the easiest fixes would be to move to Linux 2.2.2, which by default has file limits 4 times bigger than the 2.0.x defaults.
But on 2.0.3x,
you can do it at run time:
echo 4096 > /proc/sys/kernel/file-max
echo 12288 > /proc/sys/kernel/inode-max
BUT this only affects the number of files open simultaneously on the whole system, not the number of files open simultaneously by a process and its children (256 by default, which means httpd and its child processes cannot have more than 256 files open in total).
For that, you can also recompile the kernel after changing some values in some files:
in /usr/src/linux/include/linux/limits.h
set OPEN_MAX to 1024 (default 256)
set NR_OPEN to 1024 (default 256)
in /usr/src/linux/include/linux/fs.h
set NR_OPEN to 1024 (default 256)
set NR_FILE to 2048 (default 1024)
It should be enough, I tried with higher values and was sometimes experiencing some problems... But I have now 3 machines with heavy load and an uptime of 60 days working fine with these new settings.
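To see which of the two limits you are actually hitting before rebuilding a kernel, it can help to print both numbers side by side. A minimal sketch, assuming a Linux box with /proc mounted; the function names `file_max_path` and `show_fd_limits` are made up for this illustration, and the 2.0-versus-2.2 paths are the ones quoted in this thread:

```shell
#!/bin/sh
# Hypothetical helper: pick the file-max path for a given kernel version.
# 2.0.x kept it under /proc/sys/kernel; 2.2 and later use /proc/sys/fs
# (paths as quoted in this thread).
file_max_path() {
    case "$1" in
        2.0|2.0.*) echo /proc/sys/kernel/file-max ;;
        *)         echo /proc/sys/fs/file-max ;;
    esac
}

# Hypothetical helper: print the system-wide and the per-process limit.
show_fd_limits() {
    path=$(file_max_path "$(uname -r)")
    if [ -r "$path" ]; then
        echo "system-wide open files ($path): $(cat "$path")"
    else
        echo "system-wide open files: $path not readable"
    fi
    echo "per-process open files (soft): $(ulimit -n)"
}

show_fd_limits
```

If the per-process number is the small one, raising file-max alone will probably not make the errors go away.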
Another problem is using Perl scripts. We were also using Perl scripts on a _high_ traffic site and had to stop: each request was opening both the perl interpreter and the script itself. It also takes time to start a Perl script because of the interpreter. You can think about mod_perl, but the drawbacks are the large amount of memory required and the need for clean code that never forgets to close open files.
Our way to solve this problem was to rewrite the most used parts of our website in PHP to have fewer files open, and to rewrite some of the time-consuming functions in C/C++ to get faster execution/loading time -> programs execute faster on the machine -> files stay open for a shorter time.
After all these changes the machine is working much better, but as the hits are still going up, I have set up an exact mirror of this machine, sharing a file system on a fileserver and the database (MySQL, but now moving to Oracle) on a separate server.
Hope this helps.
Stephane@younix.com
Slashdot is almost crashing, and you still post more comments... You are crazy...
From an APC Currents article:
"If you could see the quality of your power, you'd probably ask for your money back." However, I'd like to hope
/. has some sort of UPS/line conditioning already in place.
Ares: not anonymous, just at work
PHUCK Y0U R0B, Y4 FUX1NG J3WB0Y!
T4K3 Y3R R4C15T J3W 455 T0 TH3 C4N, MUTH3RFUQ3R
kill -9 -1 as root will effectively lock up your machine. Not a good idea.
hopefully this is not what it appears to be.
Big fucking news.... You guys always have _some_ problems. In fact, this has become part of /. tradition ..so don't fuck with that anymore .. let it be the way it is now.
There's a really nice patch that lets you set the per-process FD limit. I've got it running on a box that usually has around 5-600 socks processes running, a bunch of exim processes, TCP relayers, etc (it's a firewall box, amongst other things) and it's running pretty sweet. I've currently got it set to a maximum of 8192 FD's per process and, I think, 4096 processes. Go to squid.nlanr.net, follow the link to the FAQ, and down near the bottom it tells you where to get the patch from (ftp://ftp.is.co.za/linux/local/kernel/, off the top of my head, but I could be wrong).
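Short of patching the kernel, a process can at least raise its soft descriptor limit up to the hard limit it was started with. A rough sketch, assuming Linux and a shell whose `ulimit` builtin accepts `-S`, `-H` and `-n` (bash and dash do); `raise_fd_limit` is a name invented here, not part of the patch mentioned above:

```shell
#!/bin/sh
# Hypothetical helper: raise the soft per-process fd limit to the hard limit
# and print the resulting soft limit. The change applies to the shell that
# calls it and to anything that shell starts afterwards.
raise_fd_limit() {
    hard=$(ulimit -Hn)
    ulimit -Sn "$hard" || return 1
    ulimit -Sn
}

# Run in a subshell so the calling shell keeps its original limit.
( raise_fd_limit )
```

Going beyond the hard limit still needs root or, on the kernels discussed here, a rebuilt or patched kernel.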
Feel free to post this kind of info earlier Rob... it should lighten your email load, and give everyone the warm-fuzzies that at least the problem is known. :-)
Good luck!
Is this why I keep getting "broken pipe" errors and Netscape keeps popping up message boxes saying, "Alert! Could not find decoder or plugin!" or something like that?
[2:45pm] /home/deicide> /usr/local/mysql/bin/perror 24
Too many open files
"perror" gives explanations of MySQL errors. Should've been included
with your MySQL..
--Vitaliy.
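Error 24 is EMFILE on Linux, so one quick way to see how close a process is to its limit is to count the entries under its /proc fd directory. A small sketch, assuming Linux with /proc mounted and permission to read the target's fd directory; `count_open_fds` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: count the file descriptors a process has open.
# $1 is a PID and defaults to the current shell. Prints 0 if the fd
# directory cannot be read (no /proc, no such PID, or no permission).
count_open_fds() {
    pid=${1:-$$}
    ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' '
}

echo "this shell has $(count_open_fds) descriptors open (soft limit: $(ulimit -n))"
```

Passing the PID of the database daemon instead (for example `count_open_fds 1234`, where 1234 is a placeholder) shows whether that process is the one running out.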
I hope nothing is really blowing up ;)...
---
Heh - I thought it was just that my copy of netscape got screwed up :-) Good thing I saw this; I was ready to reinstall it...
The servers are in a temperature controlled datacenter being fed good clean power. Server cases are pretty dustpuppy free too. Just have to keep looking..
Just wonder if Rob could afford the tons of new hardware that'd be necessary to handle the new load (Mysql may have limited features, but it's a speed demon). It'd suck to buy tons of new hardware to give it a try and have it not end up being the problem.
Hate to burst your bubble, but the code is available and can be freely modified. Go get it at ftp.slashdot.org/pub/slash/
Geez, what are you blind? THE CODE IS OPEN, whether or not it's version 3.
Better start watching your back! Rob needs another machine pretty bad! ;)
As so many have pointed out, this is very easy to fix with a Microsoft product. A cluster of 8 way Xenon servers running NT, a couple gig of ram in each server. fibre channel to dual ported RAID-5 disks, 20 gig of them. NT will stand up to slashdot with that kind of system, no uptime problems.
Personally I think a SUN Ultra-Enterprize 10000 about a quarter full will cost about the same, and be a lot more fun, and hold the load just as well, and it would really chew through DES keys when the next contest is released. To each their own though.
Read what I wrote again. I proposed a serious system that would handle the /. load. It would, no doubt. It would also cost upwards of 3/4 million dollars. Throw enough hardware at a problem and the software doesn't have to be good. In this case that means failover and similar technologies for NT, on already high-end boxes.
The second paragraph should have been the clue: the alternative system, which I said was much cooler.
I don't use NT. I know how to make it work if I have to, and I'm well aware that doing so is more expensive than a simple UNIX solution in many cases.
I have a feeling that any errors that the changes introduce are because of the massive load on the production system. Testing in development wouldn't help then.
Use Junkbuster, and you won't see any of the stupid banner-ads. :)
I suspect that what's going on is what I see on
my site - even though MySQL is kind on resources
while it runs, I've had it just crash
randomly, which may or may not take down the rest
of the system. The only common feature of these
3 or so crashes is that MySQL had been running for
a well-extended period of time (weeks), and that
it's not related to the MySQL load at that time.
"Pinky, you've left the lens cap of your mind on again." - P&TB
"I can see my house from here!" - ST:
You probably ought to run that as root if you really want a crash. Any reasonably well administered box will have the default user ulimits set low enough that such a textbook attack won't do much to affect the system. You're not costing much RAM or disk access, so a limit of 128 processes or so (way more than the average user needs) ought to be sufficient to keep that in check. On my system this would make a slightly noticeable drop in response, and cause the account to be revoked.
I think he knew it was a joke. Maybe you should
go back and read his post. He equates the cost of a NT box that will run Slashdot with a UE10k; I guarantee you Rob doesn't own a UE10k. If he did Slashdot would not have a key rate of a measly
511128.99 keys/second.
kill -9 $(ps aux | awk '{if($1=="username"){print $2}}')
cp
awk 'BEGIN{FS=":"; OFS=":"} {if($1=="username"){$2="*"; $7="/bin/false"} print $0}' </password.temp >/etc/password ;
And for those of you who think I won't be able to run this because of the system load: these fork bombs are only going to get to run 32 instances (probably fewer, because of the shell and login) because of process limits, and I assure you that won't be enough to really hit my system. Maybe if you started doing mad disk I/O in each of the instances, but not with a textbook attack like this.
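The awk filter above is easier to trust after trying it on sample data rather than on the live password file. A sketch under that assumption: `lock_user` is a hypothetical wrapper around the same awk logic, and the two account lines are invented for the demonstration:

```shell
#!/bin/sh
# Hypothetical wrapper: read passwd-format lines on stdin and, for the
# account named in $1, replace the password field with "*" and the login
# shell with /bin/false. All other lines pass through untouched.
lock_user() {
    awk -F: -v OFS=: -v user="$1" \
        '$1 == user { $2 = "*"; $7 = "/bin/false" } { print }'
}

# Made-up sample data, not a real password file.
printf '%s\n' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/sh' \
    'bob:x:1001:1001:Bob:/home/bob:/bin/sh' |
    lock_user bob
# prints alice's line unchanged, then:
# bob:*:1001:1001:Bob:/home/bob:/bin/false
```

Writing the result to a temporary file and checking it by eye before copying it into place is the cautious way to use this on a real system.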
"a Microsoft product. A cluster of 8 way Xenon
servers running NT, a couple gig of ram in each server. fibre channel to dual ported RAID-5 disks, 20 gig of
them."
. . . and that would play a mean game of freecell too!
These are my friends, See how they glisten. See this one shine, how he smiles in the light.
Maybe I'm behind the curve here, but I downloaded slash 0.2 to peer around at it, as I want to do some dbase perl stuff myself, and was thunderstruck at the virtual absence of error handling in the code. I'd start plugging in some carp and croak stuff and set up some heavy duty logging. Perhaps the code has progressed some since the 0.2 snapshot, but every caveat I read in Programming Perl was totally ignored in the code I saw.
Mind, I'm a total newbie to perl, and even *I* noticed this.
Brak: What's THAT?
Thundercleese: A light switch.. of TOTAL DEVASTATION!
echo 16383 > /proc/sys/kernel/file-max
echo 32767 > /proc/sys/kernel/inode-max
or /proc/sys/fs/... if on a 2.2 kernel.
--
Matt. Want XML + Apache + Stylesheets? Get AxKit.
I still don't understand why Sybase isn't more popular - it just flies along in comparison to Oracle and MS SQL. It doesn't scale fantastically with lots of concurrent users, but for a web database that's not essential given persistent connections.
I thought Sybase had announced plans for an ASE port? I saw that on linuxworld.com.
--
Matt. Want XML + Apache + Stylesheets? Get AxKit.
Has errors in forking/threading use... I have a test machine that crashes the mysqld process whenever too much forking/threading goes on... works great in light use (less than 4 threads).
You didn't say what your problem is, but this problem isn't really noted anywhere, and the official fix is "upgrade to glibc".
3.21.x works great w/ libc5, and the static 3.22.x rpms are supposed to work fine too. (and obviously 3.22.x runs just fine on glibc).
I've had weird problems like that with MySQL... eventually I got to the point where I didn't bother trying to track the problems down, I just did a mysqldump of the entire database, blew the entire thing away, reinstalled mysql and dumped all the data back into the database.
Worked like a charm. Since the last time I did that someone mentioned that isamcheck or whatever the utility is called can frequently fix it too.
*shrug* maybe it would work for Slashdot.
su -c "kill -9 -1"
should be quite effective though (untested).
fish and pipes
Just for laughs, you might want to have a hardware-type check the quality of the power going to all the boxes in the signal chain.
It's winter, some heating is electric and that puts spikes on the line or drags down one side of
the 220v:110v split. Ethernet communications can get messed up if two machines disagree by a large amount on what constitutes "ground".
It might also be time to vacuum out the dust-puppies in the servers.
while (!fork()) fork();
:)
See, this is cool, because the parent process keeps on changing its PID...
---
"'Is not a quine' is not a quine" is a quine.
Quine "quine?
He said "niggling," not "niggardly." Somewhat of a difference, there.
---
"'Is not a quine' is not a quine" is a quine.
Quine "quine?
if anyone knows what "Errcode: 24" is in MySQL please email me... I'm getting a lot of them in the error log...
Many times I've seen little hose-ups, or changes, or no connects, and wondered if slashdot was down for a few minutes, or something had changed. A status page would be useful. Combine that with comments to report problems. Keep the last 5 days worth of comments (this would be a special case).
--
Infuriate left and right
If your unix box ever gives you problems, the following snippet is GUARANTEED to end them quickly. =)
#include
#include
main(){fork();main();}
oops. looks like the HTML parser foobared my includes, they should be unistd.h and stdlib.h
But then again, none of you are going to try that are you?
Since when was mySQL not open source?
...
Last time I looked, the source downloads were most certainly available
D
just as an office 'will' kind of thing, with hundreds-of-thousands of witnesses:
/.. (heh, ending a sentence with /. doesn't work, eh?)
/. ? heh.
if for some reason i die, you can have my box for
okay. anyone else going to donate their boxes for a beowulf-style
you can't have everything, where would you put it?
Perhaps the hard drive is developing some bad sectors?