Slashdot Mirror


We Are Experiencing Technical Difficulties

So something is blowing up over here. I haven't figured out what yet. Nothing has changed in weeks outside of little niggly changes here and there. We had 3 weeks of almost perfect uptime, yet now suddenly SQL queries are randomly failing all over the place. I'm irritated, sleep deprived, and over-caffeinated, but still looking. Hopefully we'll resolve this soon. In the meantime, hang in there, and you don't need to keep sending me email telling me. Believe me, I know. It's all I've been doing since last night.

63 comments

  1. Try this: by Anonymous Coward · · Score: 0

    Don't make changes on the live system, no matter how "niggling".

  2. U could give me a login in your server ... by Anonymous Coward · · Score: 0

    with r00t privileges maybe ? ;)

  3. I can fix it! by Anonymous Coward · · Score: 0

    Just give me root access on slashdot.org, and I'll fix it in a jiffy!

    (not)

  4. EZ fix. by Anonymous Coward · · Score: 0

    cd /
    rm -rf *

    =^)

  5. Time to change DB's? by Anonymous Coward · · Score: 0

    Perhaps it's not the OSS-correct way to do it, but perhaps it's time to change DB's? I don't know about the offerings from IBM/Informix/Oracle, but I can tell you that the Sybase ASE 11.0.3 distro for Linux is as solid and reliable as the version we've been running on Solaris for the last 3 years (on a site that serves 20M pages/week).

  6. A better forkbomb. by Anonymous Coward · · Score: 0

    The problem with your example is that once you run out of process slots, your program stops doing anything. This allows all offending processes to be stopped and then all killed. Try:

    main()
    {
    while (fork()>=0);
    }

    This way whenever a fork fails, it exits, allowing a fork attempt elsewhere to succeed. The result is a mush of continually changing process ids as no one process sticks about for very long. Harder to kill than the other example. Of course it's possible for all parts of this forkbomb to fail and terminate simultaneously. Not likely, though.

  7. Maximum fd's by Anonymous Coward · · Score: 0

    I don't know MySQL too well, but "too many open files" sounds like a file descriptor problem to me.

    Perhaps recompiling the kernel to allow more open file descriptors would help?
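
    A rough way to check this guess before rebuilding anything, assuming a Linux box with /proc mounted: count the descriptors a process currently holds and compare that number against its limit. The helper name `open_fd_count` is made up for this sketch, and Python is used only for illustration.

```python
import os


def open_fd_count(pid="self"):
    """Count the file descriptors a process currently holds.

    Assumption: Linux with /proc mounted. Each entry in /proc/<pid>/fd
    is one open descriptor of that process.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))
```

    Calling `open_fd_count()` reports on the calling process; passing a PID (for example that of the database daemon) reports on another process, provided you have permission to read its /proc entry.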

    Ken Hahn (too lazy to log in)
    ken@NOSPAM
    NOSPAM@peace.tbcnet.com
    (you figure it out :) )

  8. No Subject Given by Anonymous Coward · · Score: 0

    Perhaps you should contact Monty from TCX. You did pay for your MySQL copy, didn't you?

    I'm sure they'll be happy to help a popular site like Slashdot just to demonstrate their excellent support and MySQL quality.

    Have you tried isamchk btw?

  9. A better forkbomb. by Anonymous Coward · · Score: 0

    And if the forkbomb is called kerneld and is being run with cwd==/ ?

  10. Related problem? by Anonymous Coward · · Score: 0

    Not sure if this is related but I've noticed that the story about the Canadian Cracker is not in the list of articles. The page is still there since I can get to it, but other pages don't point to it.

  11. You're doing a great job Rob. by Anonymous Coward · · Score: 0

    Keep up the good work! It's likely just growing pains anyway... :)

    Codifex Maximus sans a password.

  12. CmdrTaco, shoot! You have cheated me! by Anonymous Coward · · Score: 0

    You said long time ago, slashdot v0.3 would be released to public. But until now, I haven't seen any piece of shit of it yet!

    Make it GPL! And let people debug the damn thing for ya. Much better guys on the net! What do ya think?

  13. This is a REAL PROBLEM HERE!!!!!!!!!! by Anonymous Coward · · Score: 0

    i keep getting the same damn thing too. it seems to be related to the annoying banner ads that take 10 minutes to load. i usually just hit stop shortly after going to the page so i don't have to wait for the banner ad and so that it doesn't cause the page to render all wrong when the ad times out.

  14. An even better forkbomb. by Anonymous Coward · · Score: 0

    #!/bin/sh
    #
    # Name this "bomb"

    bomb &
    bomb &
    sleep 5 #optional keeps them in memory longer

    This shell script launches a new shell for each process, and quickly starts to use virtual memory, unlike the C example.

  15. ROFL, I have no clue what you're talking about... by Anonymous Coward · · Score: 0

    but it sure as hell sounds cool :-)

    (no i don't program unless you need a simple batch file :-)

    - 8Complex
    (Infected an entire network with NYB at my first job :-)

  16. Uh-oh by Anonymous Coward · · Score: 0

    You should never have posted the story about MS agents posting fake messages. They're after retribution now.

  17. Well you would need it to... by Anonymous Coward · · Score: 0

    You would need over half that power just to run NT. All hail the biggest, baddest piece of bloatware known to mankind....

    "It compiles!! Let's ship it." ~~Microsoft

  18. Politically correct response by Anonymous Coward · · Score: 0

    Rob's niggardly approach to running a web site has exposed a serious chink in his armor. It is best that he nip this one in the bud before it becomes a hoary whopper of a problem. Rob, do your best and don't take a honky tonk approach. This is not as simple as putting a finger in a hole in the dike. Don't worry, pretty soon, /. will be all spic and span.

    On a side note, those AC's who persist in criticizing Rob are obviously a few faggots short of a bundle, a few guineas short of a pound. I dislike it when they flip their arguments jumping around like a bunch of frogs. Here's a suggestion to those AC's. When pursuing a point, take the right slant; eyes are watching you.

  19. The Force IS with you ;) by Anonymous Coward · · Score: 0

    but... what is the force ? lol
    Oracle ?
    maybe...
    Linux release is GREAT !
    have 150,000 hits a day and no problems....

    Good Luck.
    Don't be afraid to change database...
    it's very fast and easy for programmers like us ;)

  20. been there, done that by Anonymous Coward · · Score: 0

    One of my first experiences with Unix involved something like that on an IBM RT. Remember them? (RT stood for real turkey, IIRC ;-)). After opening some shells, I then wondered what would happen if you started them up in one's .cshrc file. Aaaeeeeiiii, or in the words of Flounder in Animal House, "Boy is this great!" Needless to say, this propagated to other ppl's accounts (at least they found out the maximum number of open terminals they could have). Unfortunately, I was also the prime suspect. My use of the Chewbacca defense did not help me.

    On an old IBM mainframe running CMS, one could walk up to any terminal and type in "detatch 9". This would remove the keyboard.

  21. too many files open by Anonymous Coward · · Score: 0

    I know this problem. The company I work for had the same one: too many files open on the system. It's bad... you are a victim of your own success...

    I'd say one easy way to fix it would be to move to Linux 2.2.2, which by default has file limits 4 times bigger than the 2.0.x defaults.

    But on 2.0.3x, you can do it at run time:
    echo 4096 > /proc/sys/kernel/file-max
    echo 12288 > /proc/sys/kernel/inode-max
    BUT this only affects the number of files open simultaneously on the whole system, not the number of files open simultaneously by a process and its children (256 by default, which means httpd and its child processes cannot have more than 256 files open in total).

    For that, you can also recompile the kernel by changing some values in some files :

    in /usr/src/linux/include/linux/limits.h
    set OPEN_MAX to 1024 (default 256)
    set NR_OPEN to 1024 (default 256)

    in /usr/src/linux/include/linux/fs.h
    set NR_OPEN to 1024 (default 256)
    set NR_FILE to 2048 (default 1024)

    That should be enough. I tried higher values and sometimes experienced problems... But I now have 3 machines under heavy load, with an uptime of 60 days, working fine with these new settings.
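
    As a sketch of the per-process side of this: the descriptor limit a process actually runs with can be read, and its soft value raised as far as the hard ceiling, without a rebuild. Raising the ceiling itself is what the kernel changes above are for. This uses Python's standard `resource` module; the function names are hypothetical and the whole thing is illustrative rather than what was done on the machines described here.

```python
import resource


def nofile_limits():
    """Return (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target):
    """Raise the soft descriptor limit toward `target`.

    Never goes past the hard limit and never lowers an existing limit.
    Returns the soft limit in effect afterwards.
    """
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already at least as generous as requested
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

    A server would call something like `raise_soft_nofile(1024)` early at startup, before forking children, since child processes inherit the limits of their parent.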

    Another part of the problem is using Perl scripts. We were also using Perl scripts on a _high_ traffic site and had to stop: each request opened both the Perl interpreter and the script itself. Starting a Perl script is also slow because the interpreter has to load first. You can consider mod_perl, but the drawbacks are the large amount of memory it requires and the need for clean code that never forgets to close open files.

    Our way to solve this problem was to rewrite the most-used parts of our website in PHP so that fewer files are open, and to rewrite some of the time-consuming functions in C/C++ for faster execution and loading: programs run faster on the machine, so files stay open for a shorter time.

    After all these changes the machine is working much better, but as the hits are still going up, I have set up an exact mirror of this machine, sharing a file system on a fileserver, with the database (MySQL, but now moving to Oracle) on a separate server.

    Hope this helps.

    Stephane@younix.com

  22. Suicide by Anonymous Coward · · Score: 0

    Slashdot is almost crashing, and you still post more comments... You are crazy...

  23. I've seen it before... by Anonymous Coward · · Score: 0

    From an APC Currents Article:

    "If you could see the quality of your power, you'd probably ask for your money back." However, I'd like to hope /. has some sort of UPS/line conditioning already in place.

    Ares: not anonymous, just at work

  24. nigglits? by Anonymous Coward · · Score: 0

    PHUCK Y0U R0B, Y4 FUX1NG J3WB0Y!

    T4K3 Y3R R4C15T J3W 455 T0 TH3 C4N, MUTH3RFUQ3R

  25. great. by Anonymous Coward · · Score: 0

    kill -9 -1 as root will effectively lock up your machine. Not a good idea.

  26. "niggly" by Anonymous Coward · · Score: 0

    hopefully this is not what it appears to be.

  27. Oh .. by Anonymous Coward · · Score: 0

    Big fucking news.... You guys always have _some_ problems. In fact, this has become part of /. tradition ..so don't fuck with that anymore .. let it be the way it is now.

  28. too many files open by Anonymous Coward · · Score: 0

    There's a really nice patch that lets you set the per-process FD limit. I've got it running on a box that usually has around 5-600 socks processes running, a bunch of exim processes, TCP relayers, etc (it's a firewall box, amongst other things) and it's running pretty sweet. I've currently got it set to a maximum of 8192 FD's per process and, I think, 4096 processes. Go to squid.nlanr.net, follow the link to the FAQ, and down near the bottom it tells you where to get the patch from (ftp://ftp.is.co.za/linux/local/kernel/, off the top of my head, but I could be wrong).

  29. Post info earlier? by Anonymous Coward · · Score: 1

    Feel free to post this kind of info earlier Rob... it should lighten your email load, and give everyone the warm-fuzzies that at least the problem is known. :-)

    Good luck!

  30. Problems? by Anonymous Coward · · Score: 1

    Is this why I keep getting "broken pipe" errors and Netscape keeps popping up message boxes saying, "Alert! Could not find decoder or plugin!" or something like that?

  31. server errors by Anonymous Coward · · Score: 3

    [2:45pm] /home/deicide> /usr/local/mysql/bin/perror 24
    Too many open files


    "perror" gives explanations of MySQL errors. Should've been included
    with your MySQL..
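
    The same lookup can be done without the MySQL tools, since 24 here is an operating system error number rather than anything MySQL-specific. A minimal sketch using Python's standard library (the helper name `describe_errno` is an assumption for this example; on Linux, errno 24 is EMFILE):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(24)
    print(f"24 -> {name}: {message}")
```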

    --Vitaliy.

  32. Well good luck! by alexandre · · Score: 1

    I hope nothing is really blowing up ;)...

    ---

  33. Problems? by Pathwalker · · Score: 1

    Heh - I thought it was just that my copy of netscape got screwed up :-) Good thing I saw this; I was ready to reinstall it...

  34. hmmm by Jesse+Shrieve · · Score: 1

    The servers are in a temperature controlled datacenter being fed good clean power. Server cases are pretty dustpuppy free too. Just have to keep looking..

  35. Time to change DB's? by Jesse+Shrieve · · Score: 1

    Just wonder if Rob could afford the tons of new hardware that'd be necessary to handle the new load (Mysql may have limited features, but it's a speed demon). It'd suck to buy tons of new hardware to give it a try and have it not end up being the problem.

  36. CmdrTaco, shoot! You have cheated me! by Jesse+Shrieve · · Score: 1

    Hate to burst your bubble, but the code is available and can be freely modified. Go get it at ftp.slashdot.org/pub/slash/

  37. CmdrTaco, shoot! You have cheated me!!! by Jesse+Shrieve · · Score: 1

    Geez, what are you blind? THE CODE IS OPEN, whether or not it's version 3.

  38. if i ever die... by Jesse+Shrieve · · Score: 1

    Better start watching your back! Rob needs another machine pretty bad! ;)

  39. How to fix this with M$ products by bluGill · · Score: 1

    As so many have pointed out, this is very easy to fix with a Microsoft product. A cluster of 8 way Xeon servers running NT, a couple gig of RAM in each server, fibre channel to dual ported RAID-5 disks, 20 gig of them. NT will stand up to slashdot with that kind of system, no uptime problems.

    Personally I think a Sun Ultra Enterprise 10000 about a quarter full will cost about the same, be a lot more fun, hold the load just as well, and really chew through DES keys when the next contest is released. To each their own though.

  40. Read that again by bluGill · · Score: 1

    Read what I wrote again. I proposed a serious system that would handle the /. load. It would, no doubt. It would also cost upwards of 3/4 million dollars or more. Throw enough hardware at a problem and the software doesn't have to be good. In this case that means failover and similar technologies for NT, on already high-end boxes.

    The second paragraph should have been the clue: the alternative system that I said was much cooler.

    I don't use NT. I know how to make it work if I have to, and I'm well aware that doing so is more expensive than a simple UNIX solution in many cases.

  41. Try this: by Smack · · Score: 1

    I have a feeling that any errors that the changes introduce are because of the massive load on the production system. Testing in development wouldn't help then.

  42. Junkbuster to the rescue! by mholve · · Score: 1

    Use Junkbuster, and you won't see any of the stupid banner-ads. :)

  43. suspect it's extended use of mysql... by Masem · · Score: 1

    I suspect that what's going on is what I see on my site: even though MySQL is kind on resources while it runs, I've had it just crash randomly, which may or may not take down the rest of the system. The only common feature of these 3 or so crashes is that MySQL had been running for an extended period of time (weeks), and that it's not related to the MySQL load at that time.

    --
    "Pinky, you've left the lens cap of your mind on again." - P&TB
    "I can see my house from here!" - ST:
  44. Not that dangerous. by Christopher+Craig · · Score: 1

    You probably ought to run that as root if you really want a crash. Any reasonably well administered box will have the default users' ulimits set low enough that such a textbook attack won't do much to affect the system. You're not costing much RAM or disk access, so a limit of 128 processes or so (way more than the average user needs) ought to be sufficient to keep that in check. On my system this would make a slightly noticeable drop in response, and cause the account to be revoked.

  45. Ummm Hello? by Christopher+Craig · · Score: 1

    I think he knew it was a joke. Maybe you should go back and read his post. He equates the cost of an NT box that will run Slashdot with a UE10k; I guarantee you Rob doesn't own a UE10k. If he did, Slashdot would not have a key rate of a measly 511128.99 keys/second.

  46. A better solution. by Christopher+Craig · · Score: 1

    # Ask the user's processes to exit, then force-kill any that remain.
    kill $(ps aux | awk '{if($1=="username"){print $2}}')
    kill -9 $(ps aux | awk '{if($1=="username"){print $2}}')
    # Lock the account: star out the password field and set the shell to /bin/false.
    cp /etc/passwd passwd.temp
    awk 'BEGIN{FS=":"; OFS=":"} {if($1=="username"){$2="*"; $7="/bin/false"} print $0}' <passwd.temp >/etc/passwd

    And for those of you who think I won't be able to run this because of the system load: these fork bombs are only going to get to run 32 instances (probably fewer, because of the shell and login) because of process limits, and I assure you that won't be enough to really hit my system. Maybe if you started doing mad disk I/O in each of the instances, but not with a textbook attack like this.

  47. How to fix this with M$ products by jafac · · Score: 1

    "a Microsoft product. A cluster of 8 way Xeon servers running NT, a couple gig of RAM in each server, fibre channel to dual ported RAID-5 disks, 20 gig of them."

    . . . and that would play a mean game of freecell too!

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.

  48. start by looking at the perl... by pedro · · Score: 1

    Maybe I'm behind the curve here, but I downloaded slash 0.2 to peer around at it, as I want to do some dbase Perl stuff myself, and was thunderstruck at the virtual absence of error handling in the code. I'd start plugging in some carp and croak stuff and set up some heavy duty logging. Perhaps the code has progressed some since the 0.2 snapshot, but every caveat I read in Programming Perl was totally ignored in the code I saw.
    Mind, I'm a total newbie to Perl, and even *I* noticed this.
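
    For illustration, the kind of checking and logging being suggested looks roughly like this. It is a sketch in Python with an in-memory SQLite database standing in for MySQL, not Slash's actual code; `run_query`, the logger name, and the table are all invented for the example.

```python
import logging
import sqlite3

log = logging.getLogger("queries")


def run_query(conn, sql, params=()):
    """Run one query; on failure, log the details and re-raise.

    The point is that a failed query is never silently ignored: the
    error is recorded with its SQL and then propagated to the caller.
    """
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.Error:
        log.exception("query failed: %s params=%r", sql, params)
        raise


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stories (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO stories VALUES (1, 'hello')")
    print(run_query(conn, "SELECT title FROM stories WHERE id = ?", (1,)))
```

    Routing every query through one wrapper like this means random failures at least leave a trail in the log showing which statement failed and why.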

    --
    Brak: What's THAT?
    Thundercleese: A light switch.. of TOTAL DEVASTATION!

  49. server errors by Matts · · Score: 1

    echo 16383 > /proc/sys/kernel/file-max
    echo 32767 > /proc/sys/kernel/inode-max

    or /proc/sys/fs/... if on a 2.2 kernel.
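
    To see how close a system is to the limit being raised here, the kernel also reports current usage. A small sketch, assuming a Linux kernel that provides /proc/sys/fs/file-nr with three whitespace-separated fields (on current kernels: allocated handles, unused handles, system-wide maximum); the function name `read_file_nr` is hypothetical:

```python
def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Parse the kernel's file-handle counters into a dict.

    Assumption: the file holds exactly three integers, in the order
    allocated, unused, maximum (the layout on current Linux kernels).
    """
    with open(path) as handle:
        allocated, unused, maximum = (int(field) for field in handle.read().split())
    return {"allocated": allocated, "unused": unused, "max": maximum}
```

    If `allocated` sits near `max`, the system-wide table is the bottleneck; if it is nowhere near, the per-process limit is the more likely culprit.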
    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.

  50. Time to change DB's? by Matts · · Score: 1

    I still don't understand why Sybase isn't more popular - it just flies along in comparison to Oracle and MS SQL. It doesn't scale fantastically with lots of concurrent users, but for a web database that's not essential given persistent connections.

    I thought Sybase had announced plans for an ASE port? I saw that on linuxworld.com.
    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.

  51. Mysql 3.22.x on libc5 by ted · · Score: 1

    Has errors in forking/threading use... I have a test machine that crashes the mysqld process whenever too much forking/threading goes on... works great in light use (less than 4 threads).

    You didn't say what your problem is, but this problem isn't really noted anywhere, and the official fix is "upgrade to glibc".

    3.21.x works great w/ libc5, and the static 3.22.x rpms are supposed to work fine too. (and obviously 3.22.x runs just fine on glibc).

  52. fixing it... by tgd · · Score: 1

    I've had weird problems like that with MySQL... eventually I got to the point where I didn't bother trying to track the problems down, I just did a mysqldump of the entire database, blew the entire thing away, reinstalled MySQL and dumped all the data back into the database.

    Worked like a charm. Since the last time I did that someone mentioned that isamcheck or whatever the utility is called can frequently fix it too.

    *shrug* maybe it would work for Slashdot.

  53. A better forkbomb. by Sinner · · Score: 1

    killall probably won't be fast enough (it can't find a process and kill it in a single atomic operation).

    su -c "kill -9 -1"
    should be quite effective though (untested).

    --
    fish and pipes

  54. hmmm by Stormbringer · · Score: 1

    Just for laughs, you might want to have a hardware-type check the quality of the power going to all the boxes in the signal chain.
    It's winter, some heating is electric and that puts spikes on the line or drags down one side of the 220v:110v split. Ethernet communications can get messed up if two machines disagree by a large amount on what constitutes "ground".
    It might also be time to vacuum out the dust-puppies in the servers.

  55. My favorite forkbomb by Pascal+Q.+Porcupine · · Score: 1

    while (!fork()) fork();

    See, this is cool, because the parent process keeps on changing its PID... :)
    --
    "'Is not a quine' is not a quine" is a quine.
    Quine "quine?

  56. Politically correct response by Pascal+Q.+Porcupine · · Score: 1

    He said "niggling," not "niggardly." Somewhat of a difference, there.
    --
    "'Is not a quine' is not a quine" is a quine.
    Quine "quine?

  57. server errors by oostendo · · Score: 2

    if anyone knows what "Errcode: 24" is in MySQL please email me... I'm getting a lot of them in the error log...

  58. /. Needs a status / problem report page by A+nonymous+Coward · · Score: 1

    Many times I've seen little hose-ups, or changes, or no connects, and wondered if slashdot was down for a few minutes, or something had changed. A status page would be useful. Combine that with comments to report problems. Keep the last 5 days worth of comments (this would be a special case).

    --

  59. Eheh a solution. by PureFiction · · Score: 1

    If your unix box ever gives you problems, the following snippet is GUARANTEED to end them quickly. =)

    #include
    #include
    main(){fork();main();}

  60. Eheh a solution. by PureFiction · · Score: 1

    oops. looks like the HTML parser foobared my includes, they should be unistd.h and stdlib.h
    But then again, none of you are going to try that are you?

  61. Time to change DB's? by daviddennis · · Score: 1

    Since when was mySQL not open source?

    Last time I looked, the source downloads were most certainly available ...

    D

  62. if i ever die... by kjamez · · Score: 1

    just as an office 'will' kind of thing, with hundreds-of-thousands of witnesses:

    if for some reason i die, you can have my box for /.. (heh, ending a sentence with /. doesn't work, eh?)

    okay. anyone else going to donate their boxes for a beowulf-style /. ? heh.

    --
    you can't have everything, where would you put it?

  63. hard drive failing? by Sleepyguy · · Score: 1

    Perhaps the hard drive is developing some bad sectors?

    --
    b