Slashdot Mirror


Software To Diagnose Faulty PC Hardware?

Etylowy writes "Over the years I have repaired my own PC and those belonging to family and friends many, many times. While in most cases it turned out to be restoring a system after malware/the user/Windows made a mess, or simple cases of 'follow the smell of smoke and molten plastic,' there were some nasty ones where the computer mostly works. By 'mostly,' I mean: you can boot it up, it might even work for a while, but will crash way too often to blame it all on Microsoft — what do you do then? Once you strip it of any extra hardware (which, with today's motherboards that have pretty much everything integrated, might not be an option) you are left with the CPU, motherboard, graphics card, RAM and HDD. You can test the HDD, you can run memtest86+ to check the RAM, but how do you go about testing the CPU, motherboard and graphics card trio to find which is to blame? Replacing them one by one isn't really an option. Do you know of any software that would help the way memtest helps with RAM?"

31 of 274 comments

  1. Eurosoft PC Check by jdb2 · · Score: 4, Informative

    This is probably one of the best and most comprehensive OS-agnostic boot-CD/floppy general-purpose PC hardware testing and burn-in tools I've come across, IMHO.

    Here's its web page: http://www.eurosoft-uk.com/pc_check.htm

    In any case, I recommend plugging the ATX cable into a power supply tester that presents a non-trivial load as a first step in diagnosing any PC. You'd be surprised at the ways problems caused by out-of-spec voltages can manifest.

    jdb2

    1. Re:Eurosoft PC Check by Omnifarious · · Score: 2, Informative

      I second this. I've had 2 or 3 PCs now that have begun acting very strangely only to discover that the real problem was the power supply. Replace it and the PC acts fine again.

    2. Re:Eurosoft PC Check by piero.grimo · · Score: 2, Informative

      Same here. I consistently had problems with a PC, only to discover years later that the PSU was defective (it actually blew up). I got a 450W PSU and all the bizarre symptoms have vanished.

    3. Re:Eurosoft PC Check by Artifakt · · Score: 5, Informative

      Every failed power supply I've found was visibly broken once you opened it up, and it was always the capacitors. No exceptions: capacitors had sprayed gunk all over, their aluminium cans had popped off the bases, etc. Typical electrolytic fluid is whitish, but once it bakes dry it will scorch and so gradually turn reddish brown. Many capacitors have grooves scored into the tops which form a sort of impromptu blow-out panel, and often you will see them bulging, with traces of fluid escaping from these grooves where they are actually splitting open, or scorched fluid forming a red-brown powdery residue outlining them. The grooves are usually in either an X (or plus) or a sort of K shape.

      The PSUs are often still working (somewhat) at that point, and often the PSU puts out nominally correct voltages when cool but deviates when it heats up. I had one client's PC that made a loud bang twice over a period of about a week, but the PC didn't really start acting funny until the third bang. Opening the PSU revealed three small caps that had blown completely off the board. It had probably kept running with no obvious symptoms through the first two.

      Of course, only a trained pro with good tools should ever examine the inside of a power supply while live. But if you are willing to unplug one, take it out of the PC and let it sit overnight, just to make sure the larger capacitors have fully drained, I recommend examining them. Yes, that voids the warranty if you aren't a pro, but if you were going to junk it and buy a new one anyway, so what? But before you open one, read this:

      DON'T EVER OPEN A PLUGGED-IN POWER SUPPLY. IF THIS DOESN'T APPLY TO YOU, YOU ALREADY HAVE AN ELECTRICIAN'S LICENCE, AN EE DEGREE, OR SIMILAR. DON'T OPEN A POWER SUPPLY UNLESS YOU KNOW THE LARGE CAPACITORS INSIDE ARE DISCHARGED - THEY CAN MAKE YOUR ARM MUSCLES CONTRACT HARD ENOUGH TO BREAK YOUR BONES. GIVE THEM AT LEAST AN HOUR TO RUN DOWN, THEN USE AN INSULATED TOOL TO CROSS THE PLUG PRONGS BEFORE YOU OPEN THE CASE.

      Split caps or scorched ones will confirm you are right in your guess that it's the PSU. While you're at it, if you think the problem is the motherboard, check for capacitor damage there too, as it's not all that uncommon for that to be why a mainboard fails. Cheap electrolytics are probably responsible for more than half of all consumer electronics failures. They are by far the most likely source of intermittent failures, ones that come and go with temperature, or glitches that only partly disable something, and they are detectable.

      --
      Who is John Cabal?
    4. Re:Eurosoft PC Check by rickb928 · · Score: 2, Informative

      Don't trust the caps with the 'X' pattern. The 'K' pattern is more reliable.

      Ask any of the many who had Dell machines from about 2000-2004. And HP/Compaq. And Acer. Not so much IBM/Lenovo. I have no reports for Gateway.

      Also affected ASUS, MSI, AOpen, Gigabyte motherboards, pretty much all brands.

      For a period of time, substandard caps were being used; the maker either faked the testing or used different component parts in production runs than in certification. If you got stung by these, you and I were the QA.

      It was not pretty.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    5. Re:Eurosoft PC Check by Cylix · · Score: 2, Informative

      YOU SHOULD NEVER USE CAPS LIKE THIS AND NEVER SUGGEST SOMEONE BRIDGE COMPONENTS WITH A SCREW DRIVER.

      I'm getting a bit tired of replying to all of the bad advice I see flying around. However, never discharge caps by bridging the connectors (even if the tool is insulated). A large enough power source can cause some serious problems.

      The proper way to handle this is to terminate the load into a ground source capable of dissipating the load. Earth ground will suffice, but don't dump a crap ton of current into the ground of your house.

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    6. Re:Eurosoft PC Check by Quatermass · · Score: 2, Informative

      Eh?

      The main capacitors (usually around 600-1000uF) that smooth the rectified mains output are only at about 300-400V and, if the supply is designed correctly, will have discharge resistors across them to render them safe in milliseconds.

      See
      http://pavouk.org/hw/en_atxps.html
      or
      http://www.smpspowersupply.com/ATX_power_supply_schematic.pdf

      for examples.

      Please do not alarm people needlessly.

      --
      Stuart http://stuarthalliday.com/
    7. Re:Eurosoft PC Check by mea_culpa · · Score: 2, Informative

      You can use a resistor to drain the caps safely.
      This is the preferred method as shorting them with bare metal can cause damage to the cap especially if it is highly charged.
      This is ELE-100 stuff here.
      Take a 25K 10W resistor, hold it with a pair of insulated pliers and short the leads of the capacitor with the resistor for about 30 seconds. Verify that it is actually drained by measuring it with your DMM. Repeat if necessary.
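
      As a rough check on the 30-second figure: a capacitor draining through a resistor follows V(t) = V0 * exp(-t / (R * C)). The sketch below assumes a 1000uF capacitor charged to 400V (figures borrowed from another comment in this thread, not from this one) together with the 25K resistor mentioned above. The function names are made up for illustration.

```python
import math

def voltage_after(v0, resistance_ohms, capacitance_farads, seconds):
    """Voltage left on a capacitor discharging through a resistor."""
    tau = resistance_ohms * capacitance_farads  # RC time constant in seconds
    return v0 * math.exp(-seconds / tau)

def seconds_to_reach(v0, v_target, resistance_ohms, capacitance_farads):
    """Time needed for the voltage to decay from v0 down to v_target."""
    tau = resistance_ohms * capacitance_farads
    return tau * math.log(v0 / v_target)

# Assumed values: 400 V on a 1000 uF capacitor, drained through 25 kOhm.
V0 = 400.0
R = 25_000.0
C = 1000e-6

# Worst-case power in the resistor occurs at the first instant: P = V^2 / R.
peak_power_watts = V0 ** 2 / R

if __name__ == "__main__":
    print(f"time constant: {R * C:.1f} s")
    print(f"peak power:    {peak_power_watts:.1f} W")
    print(f"after 30 s:    {voltage_after(V0, R, C, 30):.0f} V")
    print(f"below 5 V in:  {seconds_to_reach(V0, 5.0, R, C):.0f} s")
```

      Under these assumptions the time constant is 25 seconds, so a single 30-second pass still leaves well over 100V on the capacitor, which is why measuring with the DMM and repeating matters. The peak dissipation comes out at about 6.4W, inside the resistor's 10W rating.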

  2. Just replace it. by lukas84 · · Score: 2, Informative

    Repairing hardware makes no sense anymore. Just swap in a new machine from the pool, so the user will be happy again, call the manufacturer to send someone onsite to replace the system board, redeploy the image, and put the machine back into the pool.

    At home, I usually replace the machine before it has a chance to get old and flaky.

  3. Preventative Medicine - get a UPS by jackchance · · Score: 4, Informative

    Most home computer hardware failures come from "brownouts".

    If you notice that your lights dim a little bit when your fridge compressor or AirCon comes on, that is a recipe for a computer failure. Spend $50 and get a UPS.
    Btw, I noticed that my Linksys wifi router was also extremely sensitive to brownouts. It would get funked up and need to be power cycled. Plug it into a UPS: no more wifi problems either.

    I learned this the hard way when I moved to an old building in the East Village of NYC and had 3 motherboards/CPUs fail within a 3-month period.

    --
    1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
    1. Re:Preventative Medicine - get a UPS by The+Grim+Reefer2 · · Score: 3, Informative

      Most home computer hardware failures come from "brownouts".

      If you notice that your lights dim a little bit when your fridge compressor or AirCon comes on, that is a recipe for a computer failure. Spend $50 and get a UPS.
      Btw, I noticed that my Linksys wifi router was also extremely sensitive to brownouts. It would get funked up and need to be power cycled. Plug it into a UPS: no more wifi problems either.

      I learned this the hard way when I moved to an old building in the East Village of NYC and had 3 motherboards/CPUs fail within a 3-month period.

      What you really need in the case you describe is a good line conditioner. I didn't look at the 'UPS' you mentioned, but many in that price range are not a true UPS and will still allow undervoltage to occur, albeit for a shorter period if you're lucky.

    2. Re:Preventative Medicine - get a UPS by Anonymous Coward · · Score: 1, Informative

      Why? Doesn't the computer's PSU have enough juice in it to survive a quick dip in voltage?

      No. Off-the-shelf computers from the big vendors tend to select the cheapest, lowest-rating power supply they can find. And since it's the cheapest the power supply vendor may additionally cut corners. A *good* power supply? A little brownout is no problem. Most PCs do not have a good power supply.

      I'm still waiting for my decade-old P3 to die so it can be replaced by an Atom board, but the darn thing keeps on running.

      From my experience at a computer surplus, P3s and below have been very reliable. VERY reliable. Higher-end (2.8GHz+ or so) P4s have exhibited increasing rates of blown motherboard caps and power supply failures, and the Pentium Ds we are starting to get have had VERY high failure rates.

      Anyway, my burn-in method won't help you. But at the University surplus I work at, we have an automated netboot Ubuntu installer. We *could* basically ghost it on, but the net installer works out the ethernet, hard disk, CPU and RAM pretty hard -- it has actually found many machines (out of the ~10,000 a year we get through) that have no apparent blown caps (*cough* *GX270* *cough*) but are nevertheless unstable. This does not help narrow down the fault, but it does tell you whether the fault is real or whether it IS Windows and/or drivers.

      Power supply -- check the voltage readings in the BIOS, and if it doesn't have any, you'll need a voltmeter. I've seen power supplies where the voltage sags; the machine will run but crash randomly. In reality, I have not checked the power supply very frequently; the steps below detect most faults.
      CPUburn -- this'll exercise the CPU.
      Memtest86 -- memory.
      If it doesn't crash with these going, then run something video-card-intensive. If it then crashes, it could be an unstable card, an unstable mobo (either not supplying the card enough voltage, or some other problem...), or a faulty power supply (sagging under the load of the video card perhaps? You did test this, right?). Or it could be drivers, of course.
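
      The voltmeter step above can be sketched as a simple tolerance check. This is a minimal illustration, not part of the original procedure: the +/-5% band for the positive rails and +/-10% for -12V are taken from the ATX design guide as I understand it, so treat them as assumptions, and the readings are invented.

```python
# Nominal ATX rail voltages and allowed deviation as a fraction of nominal.
# Assumed from the ATX design guide; check your PSU's documentation.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}

def out_of_spec(readings):
    """Return {rail: measured} for every reading outside its tolerance band."""
    bad = {}
    for rail, measured in readings.items():
        nominal, tolerance = ATX_RAILS[rail]
        if abs(measured - nominal) > abs(nominal) * tolerance:
            bad[rail] = measured
    return bad

# Hypothetical voltmeter readings taken while the machine is under load.
readings = {"+3.3V": 3.28, "+5V": 4.96, "+12V": 11.2, "-12V": -11.6}

if __name__ == "__main__":
    for rail, value in out_of_spec(readings).items():
        print(f"{rail} reads {value} V, outside tolerance")
```

      Taking the readings both at idle and while the stress tools are running is the point: a supply that sags only under load would pass the idle check.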

    3. Re:Preventative Medicine - get a UPS by Anonymous Coward · · Score: 1, Informative

      Your observation is correct: Modern switching power supplies don't care much about voltage, as long as it's in a certain range: They simply draw more current when the voltage is lower. It's not the brown-out as such which kills the computer but the transients which go with it. Power supply quality varies. Good power supplies can bridge longer drop-outs and withstand stronger voltage spikes than others. It speaks for a PSU when the computer keeps running while a short drop out turns the TV off, for example. A PSU like that most likely doesn't need a UPS to protect it from bad mains.

  4. Tools by Anonymous Coward · · Score: 1, Informative

    CPU:
    Prime95 (Step 2): http://www.mersenne.org/freesoft/#newusers ... Blend test for memory+CPU stability, Small FFT for CPU
    LinX: http://www.softpedia.com/get/System/Benchmarks/LinX-benchmark.shtml

    Video Card:
    3dmark: http://www.futuremark.com/benchmarks/

    When testing the video card, watch for high-pitched squealing (a power issue), overheating, and symptoms like white dots appearing at random. 3DMark is not a test tool as such, but it will put some stress on the card.

    1. Re:Tools by lukas84 · · Score: 2, Informative

      Furmark (http://www.ozone3d.net/benchmarks/fur/) is better suited for stressing your GPU, and it's also free.

  5. Microscope by grapeape · · Score: 3, Informative

    I like the Microscope products... their newest version, Microscope Duo, boots off of a USB stick. For machines that don't boot at all they also have a diagnostic card; it's basically a PCI card with an LED readout that gives a series of POST codes that can help diagnose whether it's the board, a card, memory, etc. They can be found at http://www.micro2000.com/

    The handiest piece of diagnostic gear I use is actually a simple power supply tester. You would be amazed how many systems that appear to power up are actually suffering from a dead -5 or +5 rail on the power supply. Many tend to think that if the fan's spinning the power supply is OK, but that's often not the case. The best part is they are cheap... around $10 for a basic one.

  6. Hardware tester by iammani · · Score: 2, Informative

    When you no longer trust your CPU/motherboard, I am afraid the only option to test them would be a hardware circuit (which can make decisions using its own CPU) specifically designed for your motherboard/processor, which I believe only the manufacturer will have access to. If you are looking for a more practical solution, the only way is to eliminate the possibility of all other hardware failing (by simply removing it or using it in a known-good machine) and assume it must be a CPU/motherboard issue (which means you may have to junk them both and buy new ones). And don't forget to test your power supply unit (not checking it on my old PC cost me a hell of a lot of hours).

  7. SMART for dying hard drives by Wrath0fb0b · · Score: 4, Informative

    smartmontools (http://sourceforge.net/apps/trac/smartmontools/wiki) is great for finding out what the drives think about their own health. Things to look out for are spin-retry counts (which lead to that annoying 2-5 second freeze) and high reallocated sector counts. Never, never, never use chkdsk to attempt to fix a broken hard drive. With the robustness of modern journaling file systems (HFS, extN, NTFS), storage errors are almost always hardware errors. Running chkdsk stresses the drive just as it's failing and usually pushes it over the edge -- and then users complain that you can't recover their data.
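
    As a rough illustration of pulling those two counters out of smartmontools, the sketch below parses the attribute table that `smartctl -A` prints for ATA drives. The column layout is assumed from typical output, the sample text is invented rather than taken from a real drive, and the function name is hypothetical.

```python
# Attributes called out above as worth watching.
WATCHED = ("Reallocated_Sector_Ct", "Spin_Retry_Count")

def watched_raw_values(smartctl_output):
    """Map each watched attribute name to its RAW_VALUE column."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found

# Invented sample in the usual `smartctl -A` layout (not from a real drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       72
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25931
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       3
"""

if __name__ == "__main__":
    for name, raw in watched_raw_values(SAMPLE).items():
        print(f"{name}: {raw}")
```

    On a live system the text would come from running `smartctl -A` against the drive's device node instead of from the sample string. Non-zero raw values for either attribute are the kind of thing worth a closer look.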

    1. Re:SMART for dying hard drives by Anonymous Coward · · Score: 1, Informative

      This research paper says that modern journaling file systems are not as robust as you might think: http://pages.cs.wisc.edu/~crubio/includes/pldi09.pdf

  8. Re:OCCT by PFAK · · Score: 3, Informative

    Did you actually install it? (or are you a typical /. reader?) It has a "GPU" option for stress testing your graphics card if you have the latest DirectX updates installed.

    --

    Free means no restrictions, ironic the FSF's GPL forces restrictions, isn't it? What's your definition of free?
  9. Overheat by gd2shoe · · Score: 4, Informative

    That's a marginal idea at best, but a common one.

    While the technique of blasting a processing unit to see how it behaves at maximum temperature will sometimes find a faulty unit, many faults are not temperature related, and will not show up on this test. It's fine that you brought it up here, but something that both heats the CPU/GPU and tries to test as many pathways / as much of the instruction set as possible would be far more useful. (cf memtest86+ for RAM)
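
    To make the distinction concrete, here is a minimal sketch of a load that checks its own answers instead of only generating heat. It assumes nothing beyond the Python standard library, and it only detects inconsistency: the reference result is computed on the same machine, whereas a real tool would compare against known-good values and cover far more of the instruction set. All names are illustrative.

```python
import hashlib
import math
import time

def workload(seed, rounds=2000):
    """Deterministic mix of integer, floating-point and hashing work."""
    digest = hashlib.sha256(str(seed).encode()).digest()
    acc = 0.0
    for i in range(1, rounds + 1):
        digest = hashlib.sha256(digest).digest()
        acc += math.sqrt(i) * math.sin(i) + (digest[0] ^ (i & 0xFF))
    return digest.hex(), acc

def verify_loop(duration_seconds=1.0, seed=42):
    """Rerun the workload until time is up; count results that differ."""
    reference = workload(seed)
    passes = mismatches = 0
    deadline = time.monotonic() + duration_seconds
    while time.monotonic() < deadline:
        # Identical input must give an identical result on sound hardware.
        if workload(seed) != reference:
            mismatches += 1
        passes += 1
    return passes, mismatches

if __name__ == "__main__":
    passes, mismatches = verify_loop(duration_seconds=5.0)
    print(f"{passes} passes, {mismatches} mismatches")
```

    Any mismatch means the same computation produced two different answers, which points at hardware rather than software. Running one copy per core would be needed to load a multi-core CPU.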

    --
    I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
  10. PSU by gd2shoe · · Score: 5, Informative

    Oh, and don't forget to check the PSU. When it acts up, it will often appear to be a hardware fault somewhere else in the machine (often RAM, but it can be MB, CPU, GPU...).

    This certainly doesn't answer the poster's question, but it is related and important.

    --
    I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    1. Re:PSU by Daneurysm · · Score: 4, Informative

      I was just about to mention this. I used to work in a mom-n-pop shop, the only one in the area, for a long time.

      I have seen some of the most ridiculous problems that were PSU related. Serial mouse not working, VGA card outputting in B&W, slow and/or intermittent performance, HDs that constantly reset (and sound like the click of death in the process), new memory being blown, known good memory acting like bad memory, CD-Rs that can't burn (or finish burning successfully), software modems that couldn't go off hook, AGP cards crashing, PCI cards crashing, VLB SCSI cards not working at all.

      The list really just goes on and on and on. Software to diagnose faulty PC hardware? Sorry, no thanks. I had tried all manner of diagnostic and test software over the years. Some worked some of the time (mem tests and HD scanners); the rest were borderline useless pieces of crap. Not only that, but because of faulty PSUs (usually overloaded, or just old, or overheating, etc etc etc) I have seen those same programs misdiagnose just about everything.

      Aside from simple sensor reading and verification (of code, built-in HW diagnostics, etc) I do not trust 'software based' hardware diagnosis, especially on a PC.

      YMMV.

  11. Replace the integrated part by gd2shoe · · Score: 2, Informative

    Integrated devices can typically be replaced with PCI/PCIe devices. If an integrated network or sound card gives out, it can often be easier and less expensive to shove a new device into the case and disable the old one in the Device Manager. Still, integrated devices don't go out that often. It's more common for the MB itself to go (my experience, anecdotal).

    --
    I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
  12. Swap the damn hardware by evilviper · · Score: 3, Informative

    but how do you go about testing CPU, motherboard and graphics card trio to find which is to blame? Replacing them one by one isn't really an option. Do you know any software that would help the way memtest helps with RAM?

    There is no way to tell, with software, whether your PSU, CPU, or motherboard is to blame, in the overwhelming majority of cases.

    It's just idiotic to say "Replacing them one by one isn't really an option". In fact, that's by far the best option. I don't run memtest for a week to find out I have bad RAM, I take 30 seconds to swap it, and find out, for certain, in no time. PSUs are equally easy to swap, AND are the more likely component to fail, so that's the best place to start.

    If you don't know whether it's CPU or the MoBo, buy a new motherboard... Vastly more likely to be the cause, and pretty damn cheap just as soon as they're no longer brand new. Of course CPUs fail, but it's likely to be obvious from a visual inspection if they've been installed wrong, or otherwise abused.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  13. Re:OCCT by adamstew · · Score: 2, Informative

    Many people overclock their GPUs too, so it would make sense that an overclocking stability tool would stress those as well.

  14. Re:OCCT by Narpak · · Score: 2, Informative

    More than once I have found that the on-board sound chip from Realtek caused the computer to crash or have significant slowdowns. Disabling it and putting in a budget sound card fixed it. So I would suggest that disabling various on-board components in turn could uncover the culprit. That being said, identifying hardware problems has always, for me, been a bit hit-and-miss.

  15. Re:OCCT by JMandingo · · Score: 2, Informative

    Use a can of compressed air to purge out any accumulated dust. Less dust means a cooler box, which may just bring the unit back within whatever temperature (or, by extension, power) tolerance it is pushing the envelope on. Another technique is to wiggle every cable and connector and slotted card, just to make sure nothing has come loose. Check to make sure all the fans are running whilst powered on.

    --
    Vonnegut was right: Of all the words of mice and men, the saddest are, "It might have been."
  16. You don't, you swap out hardware by GuyFawkes · · Score: 2, Informative

    Of course with time you get experience, dry joints tend to follow power tracks on a PCB, and by gently flexing you can hear them tick.

    Swapping out is the ONLY way.

    I have systems with intermittent (heat activated) dry joints on a mobo, partly duff RAM, and partly duff (rebranded at higher clock) CPU. ONLY swapping out will find it.

    HTH etc

    --
    http://slashdot.org/~GuyFawkes/journal
  17. Only a couple tools needed. by bwave · · Score: 3, Informative

    We have repaired in excess of 50,000 machines, and I'll tell you the tools needed are very simple. The process we follow is:

    Open the machine and dust it with an air compressor (with a humidity drier; you can pick up a 4-gallon one with a drier at Sears for about $99, which saves a lot of money on $3-6 cans of air) and a central vacuum system (a shop vac will work). Then inspect the motherboard and video card for blown caps. Take off the CPU fan and inspect the compound; if it is home-built, lord only knows what you'll find.

    Test the power supply with a digital power supply tester (one of the $12 LCD ones). If it tests good, still open the power supply and look for blown caps. (Many will have blown caps and be causing sporadic problems that the simplistic tester will not catch.)

    See if the machine will power on and boot. If it doesn't power on, or hangs on POST, remove the modem, and the NIC if it's a separate card; when these are blown by lightning they will prevent the machine from POSTing.

    Ensure the hard drive is mounted properly with 4 screws installed; with fewer than that, the vibrations will cause the drive to go bad. (I don't care what operating specifications you show me, or what G-rating the drive has, this is the case.)

    Then test memory with Memtest86+ 1.70, and the hard drive with one of the 3 versions of SeaTools by Seagate: an older GUI one, the newest GUI one, and the text version. (Some versions will lock up on some video cards/chipsets. If you get a long string of bad sectors on an HDD bigger than 320GB that begins about 2/3 of the way through the drive, test with a different version to be sure, as there is a sector count issue with some large HDs.) If you have even 1 bad sector, replace the drive.

    We do the above process on EVERY machine before we attempt to do anything else; it is well worth the couple of hours it takes. If you make it this far, then 99% of the time your problem is malware/viruses.

    Run Combofix, look for files not removed by it, boot with Ultimate Boot CD (the WinPE based one) or something like Knoppix and manually remove them. Search the Windows, Windows/System32, and Windows/System32/Drivers directories for files created in the past month; anything suspicious is probably malware. Rename those files. Look under Program Files, Program Files/Common, ProgramData, and Users/UserName/ApplicationData for suspicious directories and rename/delete them; these are where your AlphaAntivirus, Windows Police Pro, UltimateAV, etc. like to hide. Boot back into Windows, run HijackThis, remove any suspicious entries, and reboot. Anything left? If so, remove it manually with the boot CD. In Add/Remove Programs, remove all unnecessary programs. Then run CWShredder, Malwarebytes Antimalware, Spybot, and AVG Antivirus. (Feel free to substitute legitimate antimalware/antivirus tools in place of these 3, but we find these 3 work best for us.) Install all Windows updates, update all system drivers, and try browsing the internet for 2 or 3 minutes. If all seems OK, reboot one last time and be sure you can still browse the net. All done! This fixes pretty much everything, other than any specific issue your customer may have complained about.

    Also, be sure to check the amount of RAM. Here is what we recommend; with less, and with the latest service packs etc., the machine will seem sluggish:

    Windows 95 - 96MB+
    Windows 98/ME - 196MB+
    Windows 2000 - 384MB+
    Windows XP - 640MB+
    Windows Vista Home Basic - 1GB+
    Windows Vista Home Premium - 2GB+
    Windows Vista Ultimate/Windows 7 - 4GB+

    If you don't give the machine back with this amount of RAM, your customer will swear the machine is slower than when they brought it to you. It doesn't matter how untrue that is, how much malware you removed, or that the machine didn't even boot into Windows!

    CPUs and video cards rarely go bad unless abused. Normally, you'll find an under-rated or defective power supply to blame.

    Also, if you're working with a notebook, be sure to dust the exhaust/intake vents. If it still powers down or locks up, you need to disassemble it and recompound the CPU/video chipset with Arctic Silver 5. The other thing is that power problems, mouse lockups, etc. are many times caused by bad batteries; try running without a battery installed, on just the AC adapter. Any battery older than 2 1/2 years is suspect. And of course, look for broken DC power jacks.
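
    The "files from the past month" search described above can be sketched as a short script to run from whatever environment has the disk mounted. This is a minimal illustration, not the procedure actually used in the shop: it goes by modification time rather than creation time, since creation time is not exposed consistently across platforms, and the function name and scan root are placeholders.

```python
import os
import time

def recently_changed(root, days=30, now=None):
    """Yield paths under root modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) >= cutoff:
                    yield path
            except OSError:
                # File vanished or is unreadable; skip it.
                continue

if __name__ == "__main__":
    # Placeholder root; point this at the mounted Windows/System32/Drivers
    # directory (or any of the other locations listed above).
    for path in recently_changed("."):
        print(path)
```

    The output is only a list of candidates; deciding which of them look suspicious is still a manual judgment.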

  18. Re:OCCT by kav2k · · Score: 2, Informative

    You can add Furmark to your "toolbox" to stress-test your GPU, free, built almost for that specific purpose and effective.