Software To Diagnose Faulty PC Hardware?
Etylowy writes "Over the years I have repaired my own PC and those belonging to family and friends many, many times. While in most cases it turned out to be restoring a system after malware/the user/Windows made a mess, or simple cases of 'follow the smell of smoke and molten plastic,' there were some nasty ones where the computer mostly works. By 'mostly,' I mean: you can boot it up, it might even work for a while, but will crash way too often to blame it all on Microsoft — what do you do then? Once you strip it of any extra hardware (which, with today's motherboards that have pretty much everything integrated, might not be an option) you are left with the CPU, motherboard, graphics card, RAM and HDD. You can test the HDD, you can run memtest86+ to check the RAM, but how do you go about testing the CPU, motherboard and graphics card trio to find which is to blame? Replacing them one by one isn't really an option. Do you know of any software that would help the way memtest helps with RAM?"
It will stress your RAM, CPU, and GPU or all at once with pretty temperature and utilization graphs (for Windows only): http://www.ocbase.com/perestroika_en/
Free means no restrictions, ironic the FSF's GPL forces restrictions, isn't it? What's your definition of free?
This is probably one of the best and most comprehensive OS agnostic boot-CD/floppy general purpose PC hardware testing and burn-in tools I've come across IMHO.
Here's its web page : http://www.eurosoft-uk.com/pc_check.htm
In any case, I recommend plugging the ATX cable into a power supply tester that presents a non-trivial load as a first step in diagnosing any PC. You'd be surprised in what ways the problems caused by out-of-spec voltages can be manifested.
jdb2
Self-checking programs like Prime95 can be useful for testing the computer more generally (if you've already verified the RAM with memtest, a failure here basically means the CPU or chipset is at fault).
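To make the "self-checking" idea concrete, here is a minimal Python sketch of it. It is not how Prime95 works internally and is no substitute for it; it only shows the principle of repeating a deterministic computation and comparing the results. The names `hash_chain` and `find_mismatches` are made up for this example.

```python
import hashlib


def hash_chain(seed: bytes, rounds: int) -> str:
    """Deterministic CPU-bound work: hash the previous digest over and over."""
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def find_mismatches(seed: bytes, rounds: int, passes: int) -> list[int]:
    """Repeat the same computation and return the pass numbers that disagree
    with the first result. On healthy hardware the list is empty."""
    reference = hash_chain(seed, rounds)
    return [
        n for n in range(1, passes + 1)
        if hash_chain(seed, rounds) != reference
    ]


if __name__ == "__main__":
    bad = find_mismatches(b"burn-in", rounds=200_000, passes=20)
    print("mismatching passes:", bad or "none")
```

Any mismatch means the same inputs produced different outputs, which points at hardware (or cooling/power) and not at the software being run.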
Another thing I've tried before (if the motherboard allows it) is significantly underclocking sections of the motherboard/processor. If a specific underclock fixes the problem, you've just significantly narrowed down the list of possible failures.
There are programs similar to memtest that will check that a GPU's output conforms to what it should be, but if you just have random-crashy-badness that can be a pain to diagnose. Sometimes just running without graphics drivers for a while can help spot those problems; if the computer no longer crashes you can look a bit further away from the graphics card, as most of its capabilities won't be used.
Well... typically you find the fault by using an application which stresses one of those components far more than any other and then seeing if the failure condition you're observing occurs more often. This is just basic troubleshooting, it's not even specific to computers.
#fuckbeta #iamslashdot #dicemustdie
Most home computer hardware failures come from "brownouts".
If you notice that your lights dim a little bit when your fridge compressor or air conditioner comes on, that is a recipe for a computer failure. Spend $50 and get a UPS.
Btw, I noticed that my Linksys wifi router was also extremely sensitive to brownouts. It would get funked up and need to be power cycled. Plug it into a UPS: no more wifi problems either.
I learned this the hard way when I moved to an old building in the East Village of NYC and had 3 motherboards/CPUs fail within a 3 month period.
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
I like the Microscope products... their newest version, Microscope Duo, boots off of a USB stick. For machines that don't boot at all they also have a diagnostic card; it's basically a PCI card with an LED readout that gives a series of POST codes that can help diagnose whether it's the board, a card, memory, etc. They can be found at http://www.micro2000.com/
The handiest piece of diagnostic gear I use is actually a simple power supply tester. You would be amazed how many systems that appear to power up are actually suffering from a dead -5 or +5 rail on the power supply. Many tend to think that if the fan's spinning the power supply is OK, but that's often not the case. The best part is they are cheap... around $10 for a basic one.
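If you take the readings with a multimeter instead of a tester, the pass/fail check is simple enough to sketch in Python. The tolerances below are the commonly cited ATX figures (±5% on the positive rails, ±10% on -12 V); treat them as an assumption and check the spec for your supply. The -5 V rail is left out because newer ATX supplies no longer provide it. `ATX_RAILS` and `out_of_spec` are names invented for this example.

```python
# Nominal voltage and allowed fractional deviation per rail.
# Assumed values: commonly cited ATX tolerances, not taken from the thread.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def out_of_spec(readings: dict[str, float]) -> dict[str, str]:
    """Return a description for every measured rail outside its tolerance."""
    problems = {}
    for rail, volts in readings.items():
        nominal, tol = ATX_RAILS[rail]  # KeyError on an unknown rail name
        # sorted() keeps low <= high for the negative rail as well.
        low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
        if not low <= volts <= high:
            problems[rail] = f"{volts:.2f} V is outside {low:.2f} to {high:.2f} V"
    return problems


if __name__ == "__main__":
    print(out_of_spec({"+12V": 11.0, "+5V": 5.1, "+3.3V": 3.3}))
```

Readings taken with the supply unloaded can look fine while the rail sags under load, so measure with the machine running if you can.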
Hiren's BootCD contains a bunch of different utilities for doing just this. Plus it's bootable, so if you can't get into the OS you can still use the CD. It can do just about anything you'd need to in order to diagnose and repair a machine. You just gotta find it (usually the pirate bay or other torrent sites are a good place to look.)
http://sourceforge.net/apps/trac/smartmontools/wiki is great for finding out what the drives think about their own health. Things to look out for are spin-retry counts (which lead to that annoying 2-5 second freeze) and high reallocated sector counts. Never, never, never use chkdsk to attempt to fix a broken hard drive. With the robustness of modern journaling file systems (HFS, extN, NTFS), storage errors are almost always hardware errors. Running chkdsk stresses the drive just as it's failing and usually pushes it over the edge -- and then users complain that you can't recover their data.
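As a rough illustration of checking those two counters, here is a Python sketch that parses the attribute table printed by `smartctl -A` and flags non-zero raw values. It assumes the usual ten-column table layout; the `SAMPLE` text is made up for the example and is not output from a real drive, and the function names are hypothetical.

```python
WATCHED = ("Reallocated_Sector_Ct", "Spin_Retry_Count")

# Made-up sample in the layout of `smartctl -A` output (not from a real drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
"""


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Map attribute name -> raw value for rows with a plain integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def suspicious(attrs: dict[str, int]) -> dict[str, int]:
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

In practice you would feed it the text captured from `smartctl -A` on the drive in question. A non-zero count is a reason to back up first and investigate second.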
That's a marginal idea at best, but a common one.
While the technique of blasting a processing unit to see how it behaves at maximum temperature will sometimes find a faulty unit, many faults are not temperature related, and will not show up on this test. It's fine that you brought it up here, but something that both heats the CPU/GPU and tries to test as many pathways / as much of the instruction set as possible would be far more useful. (cf memtest86+ for RAM)
I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
Oh, and don't forget to check the PSU. When it acts up, it will often appear to be a hardware fault somewhere else in the machine. (often RAM, but can be MB, CPU, GPU...)
This certainly doesn't answer the poster's question, but it is related and important.
There is no way to tell, with software, whether your PSU, CPU, or motherboard is to blame, in the overwhelming majority of cases.
It's just idiotic to say "Replacing them one by one isn't really an option". In fact, that's by far the best option. I don't run memtest for a week to find out I have bad RAM, I take 30 seconds to swap it, and find out, for certain, in no time. PSUs are equally easy to swap, AND are the more likely component to fail, so that's the best place to start.
If you don't know whether it's CPU or the MoBo, buy a new motherboard... Vastly more likely to be the cause, and pretty damn cheap just as soon as they're no longer brand new. Of course CPUs fail, but it's likely to be obvious from a visual inspection if they've been installed wrong, or otherwise abused.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
I stress my Linux boxes by telling them that if they develop a fault I'll re-image them with Vista.
Not a single one has dared to fail on me yet.
AT&ROFLMAO
Even when they do, it's usually a sign the rest of the board is on its way out too. A device on the board not functioning can mean a number of things (MB controllers acting up, visible/non-visible corrosion in the board, blown capacitors, etc.), so you can be in for a lot of weird behaviour from the board that you can't pin down.
To be honest, relying purely on a test suite to tell you what's broken will lead to disaster. Only through experience do you get the pointers toward what is actually faulty. Add to this that true diagnosis only comes from swapping out parts, and, well, test suites don't look at all like a viable option.
When I am repairing hardware, about the only suite I use is memtest86+ and a decent live Linux distro. You can usually pick out devices that have failed with lspci; however, this is not always correct. It all goes back to having test hardware and the knowledge of which behaviours in systems are caused by which faults. After 15 years of working in IT with both hardware and software faults, I can say there's only so much you can do with limited or no test hardware. Most of the time when you're diagnosing hardware faults on the phone it's an educated guess at best; the only time you truly get a decent diagnosis is when you have the machine with you and can swap parts out. Hell, we don't even use the Dell diagnostics at work due to their inability to give decent results on anything other than RAM.
With hardware it's usually a bad PSU, then bad memory, then bad caps.
Then bad karma, then bad mojo.
"City hall" in German is "Rathaus" Kinda explains a few things......
We have repaired in excess of 50,000 machines, and I'll tell you the tools needed are very simple. The process we follow is: open the machine, dust it with an air compressor (with a humidity drier; you can pick up a 4 gal one with a drier at Sears for about $99, which saves a lot of money on $3-6 cans of air) and a central vacuum system (a shop vac will work), then inspect the motherboard and video card for blown caps. Take off the CPU fan and inspect the compound; if it is home built, lord only knows what you'll find. Test the power supply with a digital power supply tester (one of the $12 LCD ones). If it tests good, still open the power supply and look for blown caps (many will have blown caps and be causing sporadic problems that the simplistic tester will not catch). See if the machine will power on and boot. If it doesn't power on, or hangs on POST, remove the modem, and the NIC if it's a separate card; when these are blown by lightning they will cause a no-POST. Ensure the hard drive is mounted properly with 4 screws installed; with fewer than that, the vibrations will cause the drive to go bad (I don't care what operating specifications you show me, or what G-rating the drive has, this is the case). Then test memory with Memtest86+ 1.70, and the hard drive with one of the 3 versions of SeaTools by Seagate: an older GUI one, the newest GUI one, and the text version. (Some versions will lock up on some video cards/chipsets. If you get a long string of bad sectors on an HDD bigger than 320 GB, beginning about 2/3 of the way through the drive, test with a different version to be sure, as there is a sector count issue with some large HDs.) If you have even 1 bad sector, replace the drive. We do the above process on EVERY machine before we attempt anything else; it is well worth the couple of hours it takes. If you make it this far, then 99% of the time your problem is malware/viruses.
Run Combofix, look for files not removed by it, then boot with Ultimate Boot CD (the WinPE based one) or something like Knoppix and manually remove them. Search the Windows, Windows/System32, and Windows/System32/Drivers directories for files created in the past month; anything suspicious is probably malware. Rename those files. Look under Program Files, Program Files/Common, ProgramData, and Users/UserName/ApplicationData for suspicious directories and rename/delete them; these are where your AlphaAntivirus, Windows Police Pro, UltimateAV, etc. like to hide. Boot back into Windows, run HijackThis, remove any suspicious entries, and reboot. Anything left? If so, remove it manually with the boot CD. In Add/Remove Programs, remove all unnecessary programs. Then run CWShredder, Malwarebytes Antimalware, Spybot, and AVG Antivirus. (Feel free to substitute other legitimate antimalware/antivirus tools in place of these, but we find these work best for us.) Install all Windows updates, update all system drivers, and try browsing the internet for 2 or 3 minutes. If all seems OK, reboot one last time, and be sure you can still browse the internet. All done! This fixes pretty much everything, other than any specific issue your customer may have complained about. Also, be sure to check the amount of RAM. Here is what we recommend; otherwise, with the latest service packs, etc., the machine will seem sluggish: Windows 95 - 96 MB+, Windows 98/ME - 196 MB+, Windows 2000 - 384 MB+, Windows XP - 640 MB+, Windows Vista Home Basic - 1 GB+, Windows Vista Home Premium - 2 GB+, Windows Vista Ultimate/Windows 7 - 4 GB+. If you don't give the machine back with this amount of RAM, your customer will swear the machine is slower than when they brought it to you; it doesn't matter how untrue that is, how much malware you removed, or that the machine didn't even boot into Windows! CPUs/video cards rarely go bad unless abused. Normally, you'll find an under-rated or defective power supply to blame.
Also, if you're working with a notebook, be sure to dust the exhaust/intake vents. If it still powers down or locks up, you need to disassemble it and re-apply thermal compound to the CPU/video chipset with Arctic Silver 5. The other thing is that power problems, mouse lockups, etc. are many times caused by bad batteries; try running without a battery installed, on just the AC adapter. Any battery older than 2 1/2 years is suspect. And of course, look for broken DC power jacks.
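The "files created in the past month" search from the malware steps above can be scripted. This is a minimal Python sketch, assuming you run it from a live environment with the Windows partition mounted; it uses modification time as a stand-in because creation time is not reported consistently across platforms. `recent_files` is a name invented for the example, and the path in the usage line is a placeholder.

```python
import os
import time
from pathlib import Path


def recent_files(root, days=30, now=None):
    """List files under `root` modified within the last `days` days."""
    cutoff = (time.time() if now is None else now) - days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue  # unreadable or vanished file: skip it
            if mtime >= cutoff:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Placeholder mount point; adjust to wherever the partition is mounted.
    for path in recent_files("/mnt/windows/Windows/System32/drivers"):
        print(path)
```

The list only narrows down where to look. Windows updates also touch these directories, so every hit still needs a human to judge it before renaming anything.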