Best Linux Hardware Diagnostics?
An anonymous reader asks: "I've been running Linux for a little while and usually hardware problems have shown up quite easily - kernel panic, no module, no networking, etc. - but recently I've encountered some problems with network disk access causing very high load, which I think might be hardware related. Under Windows I'd fire up SANDRA or the like and run a full system scan. I did a quick search and nothing really stood out. I was wondering if any Linux gurus out there would like to share their expertise on Linux diagnostics?"
Well, /var/log/messages is a good place to start. Check your nfs/smb logs, and if all else fails, use a kernel debugger.
I think this is what you are asking; if not, please clarify.
There isn't really a "suite" that I know of like SANDRA; however, using your system logger (sysklogd, metalog, etc.) you can get a really good view of what is going on with your system. You may want to enable some debugging settings in kconfig and recompile the kernel so you get more info in the log. If you have any programming experience you can try profiling the kernel, and any problems you find you can attempt to correct or post to the LKML. But really it sounds to me like you're in need of a distro change. Try Debian or Gentoo.
I thought the article was titled "Best Linux Distribution" when my eyes passed by it. Wouldn't that have been a fun discussion.
lshw (-X gives a basic gui), lspci -vvv and lsusb give a lot of detailed information about your setup.
/proc/cpuinfo gives some good CPU info. Try cat on a few of the other files under /proc/ too.
Also cat
Both should display kernel messages from boot-up. Kernel boot messages usually contain the information you need to track down IRQ conflicts.
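For what it's worth, the /proc files mentioned above are plain text, so a few lines of script can pull out the fields you care about. Here is a minimal sketch in Python, assuming the usual "key : value" layout with a blank line between processors. The function name parse_cpuinfo is made up for this example, and the "model name" field is what x86 kernels report; other architectures may use different keys.

```python
from pathlib import Path


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for cpu in parse_cpuinfo(path.read_text()):
            # "model name" is an assumption that holds on x86 kernels.
            print(cpu.get("processor", "?"), cpu.get("model name", "unknown"))
```

The same partition-on-colon approach should work for other "key: value" files under /proc, though each file has its own quirks.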
MemTest86
Not really a Linux program, but something I usually stick in as a boot option in grub. Does a great job at detecting bad RAM. MemTest86 can also be booted from a floppy.
BadBlocks
This utility can be used to find bad blocks on a disk partition. I've used it before to check disks.
You might also want to check out a system monitoring utility like GKrellM, since that gives a generally complete picture of IRQ/interrupt usage, bandwidth utilization, and memory and CPU utilization.
I don't think I've ever had a hardware problem that couldn't be diagnosed using the aforementioned utilities.
We keep Knoppix CDs just for this purpose: hardware diagnostics. dmesg and /var/log/messages provide information that is otherwise hard to obtain from Windows 2000 or XP, especially if you can't boot Windows.
Another crucial thing is lspci, which is absent from Windows. Say you do a fresh install of Windows, which does not detect the network card. How do you know what card it is so you can obtain the drivers? In Windows you just can't get the PCI information so easily. Enter Knoppix.
I have also used memtest in Knoppix and found memory issues before, where Windows simply acted up. The problem with Windows is that you have to boot the entire OS, take ~130MB of RAM and resolve all IRQs before you can run Sandra or the like. Memory issues, disk issues or IRQ issues can prevent you from even booting.
Knoppix, when booted in single-user mode, takes little memory, and you can boot it not to use ACPI, not to use the HLT instruction, not to detect SCSI that might freeze the system, etc. Then you can diagnose the system. Just get a CD and read the man pages of the various tools on the CD.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
If network access is causing "load" and not "CPU usage", you need to look at kernel stuff - drivers, TCP/UDP windows, Ethernet statistics, etc.
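On the Ethernet statistics part: Linux counts processes stuck waiting on I/O toward the load average, so high load with an idle CPU often points at something like a flaky link. The per-interface counters live in /proc/net/dev, and a rough Python sketch for spotting interfaces with errors or drops could look like the following. The function names are invented for this example, and the column order is the standard one printed in the file's own two header lines.

```python
from pathlib import Path

# Column order as given in the two header lines of /proc/net/dev.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        values = [int(v) for v in rest.split()]
        if len(values) < 16:
            continue
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, values[:8])),
            "tx": dict(zip(TX_FIELDS, values[8:16])),
        }
    return stats


def suspicious_interfaces(stats):
    """List interfaces with any non-zero error or drop counter."""
    bad = []
    for name, s in stats.items():
        total = s["rx"]["errs"] + s["rx"]["drop"] + s["tx"]["errs"] + s["tx"]["drop"]
        if total > 0:
            bad.append(name)
    return bad


if __name__ == "__main__":
    path = Path("/proc/net/dev")
    if path.exists():
        print(suspicious_interfaces(parse_net_dev(path.read_text())))
```

The counters are cumulative since boot, so it is more telling to run this twice a few minutes apart and see whether the numbers are still climbing.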
lspci
cat /proc/cpuinfo
lsusb
cat /proc/scsi/scsi
ls /dev (if using udev)
dmesg|less (or more depending on your PAGER)
free
These are usually enough to determine whether the BIOS thinks your hardware exists. They should also help determine whether the kernel has loaded a driver and given your hardware a device node. If you need to know whether a hard drive (or partition) is bad, you can use the old standby:
dd if=/dev/ of=/dev/null
That will tell you if you can read all the data on the device or not. Hope that helps.
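If you would rather see exactly how far the read got before it failed, the same idea can be sketched in Python. This is only an illustration of the dd trick above, not a replacement for badblocks or SMART checks. The name read_scan and the 1 MiB chunk size are arbitrary choices made for this example.

```python
def read_scan(path, chunk_size=1024 * 1024):
    """Read `path` from start to end, discarding the data.

    Returns (bytes_read, error): error is None when the whole file or
    device was read, or the OSError that stopped the scan.
    """
    total = 0
    try:
        # buffering=0 gives raw, unbuffered reads straight from the OS.
        with open(path, "rb", buffering=0) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break  # end of file or device
                total += len(chunk)
    except OSError as exc:
        return total, exc
    return total, None
```

Call it with the same device node you would hand to dd (you will typically need root for that). A non-None error together with the byte count tells you roughly where on the device the read gave up.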
Just to add this to the suggested list of applications:
smartmontools - control and monitor storage systems using S.M.A.R.T.
lmbench - Utilities to benchmark UNIX systems
memtest86 - Test your memory on x86 platforms
nictools-nopci - Diagnostic tools for many non-PCI ethernet cards
nictools-pci - Diagnostic tools for many PCI ethernet cards
lm-sensors - utilities to read temperature/voltage/fan sensors
mbmon - Hardware monitoring without kernel dependencies (text client)
sensord - hardware sensor information logging daemon
crashme - Stress tests operating system stability
fuzz - stress-test programs by giving them random input
spew - I/O performance measurement and load generation tool
stress - A tool to impose load on and stress test a computer system
cpuburn - a collection of programs to put heavy load on CPU
ltp - The Linux Test Project test suite
The Ultimate Boot CD: It's basically a compilation of different boot disks, all put in a nice menu system on a freely downloadable ISO image. While it's not really Linux (though it contains a number of Linux-based boot disks), it is one of the best utility CDs that I've ever encountered for testing hardware.
;)
Also, Knoppix is another one that I would suggest, though I use it more for data recovery these days.
What is the big deal about SiSoft Sandra? It's not even that amazingly useful for benchmarking, let alone diagnostics. Everybody in the Windows world raves about it but I fail to see the attraction. It's vaguely useful for system inventory/information but that's about it.
Eurosoft do a product called PC-Check. It's not cheap (£150) but it works very well. You get a bootable floppy (which you can copy) that tests just about everything in your PC.
Best of all, you just slap it in a machine, let it run for an hour, and come back to see the results.
If it's an HDD bottleneck, then check/set your HDD settings with "hdparm" and monitor the performance with "iostat".
less /var/log/debug
less /var/log/dmesg or dmesg | less
Various files under /proc
I prefer less as it gives more options such as moving, searching, etc.
Also, you can write your own custom script to dig out information not just from one Linux server but from other Linux/BSD servers, and email/page back the results.
Is there some program similar to 'top' which shows which process is doing the most I/O? Sometimes I just hear a lot of seeking on my hard drive and would like to have a way to find out which process is actually causing it.
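One possible approach, with a big caveat: newer kernels built with task I/O accounting expose per-process counters in /proc/<pid>/io, which older kernels do not have. Assuming that file is present, a rough Python sketch that ranks processes by bytes read plus bytes written might look like this. The function names are made up for this example, and reading other users' processes generally needs root.

```python
from pathlib import Path


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_io(proc_root="/proc", limit=5):
    """Return up to `limit` (total_bytes, pid, name) tuples, busiest first."""
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            io = parse_proc_io((entry / "io").read_text())
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        total = io.get("read_bytes", 0) + io.get("write_bytes", 0)
        rows.append((total, int(entry.name), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    if Path("/proc").is_dir():
        for total, pid, name in top_io():
            print(pid, name, total)
```

Note that these counters are totals since each process started, not a rate. To catch whoever is thrashing the disk right now, take two snapshots a few seconds apart and compare them.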
I remember having to dig into the registry entries to get PCI IDs of devices, then looking them up on SourceForge. But those days, astonishingly, are past. Windows XP's Device Manager has a nice bit where it has a "Details" tab, with "Hardware IDs". For instance, the 3C905-TX in this computer reads as PCI\VEN_10B7&DEV_9200&SUBSYS_100010B7&REV_6C. And yes, it does this for unknown devices as well. So no more registry digging.
The problem with their "recovery mode" being seriously weaker than the equivalent linux console boot... well, that's a whole 'nother issue.
--grendel drago