Slashdot Mirror


Lessons In Hardware / OS Troubleshooting

Esther Schindler writes "We like to imagine that every Microsoft OS installation will work just as well as the company promises. When things don't work out, identifying and remedying the cause of failure can be time-consuming and frustrating. This lesson in how to determine why Windows 7 didn't install may help you troubleshoot a problem of your own, and save you from a Lost Weekend. Maybe you'll find this account useful all on its own. But the real key here is that the author is Ed Tittel — who's written over 100 books. If this hardware geek spends days solving a CPU-meets-Windows 7 problem, what chance do mere mortals have?"

36 of 236 comments

  1. Sooooo by Anonymous Coward · · Score: 4, Insightful

    He has issues with an "unsupported and unwarranted engineering sample CPU from Intel" with Windows 7... and Windows 7 is of course to blame according to the OP.... *roll eyes*

    1. Re:Sooooo by SilverHatHacker · · Score: 5, Insightful

      This is Slashdot. Windows is always to blame.

      --
      Funny may not give karma, but +5 Informative never made anyone snort coffee out their nose.
    2. Re:Sooooo by Ralish · · Score: 3, Informative

      I also found it bizarre that at no point did he seem to think of checking the setup logs. Admittedly, it probably wouldn't have helped him in this case, as logs often don't reveal anything in the case of intermittent hardware failure, but really, if I have a problem with setup, the first thing I'd think to check would be the log files in case they turn up something interesting. That's, you know, kind of why they're there...

    3. Re:Sooooo by lorenlal · · Score: 4, Insightful

      Actually, let's recap the action and the missteps:
      Inconsistent failure point during the initial installs. Yes, it could've been a problem with the ISO or the media. He correctly tried re-applying the image and also tested on another machine.

      At that point, you don't replace the motherboard. You might as well replace everything else first... Start with slapping the HD into the machine that worked and try the install again. When that worked, that would've reduced the potential culprits to the memory, CPU, and then lastly the mainboard. Memtest would've found no memory issue (which would also indicate that the mainboard is also less likely a problem), so that's when the CPU switch should've happened... Especially since it was "an engineering sample."

      Writing 100 books does not an expert make. Of course, I'll grant the guy some slack. Even the best of us have an experience where we throw our better judgment out the window. We make mistakes, or just totally forget how this is supposed to work, get into a panic, and goodness knows what else.

      The difference, and where I think this guy made the big mistake? When he decided to post this experience. Would've been much better just writing it like this:

      "I tried to go from x86 to x64, and it failed. I troubleshot it like a noob. I'll do better next time."

  2. actually by nomadic · · Score: 5, Funny

    We like to imagine that every Microsoft OS installation will work just as well as the company promises.

    Actually around here people like to imagine that every MS OS installation will miserably crash, because then they strut around feeling good about using Linux.

    1. Re:actually by Hatta · · Score: 4, Insightful

      I've installed Windows 7 on my home PC. Played some games on it. I'm impressed. It's at least as stable as XP, and not noticeably slower.

      I still strut around feeling good about using Linux. You don't have to hate one to like the other you know. I wouldn't use Windows every day by choice, only because the command line utilities on Linux are so much more convenient. I like the GUI better too, real virtual desktops, windowshading, the selection buffer, all great. And the repositories are great too.

      So yeah, not everyone who likes linux is prejudiced against Microsoft.

      --
      Give me Classic Slashdot or give me death!
    2. Re:actually by Fallingcow · · Score: 4, Informative

      Do what I do--run Windows, put Linux in a VM. Virtual Box is free, robust, and easy to use, or there's always VMWare.

      Run the VM full screen and you can forget you're not running it natively, so long as you don't need to do anything in 3D or very processor intensive (video encoding, for example). Drop to Windows if you need a Windows app (say, a recent version of Photoshop or real MSOffice) or to play games. Plus, if your chosen distro decides to make horrible decisions that cause massive audio breakage (Ubuntu.... *glower*) you can still listen to music or watch Youtube videos in Windows without rebooting.

      Another plus is that your Linux installation is all in a single file that you can back up or transfer very easily.

      I find that this works far better than dual booting. Saves disk space, saves time. I felt kind of crappy at first for making Linux a second-class citizen on my machine, but this works so much better that I wish I'd done it years ago--though I suppose high clocked multi-core processors and multi-gigabyte RAM sticks weren't commonplace back then, so the experience might not have been so nice.

  3. You're Kidding by nmb3000 · · Score: 5, Informative

    This is front-page news for Slashdot now? Here's the sum total of TFA:

    • Guy tries to install 64-bit Windows 7 on a machine previously running 32-bit Windows 7
    • Install fails over and over again
    • He replaces hardware components with no luck until he swaps out the CPU
    • Windows installs but is unstable
    • Worthless ASUS BIOS automatic "optimizers" cause stability problems (surprise!)
    • With BIOS settings changed to sane values Windows is stable

    Wow, color me impressed!

    How are "mortals" supposed to figure it out? I guess they buy a PC from Dell because everything in that article qualifies as "no duh" for system builders.

    --
    "What do you despise? By this are you truly known." --Princess Irulan, Manual of Muad'Dib
    /)
    1. Re:You're Kidding by lalena · · Score: 4, Insightful

      Exactly. He swapped every piece of hardware - saving the engineering sample CPU as the last thing he swapped. The system ran fine under Win 7 32 bit. You have to assume that hardware still works fine and that the problem was 64bit specific - which points to the CPU. Granted Intel said it should support 64bit, but it was an engineering sample.
      He replaced the case, power supply, the video card, the mother board, the hard drives, and the cables first??

    2. Re:You're Kidding by GrumpySteen · · Score: 5, Funny

      You should be impressed. No mere mortal would ever look at a computer and think "let's replace random parts until it starts working!" This guy is clearly some sort of magical god of electronic troubleshooting. Quite possibly with a unicorn for a sidekick.

    3. Re:You're Kidding by Gadget_Guy · · Score: 5, Insightful

      More evidence he's a "script-kiddy": He uses Microsoft's "excellent" Windows 7 USB DVD Download Tool, instead of simply using diskpart to create a partition on the stick and copying the files over from the ISO.

      Yeah, right. He writes books on Windows 7, but he shouldn't try the official way of installing from USB. Because that would mean that he had used the tools that he wanted to write about. Shame on him!

    4. Re:You're Kidding by GrumpySteen · · Score: 5, Funny

      Maybe if he had switched to a 128 bit power supply. That's twice what you need for a 64 bit processor, right?

    5. Re:You're Kidding by rubycodez · · Score: 4, Funny

      I always use gold plugged audiophile cables from my power supplies, they supply robust pure energy for perfect rendering of 64 bit flash multimedia

    6. Re:You're Kidding by Barny · · Score: 3, Informative

      Wow, he has installed 30-plus intel based Windows Seven machines? Since its launch, I have had about 40-50 DOA intel processors, NONE of which were engineering samples.

      Guy is what we, in Australia, would call a "Tosser" who for lack of a better description is "Talking wank".

      No, mere mortals would never be required to sort out this problem, because they would never encounter it.

      Order for troubleshooting random seeming install fails is:

      Install media (the disk, get a known good image/disk)
      Hardware that reads that media (DVD drive, cable)
      Ram
      CPU
      HDD (and cable)
      Board
      PSU

      Most DOA PSU problems are dead PSU, most of the rest are rating selection errors (not powerful enough).

      Also, this list is optimised not only for most likely faults but for parts that are "easy" to replace :)
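
      The order above can be sketched as a simple ordered checklist. This is only an illustration: the names `CHECKLIST` and `first_fault` are made up here, and the callback stands in for "swap that part for a known-good one and retry the install".

```python
# Hypothetical sketch: the checklist above as data, walked in order.
CHECKLIST = [
    "install media",
    "media reader (DVD drive, cable)",
    "RAM",
    "CPU",
    "HDD (and cable)",
    "board",
    "PSU",
]


def first_fault(install_works_after_swapping):
    """Return the first part whose swap makes the install succeed, or None.

    install_works_after_swapping is a caller-supplied function (an assumption
    of this sketch): it takes a part name, represents swapping that part for
    a known-good one and retrying the install, and returns True on success.
    """
    for part in CHECKLIST:
        if install_works_after_swapping(part):
            return part
    return None


# Example: pretend the CPU is the bad part, as in the article.
print(first_fault(lambda part: part == "CPU"))
```

      Keeping the list as plain data makes it easy to reorder for a given shop, for example by moving whatever is easiest to swap further up.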

      --
      ...
      /me sighs
  4. 100 books? by cranesan · · Score: 4, Interesting

    I'm a little suspicious; how much of an expert can you be writing 100 books on a variety of subjects?

    Reminds me of a tech instructor I had who proudly informed the class he teaches oracle classes, mysql classes, sql server classes, cisco classes, juniper classes, .net development classes, php, etc..... Yeah he couldn't answer any basic questions that strayed from the text book in front of us.

  5. Re:What I love here is the part where he by Wrath0fb0b · · Score: 5, Insightful

    just rolls right on past the fact that, if what he was installing was -- oh, say -- a Linux distribution, he wouldn't have an opaque "I'm uncompressing files" thermometer, he'd have real progress status messages, with, y'know, *parameters* and stuff, and -- unlike me this morning with my boss's iPhone -- a hope of actually figuring out what's broken.

    And what specific parameter in any Linux installation error message is likely to point towards the CPU being defective? Most of them would be generic hardware-has-shit-itself errors (DMA failures, null pointer exceptions, hash failures) that could mean any of the cpu/motherboard/ram/psu/hdd are defective. It's impossible, even in principle, for any installer to be able to pinpoint with specificity what hardware is fucked.

    Just for lols, I wish you would get modded up (me too, of course :-P) so that the OP can install $DISTRO on that original setup and see what error we get and whether it exactly pinpoints the cpu or whether it spits out a generic hardware error.

  6. ES CPU by tftp · · Score: 5, Insightful

    If this hardware geek spends days solving a CPU-meets-Windows 7 problem, what chance do mere mortals have?"

    You need first to show me a "mere mortal" who has, and uses, an engineering sample CPU. There is a very good reason why -ES parts are marked as such - because they have bugs. And those bugs will be a problem sooner or later.

    So the whole sob story can be reduced to this. The guy runs software on a prototype hardware, and the software crashes. In other breaking news, dog bites man.

  7. Harware issue? Welcome to Linux by ben_kelley · · Score: 4, Interesting

    If you have never had a hardware issue when installing Linux on a machine you must be very lucky.

    "Most things work fine" people tell me, which is true. The trouble is that the chances of you owning something that doesn't work is relatively high. (There's probably something from my statistics course that explains why that is, but I have so far managed to suppress that memory.)

    After having rebuilt a Mac with OS X, and rebuilt a laptop with Ubuntu 9.04, I was surprised at how smooth the Ubuntu install was. Of course that was until I wanted to use my webcam with Ubuntu. These kinds of problems get very difficult very fast in Linux. When 9.04 first came out there was a dependency problem that meant that you couldn't easily get some webcams working.

    To be fair, that problem is most likely sorted out now, and a non-Apple webcam would have needed a (very easy to install) driver on OS X as well. The point is, Windows and hardware generally work very well.

    1. Re:Harware issue? Welcome to Linux by jedidiah · · Score: 4, Interesting

      There was another fellow that mentioned the idea of staying away from the top and the bottom.

      Avoid the dregs and the bleeding edge.

      That middle will probably be much more reliable under Windows and more likely to be supported on Linux (or even MacOS).

      No one cares enough about the dregs to support them under Linux or MacOS and the bleeding edge stuff is just too new.

      That approach does pretty well regardless of OS today and did pretty well 16 years ago too.

      The problem with "statistics" is that any given PC isn't really random. It's a reflection of its owner. It may be a dreg, a poster boy for bleeding edge gamer conspicuous consumption or something that's more moderate.

      "When 9.04 first came out" is covered by this rule actually.

      --
      A Pirate and a Puritan look the same on a balance sheet.
  8. Re:What I love here is the part where he by Kitkoan · · Score: 3, Informative

    And what specific parameter in any Linux installation error message is likely to point towards the CPU being defective? Most of them would be generic hardware-has-shit-itself errors (DMA failures, null pointer exceptions, hash failures) that could mean any of the cpu/motherboard/ram/psu/hdd are defective.

    That would be the P.O.S.T. which your BIOS should be checking.

    --
    Attention... all grammer nazi"s! Is they're anything; wrong with: my post,
  9. Re:Summary. by socceroos · · Score: 3, Funny

    I think it was the 404th time actually.

  10. This happened to me in a production CPU. by UnifiedTechs · · Score: 3, Interesting

    Actually I had the exact same problem a few months ago upgrading a Dell server from Win2003 x86 to Win2008 x64, I suspected the CPU from the beginning, but I spent a few hours before the Dell Tech agreed with me. They sent a replacement and it worked like a champ.

    This proves it has happened to a production Intel Core2Duo CPU at least once, I can't believe I was the only one.

  11. Re:Assigning blame doesn't alway help by PopeRatzo · · Score: 5, Funny

    I didn't read the article or the summary. The title was plenty.

    --
    You are welcome on my lawn.
  12. No kidding by Sycraft-fu · · Score: 3, Insightful

    Dear self important guy who isn't near as good at computers as he thinks he is:

    This may surprise you to learn, but all those defaults out there, all those specified values, all that kind of stuff, that isn't just arbitrary. See, many smart engineers and other folks worked on designing and creating all the hardware for your computer. A lot of extremely complex stuff went into it; modern computers are quite a marvel of engineering. As such, they discovered that certain tolerances, certain ranges work well. Outside of that, there can be problems. Thus the defaults because, well, default. They set them so that things are very likely to work in all cases.

    As with most things, they aren't absolutes. They aren't things you can never exceed. In various circumstances you can go outside those normal ranges, sometimes by a little, sometimes by a lot. However, problems can potentially result. What problems those are and when they happen is not predictable. A system can appear stable but only crash on one app, or it can be stable for awhile then develop an instability.

    Regardless, the first step to troubleshooting should be to USE THE FUCKING DEFAULTS, you idiot!

    Seriously, I'm supposed to take someone seriously who is running overclocked settings of some sort or another (RAM timings, FSB, etc) and an engineering sample CPU and has problems? Ummm, duh. That right there is asking for problems. When you OC, you go into it knowing you may have some difficulties. You understand this is the tradeoff for something that runs faster than spec. If you start having problems, the first step is to back off the OCing and see if that fixes it.

    This is true even of OC'd systems that were fine but aren't now. I had a Celeron 300 that I OC'd to 450 back in the day and it worked well for about a year, then started to burn out. System started crashing randomly, and so on.

    To me, it sounds like he's being whiny because he didn't bother to troubleshoot his setup properly. Come talk to me when you've got a retail CPU running at stock spec and FSB, RAM running per its JEDEC spec at standard voltage and so on. Oh, what's that? You did that and it stopped having problems? Well there you go then. Don't bitch that your i7 920 "should" run at 3.8GHz. I don't care if others have done it, doesn't mean it'll work in your case. If it does, wonderful. If it doesn't, well, tough shit. Don't get mad at the software. It has pretty much no way to know if the CPU is going crazy as it runs on the CPU. About the only way software can indicate a CPU problem is by inducing a problem and thus a crash.

  13. Re:What I love here is the part where he by Jezza · · Score: 4, Interesting

    I temper this approach with the "this is easy as hell and very quick". So even if I think it is something, if there is something else it could be that's really quick to try I'll ignore my "brilliance" and try that. What is amazing is how often there isn't actually one problem, but two. Also helps if you have a similar working system that you can take the components from (so you know that this or that doodah actually works).

  14. So, a 0.5% failure rate.... by rueger · · Score: 3, Insightful

    The guy does 400+ successful installs, then runs into a decidedly obscure hardware problem, and people flame him? And Windows 7?

    Yee Gods. Get a life folks. I read this as a success story, both for the author and for Microsoft.

  15. It's interesting... by The+Spoonman · · Score: 4, Insightful

    Nowhere in the original article did I get the sense that the author was blaming Windows for his issues. In fact, he starts out by stating that he's installed Windows 7 hundreds of times without a single incident, but this was a "problem PC". So, how did this turn into an anti-Windows rant? Oh, right, it's Slashdot...

    who's written over 100 books

    Michael Behe's written dozens of books trying to debunk evolution. It does not make him an expert in evolution. He installs Windows, copies down what he sees on the screen and writes it down. That does NOT translate into "he knows what he's doing". I'm not saying he's not an expert, just that it's not a valid qualification.

    If this hardware geek spends days solving a CPU-meets-Windows 7 problem, what chance do mere mortals have?"

    They wouldn't be installing an OS. Very few non-geeks do so. They buy a computer from a vendor like Dell, it comes with an OS. When it's time to upgrade, they buy a new PC and give the old one to their kids or grandparents. They also, as has been stated numerous times in the comments, wouldn't be installing on machines that had an engineering sample for a CPU. Actually, this debunks the claim that because he's written books, he's an expert. He knew he had a machine with an unsupported processor in it and still replaced everything in the machine first. Um....duh!

    --
    Which is more painful? Going to work or gouging your eye out with a spoon? Find out!
    http://www.workorspoon.com
  16. My lesson. by w0mprat · · Score: 4, Interesting

    Just last night I fixed my parents' computer in one of those long fixes that turns out to be the most fundamentally trivial things. This is why this is not my main occupation.

    Basically they had a recently built custom Windows 7 + Ubuntu PC that had begun randomly shutting down, often minutes after it had been powered up.

    Ok first thing, any obvious errors or circumstances? No, it would just randomly power off. Windows event logs showed kernel power events, no specific driver, service or app crashing anywhere. Linux was the same. Not a thermal issue: CPU + GPU temps nominal, and a stress test didn't immediately cause a crash.

    Suspecting a power or a motherboard issue, first checked and re-seated things internally. It still occurred.

    Removed extraneous cards, connectors and drives. No result. It would even happen sitting in BIOS setup. Have ruled out a number of problems.

    Checked for electrical shorts, poor voltage etc.

    Dying power supply? Overloading or shorting? Nope, all voltages nominal, and it was brand new.

    I was about to try a spare power supply and a thought occurred to me..

    It's almost as if the reset switch was being hit, but it wasn't even close to being knocked at any point and the switch otherwise worked fine. Then I knocked the case and the system reset. Yep, the reset switch was faulty, jolting it even slightly would reset. Who needs a reset switch since Vista anyway? Unplugged it from mainboard. Solved.

    I decided not to even joke about charging my Dad for two hours of my time.

    Chances are if he paid someone to do it they wouldn't necessarily have found the fault that quickly, and he'd be hundreds of dollars out of pocket.

    The lesson in troubleshooting? Um... I'm not sure.

    --
    After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
    1. Re:My lesson. by girlintraining · · Score: 3, Interesting

      The lesson in troubleshooting? Um... I'm not sure.

      You did exactly what any computer tech should: Check the most common reasons for failure, then move to the edge cases. A faulty switch is rare. Swapping individual components out would have eventually narrowed it down to the case itself. Two hours sounds about right for a competent technician to run down the list to get to the point where that would be a likely cause of failure.

      --
      #fuckbeta #iamslashdot #dicemustdie
  17. Re:Assigning blame doesn't alway help by EricX2 · · Score: 3, Funny

    I haven't read the title, summary, article or any posts, and I think it's time to get new glasses... I can't read a damn thing!

  18. Chaos Manor? by dpbsmith · · Score: 4, Insightful

    How many people had the same impression I had: "Why, this sounds exactly like one of the 'Chaos Manor' columns Jerry Pournelle used to write in BYTE!"

    All it needs is a few of Jerry Pournelle's favorite stock phrases. "The disk trundled for a while..." "I tried swapping out the hard disk, but no joy..." "I called up Bill Godbout..."

  19. Re:Top 7 problems with Windows 7 by techno-vampire · · Score: 4, Funny
    [X] If someone says "There's an app for that" one more time I'll throw a chair at them!

    Is there an app now that throws chairs for you?

    --
    Good, inexpensive web hosting
  20. Interesting idea, but not the same... by Mathinker · · Score: 5, Insightful

    One of the reasons I use Linux is that, currently, it is much more secure than Windows, given my personal use scenario.

    Yes, if I were a specialist in securing Windows that might not be the case, but I'm not. Yes, if equivalent amount of effort was invested to break the security of casual users of Linux compared to that invested in breaking Windows, again, Linux might not be any more secure than Windows (well, with Linux, there are distros where I can always boot off of USB and then not save any changes, so until Microsoft offers me the same functionality there's little chance that I could use it in as secure a fashion as I can use Linux).

    Running Linux in a VM under Windows just wouldn't "cut it" for me. Sorry.

  21. Also quantity and quality are often exclusive by Sycraft-fu · · Score: 3, Insightful

    When you crank out a lot of stuff, it is extremely hard to make all of that stuff be high quality. Quality usually takes time, it takes research, it takes refinement. It is possible, in some rare cases, to have someone that produces a vast quantity of work, all of which is top quality. However it is far more common to see someone produce a vast amount of mediocre to bad quality work.

    As an example: Dr. Mark Russinovich has written a grand total of three technical books to date. So, clearly a man who doesn't know what he's talking about right? Wrong. Those three books are "Inside Microsoft Windows 2000," "Windows Internals Fourth Edition," and "Windows Internals Fifth Edition." He has, literally, written the book (along with David Solomon) on the recent versions of Windows, published by MS themselves. These are extremely accurate, comprehensive, technical documents of Windows down to its very fundamental levels. He also has written a suite of tools, the Sysinternals tools, so good that MS bought them, and hired him on as a technical fellow.

    So while he's produced only three books, they are all of the highest quality of technical information. There haven't been more because he hasn't had the time to write hundreds of books, nor the need to issue revisions to correct problems with the ones he has (each new edition covers a new version of Windows).

    Thus when I hear someone talk about how good they are because of the quantity of their works, I am skeptical. The only way you get a vast quantity of high quality work is either laboring an entire lifetime (and even then often not), being a prodigy, or both.

  22. The lesson here is that people still don't have... by sarkeizen · · Score: 3, Interesting

    ..a clue about how computers work. Even experienced windows professionals.

    I mean this guy has 32 bit OS working and moves to 64 bit OS...am I following this ok. The 32 bit install presumably went well on the hardware and the 64 bit install fails.

    So I grok his first attempts which are replacing the install media once. Seems like a reasonable assumption (some bit out of the billions on the DVD image just happened to be flipped the wrong way). From there though he starts to lose me. The motherboard is perhaps plausible but you would have to be assuming some rather significant difference in hardware support between the 64bit and 32bit systems. From there? RAM is 64bit how? Or even my HD?

    I think the most significant thing to learn here is twofold.

    i) People - even experienced computer professionals - treat computers like they are magic. Like there is no real science behind how they work. Clearly this guy was replacing parts based on some "experiential weighted average" with regard to how likely they are to cause a "weird" problem.

    ii) When A. C. Doyle said "When you have excluded the impossible" he neglected to state that the *order* in which one does so is significant. Eliminating things in order of their apparent relation to the problem (i.e. all the things for which 64 bits makes a difference) and (in a business environment) with respect to cost (i.e. Replacing a CPU is often a cheaper test than replacing a motherboard wrt labour) will likely fix your problem sooner than just going for the "usual suspects".
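
    Point ii) can be sketched as a sort: suspects that the 32-bit to 64-bit change plausibly touches go first, and within each group the cheapest test goes first. Everything here is an assumption for illustration: the `Suspect` fields, the `plan_order` name, the true/false flags and the minute estimates are made up, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    name: str
    touched_by_change: bool  # does the 32-bit -> 64-bit switch plausibly involve it?
    swap_minutes: int        # rough labour cost of testing it (made-up numbers)


def plan_order(suspects):
    """Order suspects: related to the change first, then cheapest test first."""
    return sorted(suspects, key=lambda s: (not s.touched_by_change, s.swap_minutes))


suspects = [
    Suspect("motherboard", False, 90),
    Suspect("RAM", False, 10),
    Suspect("hard drive", False, 20),
    Suspect("CPU", True, 15),
]

# The CPU comes first despite not being the cheapest swap, because it is the
# only part flagged as related to the 64-bit change.
print([s.name for s in plan_order(suspects)])
```

    The sort key is a tuple, so relevance always wins over cost; swapping the two fields in the key would give a cheapest-first plan instead.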

    Aside: I've had two cases where I found a CPU issue. One was very similar to this - crashing during a Windows 2000 install - often at the same place. The problem I had was actually thermal - the heatsink was reversed leaving the thermal patch making minimal contact with the heat spreader. Somehow I figured that out without replacing everything else first.

  23. I'd do it the other way around by Viol8 · · Score: 5, Insightful

    Install Linux and run Windows in a VM. When your Windows install gets infected/hosed with a virus/malware/whatever it could well mess up your Linux VM and make it unrecoverable, but if you install Windows in a VM and run it on top of Linux the worst that can happen is the VM gets hosed.