Handling 'Unexpected Interrupt 0D' Errors Under NT?

Posted by Cliff on Saturday August 24, 2002 @03:08PM from the low-level-serial-programming dept.

Jersiais asks: "I am trying to get some command line stuff running on NT4 server with Take Control installed on an old 200MHz Pentium II (Before anybody throws up, it's the test-it-&-wreck-it machine, not the real thing so there's no actual LAN there). Even on the real thing the compiler under command line has a tendency to blow up at random with 'Unexpected Interrupt 0D'. This only happens on the Pentium II, on the real (Workstation) thing it doesn't. I've found 3 different descriptions of Int 0D, none of which make any sense. Anybody any ideas how to get around it, or get rid of it? The compiler is 32-bit to interpreted intermediate and I have a RP calculator running as a test on the work system already, despite its use of soft interrupt IO."

Finally.. by Noodlenose · 2002-08-24 15:29 · Score: 5, Funny

It had to happen one day:
This is officially the first /. post I don't understand.
At all.
Damn..
First things first by ObviousGuy · 2002-08-24 15:55 · Score: 5, Insightful

What compiler?

What is crashing? The compiler? The command prompt?

What are you doing when it crashes?

Does this happen with other compilers? Other programs?

--
I have been pwned because my /. password was too easy to guess.
Ralph Brown's interrupt list.. by Jon-o · 2002-08-24 15:57 · Score: 5, Informative

I know next to nothing on the subject, but when I was tinkering about back in the good ole' DOS days, I came across this list of interrupts: http://www.delorie.com/djgpp/doc/rbinter/

I expect most people have seen it. It lists the following for 0D:

0D INT 0D C - IRQ5 - FIXED DISK (PC,XT), LPT2 (AT), reserved (PS/2)
0D INT 0D C - IRQ5 - Tandy 1000 60 Hz RAM REFRESH
0D INT 0D - HP 95LX - INFRARED INTERRUPT
0D INT 0D C - CPU-generated (80286+) - GENERAL PROTECTION VIOLATION
1. Re:Ralph Brown's interrupt list.. by zenyu · 2002-08-24 18:55 · Score: 2
  
  Compilers use a lot of memory. I bet the real difference between the two computers is RAM, a bug that never shows up on a 1 Gig machine will call for attention on a 64 MB machine.
2. Re:Ralph Brown's interrupt list.. by keesh · 2002-08-24 23:27 · Score: 2, Troll
  
  I know next to nothing on the subject
  So you're in a perfect situation to post about it on slashdot then.
Possibly a hardware problem, or bad software by Anonymous Coward · 2002-08-24 16:26 · Score: 3, Informative

Int 0Dh is General Protection Fault, issued by the processor when illegal instructions or memory accesses are encountered. It's likely your compiler is catching GPFs instead of letting them pass on to Windows where you would get the generic "This program has crashed...blah blah" message. The interrupt could be caused by bad software or bad hardware. Gcc randomly crashes with the same interrupt on bad hardware, normally bad memory or processor cache.
Crappy hardware by cperciva · 2002-08-24 16:35 · Score: 4, Insightful

Let's see... you have unexpected protection faults, you're running on antique hardware, and when you try the same code on a different machine, it works fine.

That sounds exactly like the symptoms of hardware which has exceeded its MTBF.

--
Tarsnap: Online backups for the truly paranoid
Re:http://www.google.com/search?hl=en&lr=& by flonker · 2002-08-24 16:43 · Score: 2

Parent is right.
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=%22Unexpected+Interrupt+0D%22&btnG=Google+Search

It's right there. Again, this doesn't belong in Ask Slashdot. It belongs on usenet, in one of the asm groups. Alternately, just use google, it's right there. Blargh. Why don't people do basic research before posting an ask slashdot?
Unsupported Software by eviljolly · 2002-08-24 16:46 · Score: 2, Funny

I'm sorry sir, but Slashdot does not support that software, please call your OEM for further help.
What the hell... by buzzbomb · 2002-08-24 16:58 · Score: 2

You have a "200MHz Pentium II" and are getting unexplained errors. Perhaps I missed something, but the first Pentium II was 233MHz.

Maybe the problem is that you don't know what you're doing.

Either that or I'm a half-drunk asshole. Either answer wouldn't surprise me.
1. Re:What the hell... by realgone · 2002-08-25 01:29 · Score: 5, Funny
  
  Maybe the problem is that you don't know what you're doing.
  Either that or I'm a half-drunk asshole. Either answer wouldn't surprise me.
  We should also consider the possibility that you're half-drunk *because* you don't know what you're doing. I mean, come on -- you posted this at 1 a.m. and you *still* weren't fully hoisted yet? What were you drinking, Tequiza or something?
  Get with the program, people...
Your equipment is probably hosed. by Eneff · 2002-08-24 18:25 · Score: 3, Informative

0D is often hardware.

That's why it works on the other computer.

You have three options.
A. hope it's some sort of HD corruption and it's just windows being stupid. cheapest. Do a full scandisk on it, and see if it's having trouble. if it's not...

B. Replace the memory. Memory gone bad isn't pretty. If *that* doesn't help,

C. Throw it out the window, because you probably have some sort of motherboard or other bugs you just don't want to diagnose.

And thank you for calling Microsoft Technical Support. Do you want the bill on Visa, Mastercard, or Discover?
1. Re:Your equipment is probably hosed. by n9hmg · 2002-08-25 01:47 · Score: 2
  
  Does everything on here have to involve Linux? Besides, I doubt that Linux is going to choke on arbitrary bad sectors. Bad sectors in the boot chain or swap, sure, but unless the drive is in the process of failing, you should have them all mapped out anyway at install time, via badblocks (mke2fs -c).
  
  The parent post reminds me of this classic scene (shoot me if I got the band name wrong)
  
  Beavis: Winger sucks!
  Butthead: Dude, That's not Winger
  Beavis: I know, I just felt like saying that.
0D by adolf · 2002-08-24 20:00 · Score: 5, Informative

It's not an NT error, but an Intel one, dating back to the Beginning of Time (or the 6MHz 286, anyway). The same errors are reported in the same way under OS/2, and probably a number of other operating systems - I seem to recall Win95 puking out similar nomenclature during at least one BSOD.

Under OS/2, such screeching halts are known as "traps," instead of blue screens. And since OS/2 users were generally more knowledgeable about computers then than NT users are today, there's a lot of information available to help with fixing it.

According to a groups.google.com-archived message from 1993, 0D is a General Protection Fault.

GPFs happen all the time with bunky hardware. Try re-seating (or just purchasing new) RAM, CPU, and anything else socketed that you can find.

And if that doesn't work, toss the machine. Or give it away to someone stubborn enough to fix it. Different boxes of similar ilk are available in the $50 range, these days - no need to spend any absurd amount of time with a diagnosis.

--
Kid-proof tablet..
1. Re:0D by Spoing · 2002-08-25 10:52 · Score: 3, Informative
  
  In general, you're right;
  1. Int13 (hex 0D) is an Intel CPU generated error code. (Don't shoot the messenger -- the CPU reports the violation and is very very rarely the reason for the failure.)
  2. If the same software works on one machine but does not work on a similar machine it's often not worth the time to find out why it's failing. (Good guess: it's probably faulty hardware -- damaged or designed broken.)
  In addition...
  3. Int13 can be caused by faulty hardware or software. Bad software usually wins the coin toss. Since it happens in this case while using a compiler, I'd say software is the likely cause -- the compiler or (hate to say) your source.
  4. Only occurs when the processor is in protected mode. Simply stated; you've got no process isolation in an Intel processor's initial mode at boot time, in DOS (not a command prompt) and while in the system BIOS (aka "real" mode).
  5. Protected mode enables the Intel MMU (memory management unit) and requires a program (usually the OS) to manage the GDT (general [memory] descriptor table).
  6. If improperly managed by the GDT control program, processes can bleed into other areas. A proper response by the OS to violating and attempting to modify/read areas it is not allowed to use is to close the process and flag the error.
  7. In quite a few situations, violations (int13 and otherwise) are OK and expected. These violations are used to trigger responses such as virtual memory page swapping and interrupt handling. Anything outside an expected violation may point to hardware failure, software corruption (by an errant program), or
  8. Failures that happen on the OS level can only be caught _after_ the violation _as_long_as_ the process does not nuke critical parts of the OS or the GDT. This means that a violation that is announced usually means your system is in a suspect (possibly unstable) state.
  9. This is why few things should run as extensions to the OS (ring 0) and should be run at the user level (ring 3).
  Rant: Video and other hardware drivers should never run at the OS level let alone other programs that are not part of the OS that specifically is designed to manage memory and other core system hardware. Limited and focused use of OS level resources is a necessity -- because if the OS is corrupted, all bets are off including sane int13 handling
  
  --
  A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
2. Re:0D by Spoing · 2002-08-26 11:30 · Score: 2
  
  Correction: GDT = Global Descriptor Table. It's been a while since I've dealt with this.
  
  --
  A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
3. Re:0D by Spoing · 2002-08-31 10:19 · Score: 2
  
  You're on the right track; if the code depends on hardware events, you have to deal with timing issues.
  Another frequent reason is memory offsets. A slight difference on similar hardware (or with different drivers or software) may allow one system to 'work' (it's corrupting or accessing its own address space -- BAD), or 'fail' (you get an int13 or other error -- actually a good thing; you are told something is wrong).
  This is not an exhaustive list. Happy hunting...
  
  --
  A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Intel NIC? by kruczkowski · 2002-08-25 03:47 · Score: 2

Do you have an intel Nic? replace it. I found most BSOD to be from Intel nics. In linux they work fine, but in NT they just die at random.

--
hmm... for fun I enjoy launching DDoS attacks against 127.87.42.5
Signal 11 FAQ by Mignon · 2002-08-26 01:56 · Score: 2

I don't know if this will directly address your problem, but I found it helpful once for diagnosing a bad FPU. There's lots of good tidbits talking about bad hardware and its symptoms.
Re:Int 0D by Tony-A · 2002-08-26 03:09 · Score: 2

Check technet.microsoft.com it's the first place to look regarding windows
I didn't think Microsoft's technet was that out of date.
The XT, for you youngsters out there, was when they added a whopping big 20 meg hard drive to the pc. It's before Intel made the 80286.
With the AT (80286) they added a second interrupt controller, accessible by 16-bit cards, and moved the hard disk interrupt to IRQ14 (primary) and IRQ15 (secondary) for the IDE controllers. IRQ5 is now standard for LPT2 but is somewhat avoided because its vector, 0Dh, conflicts with the CPU-generated interrupt 0Dh on the 80286+, the famous General Protection Violation.
Check google.com when you actually need useful information.
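
The IRQ5 / 0Dh overlap described above falls out of how the original PC BIOS programs the interrupt controllers. A minimal sketch in Python, assuming the conventional real-mode mapping (first 8259 at vector base 08h, second at 70h). The names are made up for illustration, and protected-mode operating systems such as NT are free to remap these bases, so treat it as a picture of the DOS-era layout only:

```python
# Conventional real-mode BIOS mapping of hardware IRQ lines to interrupt
# vectors on a PC/AT. These bases are the BIOS defaults; a protected-mode
# OS usually reprograms the controllers to use different vectors.
MASTER_BASE = 0x08   # IRQ0-7  -> INT 08h-0Fh
SLAVE_BASE = 0x70    # IRQ8-15 -> INT 70h-77h

GENERAL_PROTECTION_VECTOR = 0x0D  # CPU exception vector on the 80286 and later


def irq_to_vector(irq: int) -> int:
    """Return the default real-mode interrupt vector for an IRQ line (0-15)."""
    if not 0 <= irq <= 15:
        raise ValueError("IRQ must be in the range 0-15")
    if irq < 8:
        return MASTER_BASE + irq
    return SLAVE_BASE + (irq - 8)


if __name__ == "__main__":
    for irq in (5, 14, 15):
        vector = irq_to_vector(irq)
        note = ""
        if vector == GENERAL_PROTECTION_VECTOR:
            note = " (same vector as the general protection exception)"
        print(f"IRQ{irq} -> INT {vector:02X}h{note}")
```

This is why the same "INT 0D" entry shows up in the interrupt list both as IRQ5 and as the CPU-generated general protection violation.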
Re:http://www.google.com/search?hl=en&lr=& by flonker · 2002-08-26 16:24 · Score: 2

I'm not sure I understand what you're saying. And I'm not sure of your ASM proficiency level, so I'll go into some details that may be redundant to you. And I think you might have said you resolved the problem, so I dunno if this matters anyway.

According to http://swatch.binary.com.tw/delphi-ti/19057.html, found with Google,

Interrupt 0d is the general protection exception and is generated by any protection violation that does not generate some other exception. See the above question for a more complete description of the problem. Common causes of this problem are network boards and certain hard disk controllers.

Interrupts can be either software generated or hardware generated. Assuming this is a hardware-generated interrupt, it's raised when the processor receives an IRQ (Interrupt ReQuest). In this case, the processor received interrupt 0Dh (13 decimal). From the article, we get that interrupt 0Dh is our old friend, General Protection. Here, General Protection is (most likely) protecting us from bad hardware. If you trace through your code, or set AfxMessageBox() calls in your code in key places, you should be able to trace where the fault occurs. (AfxMessageBox() does block the thread until you hit OK, BTW.) At this point you should have figured out where the fault gets flagged, and from here you diagnose exactly which hardware is bad.

If you haven't figured out the problem this way, generate checksums of the file, both on the faulty hardware, and the good hardware, to see if it wasn't changed due to a faulty HDD. If the checksums look OK, then test your memory. If that tests OK, then you may be looking at a faulty CPU. Check to see how hot the CPU gets, that may be what's generating the error.
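
For the checksum step, something along these lines would do. It is only a sketch in modern Python, assuming SHA-256 is an acceptable checksum and that the files to compare are passed on the command line; the function name is invented for the example. Run it on both machines and compare the printed digests:

```python
import hashlib
import sys


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "digest  filename" line per file given on the command line.
    for path in sys.argv[1:]:
        print(f"{file_sha256(path)}  {path}")
```

If the digest of the same file differs between two runs on the suspect machine, that points at memory or the disk path rather than at the file itself.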

Or it may be something else entirely. Debugging flakey hardware in software is often quite tricky. I've thrown out MoBos before after diagnosing that something on the MoBo was broken, but never knowing exactly what it was. And I'll do it again. Oftentimes, diagnosing hardware isn't worth the headache.
Re:Int 0D by Tony-A · 2002-08-28 17:51 · Score: 2

From 80386 Programmer's Reference Manual.
General Protection Exception
All protection violations that do not cause another exception cause a general protection exception. This includes (but is not limited to):
1. Exceeding segment limit when using CS, DS, ES, FS, or GS
2. Exceeding segment limit when referencing a descriptor table
3. Transferring control to a segment that is not executable
4. Writing into a read-only data segment or into a code segment
5. Reading from an execute-only segment
6. Loading the SS register with a read-only descriptor
7. Loading SS, DS, ES, FS, or GS with the descriptor of a system segment
8. Loading DS, ES, FS, or GS with the descriptor of an executable segment that is not also readable
9. Loading SS with the descriptor of an executable segment
10. Accessing memory via DS, ES, FS, or GS when the segment register contains a null selector
11. Switching to a busy task
12. Violating privilege rules
13. Loading CR0 with PG=1 and PE=0
14. Interrupt or exception via trap or interrupt gate from V86 mode to privilege level other than zero.
15. Exceeding the instruction length limit of 15 bytes (this can occur only if redundant prefixes are placed before an instruction)

Basically, the machine code is trying to do something highly illegal. How it got there and why are a different matter.
Flaky memory is always a suspect.
Computed jumps based on leftover garbage (uninitialized variable) are another fun way to encounter the problem. Random code/data usually crashes eventually.
It is possible that it's just catching an attempted write to protected storage.
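
If whatever reports the fault also shows the error code the CPU pushes along with the exception, that code can help narrow down which of the cases above applies. A rough sketch in Python, assuming the usual selector-style error code layout from the Intel manuals (bit 0 EXT, bit 1 IDT, bit 2 TI, bits 3-15 the table index); the function name and the returned dictionary are purely illustrative. An error code of 0 generally means the fault was not tied to a particular selector:

```python
def decode_selector_error_code(code: int) -> dict:
    """Split a selector-style exception error code into its fields."""
    external = bool(code & 0x1)     # EXT: event came from outside the program
    in_idt = bool(code & 0x2)       # IDT: index refers to an IDT gate
    in_ldt = bool(code & 0x4)       # TI: LDT rather than GDT (only if IDT is clear)
    index = (code >> 3) & 0x1FFF    # 13-bit descriptor or gate index

    if in_idt:
        table = "IDT"
    elif in_ldt:
        table = "LDT"
    else:
        table = "GDT"
    return {"external": external, "table": table, "index": index}


if __name__ == "__main__":
    for code in (0x0000, 0x0010, 0x006A):
        print(f"{code:04X}h -> {decode_selector_error_code(code)}")
```

A nonzero index tells you which descriptor or gate the CPU objected to, which is a better starting point than guessing between bad code and bad RAM.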

The Intel 386+ actually does have a very good hardware protection mechanism, which, unless somebody managed a port of Multics, is effectively unused and subverted from protected segments to a nice flat space where anybody can do anything to everything.