AMD Could Profit from Buffer-Overflow Protection
spin2cool writes "New Scientist has an article about how AMD and Intel are planning on releasing new consumer chips with built-in buffer-overflow protection. Apparently AMD's chips will make it to market first, though, which some analysts think could give AMD an advantage as the next round of chips are released. The question will be whether their PR department can spin this into a big enough story to sell to the Average Joe."
They are protecting the pages marked as code from the data pages. A buffer could still overflow, but the overflow could not be used to execute arbitrary code in pages marked as data (or non-executable).
It will likely be in their architecture manual. The summary of the protection is that it allows the OS to mark pages of virtual memory with a No Execute (NX) bit. Attempting to execute any instructions from such a page would cause a trap to the OS.
An OS would then use this to mark pure data pages and areas like the stack as NX, so that overflowing a data structure doesn't allow arbitrary malicious code to be run.
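To make the mechanism concrete, here is a minimal sketch in C of the check described above. It is a toy model, not real page-table code: the `toy_page` struct, the four-page address space, and the function names are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* common x86 page size */
#define TOY_NUM_PAGES 4u     /* tiny made-up address space */

/* Hypothetical per-page permissions standing in for page-table bits. */
typedef struct {
    bool present;
    bool writable;
    bool no_execute;  /* the NX bit */
} toy_page;

typedef enum { FETCH_OK, FETCH_TRAP } fetch_result;

/* What the CPU would do on an instruction fetch from addr. */
fetch_result fetch_instruction(const toy_page *pages, size_t num_pages,
                               uint32_t addr)
{
    assert(pages != NULL);
    size_t index = addr / TOY_PAGE_SIZE;
    if (index >= num_pages || !pages[index].present || pages[index].no_execute)
        return FETCH_TRAP;  /* control goes to the OS, not to the bytes at addr */
    return FETCH_OK;
}

/* What an OS might set up: page 0 is code, everything else is NX. */
fetch_result toy_process_fetch(uint32_t addr)
{
    const toy_page pages[TOY_NUM_PAGES] = {
        { true, false, false },  /* code: read-only, executable */
        { true, true,  true  },  /* static data: writable, NX */
        { true, true,  true  },  /* heap: writable, NX */
        { true, true,  true  },  /* stack: writable, NX */
    };
    return fetch_instruction(pages, TOY_NUM_PAGES, addr);
}
```

In this model a fetch from the code page succeeds, while a fetch from anywhere in the stack page traps no matter what bytes an overflow managed to put there.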
I assure you it's not just Microsoft who's to blame.
Linux: Free if your time is worthless.
The AMD Opteron and Athlon 64 chips already have the buffer overflow protection in their hardware and the feature is already supported by both Linux and Windows XP 64-bit edition. AMD calls this "Execution Protection" and the basic idea is that the processor will not allow code that arrives to the system via a buffer overflow to be marked as executable. The slashdot story says "will have" for both Intel and AMD when it should read "AMD already has and Intel will have..."
AMD processors have both of those features. AMD has done well at matching Intel feature for feature. Take a look at Opteron for servers. It doesn't help right now that a lot of Intel boards shipped defective. I was replacing backplanes for a solid month just before the New Year. The latest Xeons really aren't that impressive either. There was a time the Xeon was an incredible processor worthy of running a NOC, but now they run hot enough that Opteron and other players look real nice again.
Exactly. OpenBSD 3.3 already came with this feature in May 2003.
"W^X (pronounced: "W xor X") on architectures capable of pure execute-bit support in the MMU (sparc, sparc64, alpha, hppa). This is a fine-grained memory permissions layout, ensuring that memory which can be written to by application programs can not be executable at the same time and vice versa. This raises the bar on potential buffer overflows and other attacks: as a result, an attacker is unable to write code anywhere in memory where it can be executed. (NOTE: i386 and powerpc do not support W^X in 3.3; however, 3.3-current already supports it on i386, and both these processors are expected to support this change in 3.4). "
This has nothing to do with Microsoft, and everything to do with architecture and programming languages.
If you program in C on Intel you are going to have problems without almost fanatical devotion to the Po^H^H management of your memory resources.
That goes for Linux as well, as any check at Bugtraq can confirm.
Yes, people should be very careful when coding in languages and on architectures which allow buffer overflow, but the real solution is at a level lower than the coder's.
KFG
As for the other features you mention. You are comparing Desktop processors and server processors. You might note the lack of the Opteron processor in the third party tests you linked to.
'Bout two months ago someone came to me with a motherboard and processor, an Athlon XP 2600+. They couldn't get it to boot. I took one look at it and realized the heatsink was on backwards; it shut itself down as soon as it got hot enough. I put the heatsink on correctly and the thing booted right up. As for the PCI locking, it's a bit harder to vouch for since I don't see a whole lot of information about it, but I sure do recall seeing tests involving the Opteron. If I could find them right now I would, except I'm on dialup now for the first time in six years and it's annoying the hell out of me.
Any application that creates code in stack-based memory such as a local (auto) variable, or in one of the standard heaps (from which malloc and "new" memory come) will be affected. This memory is no longer executable and cannot be made executable by an application. Some existing JIT compilers are affected and will need rework.
To work with memory protection enabled, applications will need to allocate memory using VirtualAlloc and specify the memory options to make it executable. Then they can generate and run the code there.
I am assuming that Linux could incorporate some similar functionality, anybody know if someone is working on it?
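For what it's worth, the nearest POSIX analogue I know of to the VirtualAlloc pattern is `mmap` followed by `mprotect`. The sketch below is only an assumption about how a JIT might use them: `publish_code` is a made-up name, nothing here actually runs the generated bytes, and a hardened kernel configuration may refuse the switch to executable.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy `len` bytes of generated code into a fresh mapping, then make that
 * mapping executable. Returns the mapping, or NULL on failure. The caller
 * releases it with munmap(ptr, len).
 */
void *publish_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Step 1: writable but not executable while the code is filled in. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    /* Step 2: switch to read+execute; the mapping is no longer writable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

The point of the two steps is that the memory is never writable and executable at the same time, so the application asks for executable memory explicitly instead of getting it by default from the stack or heap.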
1) It is also in Prescott
2) It needs OS support, specifically XP SP2, which isn't out yet.
3) It doesn't really do what it is meant to, I have seen several 'theoretical' discussions on how to circumvent it. Think of it as another hoop to jump through for the black hats.
4) You need to be in 64-bit mode to use it
5) 4) requires a recompilation anyway, why not do it right with the right tools when you recompile?
6) I know of at least one vendor using it to bid against Intel on contracts now.
7) Oh yeah, this will do a lot of good. Really. When has a white paper ever lied?
8) The more you know about things like this, the more you want to move into a cabin in Montana and live off the land.
-Charlie
Except you could write/read from the CODE segment and you could far jump into the data/extra/stack segment registers.
What's better is that CS==DS was a common mode [known as a .COM or TINY model program].
So there goes your theory.
Tom
Someday, I'll have a real sig.
The Intel x86 architecture has few registers, so if you want to keep lots of values handy, you're going to have to keep swapping values in and out of memory. Alternatively, values that do not change during a long loop, or a loop with many levels of nesting, can be hard-coded as immediate-value constants in the code. Just before the loop is executed, these hard-coded constants are modified by rewriting the immediate values in the code. An example of this is some code that draws a scaled translucent sprite. Throughout the code, the scale will remain constant, and if the translucency is uniform, that will remain constant too. The code that does the translucent blitting will use the registers only for values that change during the sprite-drawing.
On an 80386, using this technique will cause a significant speed increase in the code, but on 80486s and above, where there are on-board L1 caches on the CPUs, the code modification may cause cache misses that slow down the system, especially if it is run on an even newer x86 CPU that has separate program and data caches at L1. To make things worse, nowadays most code runs in a multi-tasking environment, so predicting whether self-modifying code causes a slowdown or a speed increase is almost impossible.
Of course, nowadays most drawing is done by hardware-accelerated graphics cards, so this isn't a good example, but there could still be some use for hard-coding values that do not change in a loop.
Several architectures (sparc, sparc64, alpha, hppa, m88k) have had per-page execute permissions for years.
See This BugTraq posting by Theo de Raadt
That's not true at all. Only the OS needs a new version. The OS simply marks pages allocated to the stack as "No Execute", and voila, programs can't use a buffer overrun to execute code.
All modern architectures have separate caches for code and data. Simply flushing the i-cache will allow you to update your code on the fly.
yes, this is one of the wonderful misfeatures of x86. i don't know what this article is all about. amd64 ALREADY has an execute bit in each pte, when it's in long (64bit) mode. this is nothing new; it's been in amd's manuals for a while. i'd bet it was one of the first x86 problems they planned to fix.
Call me stupid, but AFAIK x86 chips have full segmentation support (in protected mode obviously) - ability to define different segment types (read only, r/w, execute only, etc)... For those of you not familiar with it, it allows the programmer to define different types of memory segments, which would allow you to do some pretty interesting things such as defining read-only code segments (so the machine instructions can't be modified in memory), and non-executing data segments (to prevent OS from trying to run code stored in program data/buffers). This would solve the problem, at least how they addressed it in the article.
If current operating systems actually used this in addition to paging (which is all most of them use now), why would they need to create a new chip? Linux does not fully utilize segmentation, mostly only paging. I don't have any resources on MS OS design right now so I can't comment on it... (although maybe looking at the recent source would help some ;)
# fuser -v
#
"It's not a cure-all solution."
Nor will there ever be IMO. But this combined with good practices like not running as admin when we don't need to (reading email, web browsing, game playing for example) will be a huge leap forward.
You are. A buffer overflow works by overflowing a stack-allocated buffer, causing other stack-allocated data to be overwritten. The usual method of exploiting this is by overwriting the return address with a value that points back into the buffer, so that the function will return straight into the buffer data, where the cracker will have put executable code of course.
A way to provide some protection against this is by disabling the ability to execute code that is located on the stack.
Note that:
1. there are already linux kernel patches to do this on x86 hardware, but they incur a slight performance penalty because they're implemented by abusing page table caches (there are separate ones for code and data, and you can deliberately make 'em inconsistent so that the table entry for data says access is allowed, while the one for code says it's disallowed)
2. this does not prevent buffer overflow exploits entirely, it just makes 'em a lot harder. There are tricks you can still use sometimes like putting the known address of some useful library function into the return address
hope this helps to clear it up a bit
Do they overflow the current process's virtual address space?
No. On the stack itself, in addition to the local data for a function (and the saved registers), is the return address that you are going to jump back to after the function is complete. Buffer overflow exploits write past the end of the buffer. So you are overflowing the function's local data, not the entire stack segment. As the previous poster mentioned, because the stack grows downward, your overflow can write over the return address, which is where all the nastiness starts.
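Here's a toy model of that overwrite in C, in case it helps. It doesn't touch the real stack: `toy_frame` is a made-up struct that puts a 16-byte buffer directly in front of a pretend return address, and the copy is clamped to the struct so the demo itself stays well-defined. The addresses are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for one stack frame: locals first, return address after. */
typedef struct {
    char      buffer[16];
    uintptr_t return_address;
} toy_frame;

#define LEGIT_RETURN ((uintptr_t)0x1000)  /* made-up address */

/*
 * Copy `len` input bytes to the start of the frame without checking them
 * against buffer[], then report the return address the "function" would use.
 * len is clamped to the whole frame so the toy never writes outside it.
 */
uintptr_t return_address_after_copy(const void *input, size_t len)
{
    assert(input != NULL);

    toy_frame frame;
    memset(&frame, 0, sizeof frame);
    frame.return_address = LEGIT_RETURN;

    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, input, len);  /* the bug: limit should be sizeof frame.buffer */

    return frame.return_address;
}

/* Build an oversized input whose tail lands exactly on the return address. */
uintptr_t demo_overflow(uintptr_t target)
{
    unsigned char payload[sizeof(toy_frame)];
    size_t offset = offsetof(toy_frame, return_address);

    memset(payload, 'A', sizeof payload);            /* filler for the buffer */
    memcpy(payload + offset, &target, sizeof target);
    return return_address_after_copy(payload, sizeof payload);
}
```

A short input leaves the return address alone; an input longer than the buffer replaces it with whatever the sender chose, which is the whole trick.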
In addition, the binaries are the same on every machine, and each process's memory logically maps to the same location (windows user code maps to 0x10000000).
So, say someone writes a program and somewhere has a static buffer for input which is 256 bytes, and doesn't check bounds on input data. You can construct an input which is more than 256 bytes, and your data will overwrite stuff which is outside of the input buffer, perhaps the return address. So, with the proper input, you can make the program jump to an arbitrary point.
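For contrast, this is roughly what the missing bounds check could look like in C. It's a sketch under my own assumptions: `copy_input_checked` and `try_input_of_length` are hypothetical names, and the 256 comes from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INPUT_MAX 256  /* buffer size from the example */

/*
 * Copy a NUL-terminated string into dst only if it fits, terminator included.
 * Returns 0 on success, -1 if the input is too long (dst is left empty).
 */
int copy_input_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* Measure the input, but never look further than dst could hold. */
    size_t len = 0;
    while (len < dst_size && src[len] != '\0')
        len++;

    if (len == dst_size) {  /* no room left for the terminator */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Helper for trying it out: feed `input_len` characters into the buffer. */
int try_input_of_length(size_t input_len)
{
    char input[1024];
    char buffer[INPUT_MAX];

    assert(input_len < sizeof input);
    memset(input, 'A', input_len);
    input[input_len] = '\0';
    return copy_input_checked(buffer, sizeof buffer, input);
}
```

With this in place, 255 characters fit (plus the terminator) and 256 are rejected instead of spilling over whatever sits next to the buffer.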
Usually, whenever a function is called, it will be called at the same depth of recursion. Like, I might make a function, "authenticate", which asks for your username and password (storing them without checking in my 256 byte buffers), then checks credentials and either proceeds or returns an error code.
This function will probably only be called once, and it will always be called at the same time in program execution, relatively early. The stack will always be the same size when it is called. (Like, your call stack at this point might look like: main() -> initialize() -> authenticate()) or whatever).
Sometimes, a function might be called from multiple places... Maybe there is something like "getAddress()", which does pretty much the same thing, it grabs an address input by the user, but it might be called from many places in the executable. Each call will have its own characteristic call stack, and offset within the stack segment. The stack frames of all functions leading down to it will be present. (You can usually examine the current call stack in a debugger).
If you know "where" the function will be called from in this manner, you will know the exact stack layout at this point, including the absolute addresses and everything (which you know because the binaries are always the same and the executable always maps to the same logical place in memory).
So, you can overwrite the return address so that it returns to inside the input buffer. Then, you have 256 bytes (in this example) to work with for constructing your little exploit. Often, the exploit will be just a stub which downloads another malware program and launches it, or whatever.
There is a little bit more to it. Like, you usually need to construct your input so that you don't have any 0 bytes within it, because that will signify the end of a string. The input, even though it's not bounds checked, might still be validated in some fashion. (I think I remember reading about someone who had made a "codec", so that the input data could be composed of valid alphanumeric characters. So, even the unpacker was alphanumeric, which is pretty cool).
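The zero-byte restriction is easy to see with a small C sketch. This only demonstrates the truncation, not an exploit, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Push `len` raw payload bytes through a strcpy-style copy and report how
 * many of them arrive. Everything after the first zero byte is lost.
 */
size_t bytes_surviving_strcpy(const void *payload, size_t len)
{
    char source[64];
    char dest[64];

    assert(payload != NULL && len < sizeof source);

    memcpy(source, payload, len);
    source[len] = '\0';    /* terminator after the last payload byte */

    strcpy(dest, source);  /* safe here: source always fits in dest */
    return strlen(dest);
}
```

A four-byte payload with no zeros arrives whole, while a five-byte payload with a zero in the third position arrives as two bytes. That is why exploit input has to be encoded so it contains no zero bytes.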
Read on. There is specific support for the NX flag on pages. If you boot with noexec=on, then stack/heap/data is automatically protected. If the page fault handler sees your thread because of an NX flag violation, the process is killed.
Caveats: you can't mprotect it back to execute status, and it breaks some software, especially Mozilla/Java/Ada (just like exec-shield...)
32 bit x86 chips have the read and execute permissions combined in the page tables: any page that can be read can also be executed. The segment descriptors have separate bits for them. Intel basically expected people to use segmentation on the 386 rather than paging, so the original paging implementation was a little subpar.
Segmentation offers much finer control over memory (allocations can be sized to the exact byte, with a fault generated on any out of bounds access) and a larger virtual address space (48 bits, accessed in segments up to 4 GB). The problem with segmentation is the kernel memory management becomes a lot more complicated, so OS developers have avoided using the segmentation. x86 chips are the only ones to provide segmentation support, so developers of portable OS's avoided the feature as well.
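A rough C model of the checks segmentation performs, for anyone who hasn't used it. This is a simplification with made-up names: a real descriptor packs these fields differently and has a granularity bit, which I'm ignoring, and the base address is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SEG_CODE_EXEC_ONLY,
    SEG_CODE_EXEC_READ,
    SEG_DATA_READ_ONLY,
    SEG_DATA_READ_WRITE
} seg_type;

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } access_kind;

/* Simplified descriptor: byte-granular, limit is the last valid offset. */
typedef struct {
    uint32_t base;
    uint32_t limit;
    seg_type type;
} toy_segment;

bool segment_access_ok(const toy_segment *seg, uint32_t offset,
                       uint32_t size, access_kind kind)
{
    assert(seg != NULL);

    /* Limit check: every byte touched must lie within [0, limit]. */
    if (size == 0 || offset > seg->limit || size - 1 > seg->limit - offset)
        return false;

    /* Type check: data segments never execute, code segments never write. */
    switch (kind) {
    case ACCESS_EXEC:
        return seg->type == SEG_CODE_EXEC_ONLY || seg->type == SEG_CODE_EXEC_READ;
    case ACCESS_WRITE:
        return seg->type == SEG_DATA_READ_WRITE;
    case ACCESS_READ:
        return seg->type != SEG_CODE_EXEC_ONLY;
    }
    return false;
}

/* A read/write data segment sized to exactly one 256-byte buffer. */
bool buffer_segment_access_ok(uint32_t offset, access_kind kind)
{
    const toy_segment buffer_seg = { 0x00200000u, 255u, SEG_DATA_READ_WRITE };
    return segment_access_ok(&buffer_seg, offset, 1u, kind);
}
```

With a segment per buffer, a write to offset 255 passes, a write to offset 256 faults, and jumping into the buffer faults too. The cost is that the OS has to manage a descriptor for every such allocation.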
When AMD designed the x86-64 architecture, they had to design new page tables to deal with 64-bit addresses. While making that change, they also gave execute permission its own bit.
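In terms of bits, the difference between the two page-table formats looks roughly like this. The bit positions (present = 0, writable = 1, user = 2, no-execute = 63) are from the architecture manuals as I remember them; the helper names are mine, and the model assumes the OS has switched the no-execute feature on and ignores supervisor-mode special cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of an x86-64 page-table entry (a subset). */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_WRITABLE (UINT64_C(1) << 1)
#define PTE_USER     (UINT64_C(1) << 2)
#define PTE_NX       (UINT64_C(1) << 63)  /* no-execute */

/* Legacy 32-bit entry: there is no bit that can forbid execution. */
bool legacy_pte_allows_execute(uint32_t pte)
{
    return (pte & 1u) != 0;  /* being present is enough */
}

/* Long-mode entry, assuming the no-execute feature is enabled. */
bool pte_allows_execute(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_NX) == 0;
}

bool pte_allows_write(uint64_t pte)
{
    return (pte & PTE_PRESENT) != 0 && (pte & PTE_WRITABLE) != 0;
}

/* Entry an OS might build for a user stack page: writable, never executable. */
uint64_t make_stack_pte(uint64_t frame_address)
{
    assert((frame_address & 0xFFFu) == 0);  /* must be 4 KiB aligned */
    return frame_address | PTE_PRESENT | PTE_WRITABLE | PTE_USER | PTE_NX;
}
```

So a stack page built this way can be written but not executed, which is a combination the legacy 32-bit format has no way to express.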
Do you remember when the "Intel Inside" logo came out?
1991, according to Intel themselves
There was no real competition. (it was the Pentium days) There were other processors, but the Pentium pretty much blew them away.
The Intel Inside marketing program started two years before the Pentium came out. At that time AMD was competing very effectively with the 486. So much so that Intel wanted a new marketing campaign to try to bring people back. Even in the early Pentium days AMD continued to compete effectively. Their 5x86 120MHz chips were very competitive with the Pentium 60 and Pentium 66, and even the 75MHz Pentium chips. It wasn't really until '94 or '95 that Intel really started leaving AMD in the dust, mainly because AMD was WAY late at releasing their K5 processor and when it did come out they had so many problems manufacturing it that it was clocked much lower than initially hoped for. Cyrix continued to offer some competition for Intel during this time, but they were plagued by crappy motherboards which gave them a poor reputation (it was a bit of a self-fulfilling prophecy thing: reputation for being cheap crap meant that they were put on cheap crap motherboards which resulted in a poor quality system).
it will be [better] because it is cheaper
And that is somehow an invalid reason for a product to be better?
This "new feature" for marking stack pages as non-executable is *already* part of the Athlon-64 chips. The New Scientist article was talking about how a new version of XP will begin using it soon, not that the feature has yet to be released.
AMD has already made Intel look bad by getting their 64-bit CPU into the mass-market first, and this feature was implemented partly to provide a facility that some other platforms (e.g. Solaris on Sparc) have had for quite some time.
Like the PIII Coppermine CPUs that wouldn't even boot sometimes.
Or the randomly rebooting PII Xeons.
Or the voltage problems with certain PIII Xeons.
Or the memory request system hang bug in the PIII/Xeon.
Or the PIII's SSE bug whose 'fix' killed i810 compatibility.
Or the MTH bug in the PIII CPUs that forced Intel customers to replace boards and RAM.
Or the recalled, that's right, recalled PIII chips at 1.13GHz.
Or the recalled (there's that word again) Xeon SERVER chips at 800 and 900MHz.
Or the recalled (that word, AGAIN?!) cc820 "cape cod" Intel motherboards.
Or the data overwriting bug in the P4 CPUs.
Or the P4 chipset bug that killed video performance.
Or the Sun/Oracle P4 bug.
Or the Itanium bug that was severe enough to make Compaq halt Itanium shipments.
Or the Itanium 2 bug that "can cause systems to behave unpredictably or shut down".
Or the numerous other P4/Xeon/XeonMP bugs that have been hanging around.
Yes, I did consider the possibility that there might just be some basis for the belief that Intel's products are superior. Having considered that, in light of the mountains of evidence to the contrary, I shall now proceed to laugh at you.
Ha ha ha.
Now go away, or I shall mock you again.
-- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
"Granted you could be nervous about this since 3dfx went the way of the dodo, but since AMD doesn't make POS video cards that double the weight of your box...they should be safe ;-)"
3DFX's problem had nothing to do with their products. Their problem had to do with the fact that they got greedy - extremely greedy. After their first few successful graphics chips were launched, they basically shut their board makers out in the European market with the purchase of STB. They began producing their own boards, and had production capacity sufficient to supply the European market, and that's about it. Thus, other board makers were still necessary for other markets, such as the US. Having been bent over by 3DFX in the European market, board makers essentially told 3DFX to take their chips and stuff them. Thus, 3DFX was left with the choice of abandoning every market but the European (you're joking, right?), or dipping into (read: draining) their R&D budget. Noting that option 1 was suicidal, 3DFX chose the latter. Thus, production was bumped, the new Voodoo 3 graphics cards were an outstanding bunch, and virtually no R&D was accomplished for a few years. Wait; did I say they didn't do any R&D for a few years?! Yes - yes I did. Thus, the thus far sub-standard (where 3DFX was the standard) 3D graphics card/chip makers were able to catch up to, and surpass 3DFX in both performance and features. Glide, 3DFX's baby, was eclipsed by the more open, if less fully-featured, OpenGL in game support. By the time 3DFX had enough production capability to start working on new cards, the writing was on the wall. Ati, Matrox, and nVidia were already too far ahead for 3DFX to have a chance competing against. 3DFX dumped the last of their cash into creating an extraordinarily powerful, goofy as hell looking, wildly expensive set of cards, which saw almost no time whatsoever in the market before 3DFX was forced to sell all IP rights to nVidia. 3DFX, nothing more than a shell of a company with no IP, then collapsed about a month later.
The last good card from 3DFX? The Voodoo 3 3500. Their last great card? The Voodoo 3 3000, whose overclocking ability was absolutely beyond anything anyone had ever before imagined possible. With stock cooling, one could achieve gains that would be thought of as ridiculous (percentage-wise) today. My own V3 3000, whose default memory clock speed was 166MHz, hit 220MHz with the stock cooler with no artifacts. I recall pushing it a bit higher with a rigged cooling system before finally replacing the card (it was getting OLD). 200MHz was common for the memory speed on those, and values as high as 240 - 250MHz had been reported, though often not without some artifacts. The quality of components was next to none from 3DFX. It was not their product, but their arrogance that was their undoing.
Spend the few extra dollars on a good motherboard with the nForce2 chipset. I run an Asus A7N8X-E Deluxe and in my experience it's very speedy (compared to my old ECS K7S5A, bleh) and packed to the tits with features (FireWire, SATA+RAID, USB2.0, etc..).
.. totally cool :)
Also, good memory (we're talking at least the lifetime warranty kind here) is totally necessary if you want your system to be stable at high frequencies. It seems AMD CPUs are more sensitive to bad/cheap memory (particularly in ECS boards; they're cheap, but avoid them if you at all can).
On a side note, AIDA32 shows the chipset bus on this board as being 8-bit HyperTransport v1.0