64-bit x86 Computing Reaches 10th Anniversary

Let us give thanks.... by cold+fjord · 2013-04-22 10:52 · Score: 5, Funny

...for being delivered from Itanium and 32bit x86.

--
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell

Re:Let us give thanks.... by muon-catalyzed · 2013-04-22 11:20 · Score: 5, Interesting

The people at AMD who did this, unquestionably AMD's biggest achievement to date, should be rehired and given executive positions.

Re:Let us give thanks.... by Cyclon · 2013-04-22 13:15 · Score: 5, Informative

They're working on it: http://www.amd.com/us/press-releases/Pages/JimKellerJoinsAMD-2012aug01.aspx

Re:Did it really work? by Grashnak · 2013-04-22 10:56 · Score: 5, Insightful

My 32 GB of RAM, absolutely essential for my work, laughs at your "memory management" bullshit.

--
Life needs more saving throws.

Re:Did it really work? by sribe · 2013-04-22 10:57 · Score: 3, Funny

If it's such a success, why does 64-bit software generally run only marginally faster than its 32-bit build? 64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.

Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.

I could see it being useful for super-computing things, but in general, there still just doesn't seem to be a point.

Wow, just wow. Do you actually work in the software field???

64 bit x86 worked out, but not for AMD by iggymanz · 2013-04-22 10:57 · Score: 2

AMD may have helped create the x86-64 market, but now it's getting killed by it. Soon Intel will be the only major player. The ARM market is AMD's only hope.

Re:64 bit x86 worked out, but not for AMD by sayfawa · 2013-04-22 11:17 · Score: 4, Informative

The next console generation disagrees. Sony and MS are both using AMD.

--
Free the Quark 3 from asymptotic confinement! Bring your charm! Don't get down! All colours and flavours welcome!

Re:64 bit x86 worked out, but not for AMD by tlhIngan · 2013-04-22 16:49 · Score: 4, Insightful

AMD may have helped create the x86-64 market, but now it's getting killed by it. Soon Intel will be the only major player. The ARM market is AMD's only hope.
Intel won't let AMD die. In fact, AMD is right where Intel wants them to be: big enough to ward off government regulators, small enough not to be a huge pain in the rear. Intel and other large companies are scared of government regulation and of being declared a monopoly, and we do know that Intel has committed enough sins that if the regulators look hard enough, they can make a case to break up Intel. That could include separating the ASIC design and foundry parts (and we know Intel has a LOT of foundry capacity). And I'm sure Intel's shareholders would rather give up some revenue than take the much bigger hit that would come if the government regulators stepped in.
It's entirely possible that Intel has a bunch of "AMD rescue" plans, ranging from a simple "let's just buy up all of AMD's CPUs and bury them" to more elaborate schemes. Of course, Intel cannot directly fund AMD. Perhaps Intel could give AMD some patents in an emergency.
Heck, you could argue that Intel told Sony and Microsoft to buy AMD chips: it gives AMD a nice steady income for the next few years. Intel could've used its extensive fab capacity to make custom chips for the consoles (much more easily than AMD can), but you can bet an opportunity like this to keep AMD from keeling over was just perfect.
And no, this isn't unusual in the business world. Companies you see as competitors can have all sorts of incestuous relationships amongst themselves; it's not unknown for competitors to buy parts from each other. And you can bet Apple, Google, Microsoft, Samsung and others are far chummier with each other than their patent lawsuits or settlements would imply. There are enough back-room deals and arrangements to hide how interdependent they all really are.

Re:Did it really work? by vistapwns · 2013-04-22 11:07 · Score: 3, Interesting

Heard x64 was barely faster than 32-bit, so I wrote this program to find duplicate files on Windows: http://poshcode.org/3377 - it's at least twice as fast in x64 as in 32-bit. Naturally it won't apply to everything, but for certain things x64 is really good.

--
"...I think the Microsoft hatred is a disease." - Linus Torvalds

Re:Whatever! PowerPC been doing 64-bit by larry+bagina · 2013-04-22 11:08 · Score: 5, Funny

MIPS and Alpha ask PowerPC to get off their lawn.

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:Did it really work? by cbhacking · 2013-04-22 11:10 · Score: 4, Informative

Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.

Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache efficiency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on 32-bit x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.

The other area where 64-bit really helps is security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit-aware executables and their various data regions across their much larger 64-bit address space. This not only makes it practically impossible to guess the address of any given bit of code in memory, it also makes spraying attacks (heap spray, JIT spray, etc.) infeasible; to cover even a tenth of a percent of the address space, you'd need to spray 16 million gigabytes of data. That's not only quite impractical at modern CPU speeds (even on a blazingly fast CPU and done in parallel, it would take a week or more), it is also far more memory (physical or virtual) than any modern computer will be able to allocate.

--
There's no place I could be, since I've found Serenity...

Re:Did it really work? by sribe · 2013-04-22 11:24 · Score: 4, Funny

Do you? For average PC applications (browsing the web, e-mail, office documents) 64-bit gives no advantage. For the above-average applications (multimedia creation/editing, CADD, running multiple VMs) it's very helpful.

1) Yes, I do.

2) You are so wrong that it's actually funny.

It should not have been called XP... by Anonymous Coward · 2013-04-22 11:36 · Score: 2, Interesting

XP x64, Microsoft's ginger stepson of an OS. Ignored, and dropped like a hot potato as soon as they could.

You couldn't get drivers for half the stuff, even MS didn't provide x64 versions of its own software, and lots of 'free for home, pay for commercial' software would detect it as Server 2003 and refuse to run or install.

Somewhat of a shame really as it wasn't a bad OS.

An Extra Bit of Register by Relic+of+the+Future · 2013-04-22 11:37 · Score: 5, Insightful

When AMD gave a presentation to my processor design course (not coincidentally about 10 years ago), one of the presenters said that one of the most surprising speed-ups for 64-bit code came from simply having 16 real general-purpose registers to work with. Even though register renaming lets the CPU smooth over the shortage of architectural registers, it meant all those extra load and store ops (which the renaming hardware would identify as waste and work around) now didn't need to be in the code at all. The gain turned out to be far from trivial for one of their test apps.

So those 32 extra bits of memory addressing are nice. But don't forget about that 1 extra bit for identifying registers!

--
Those who fail to understand communication protocols, are doomed to repeat them over port 80.

Re:An Extra Bit of Register by Darinbob · 2013-04-22 13:25 · Score: 4, Informative

And this is something people who've worked on RISC chips have known for ages. The x86 system architecture is essentially stuck in the early 80s. The 386 was just a simple extension on top of the 286 model; nothing really fundamentally changed, and you still had a limited number of registers, each with at least one specialized purpose. Maybe MMX and similar stuff fixed that, but you couldn't rely on everyone's PC having the instruction set you compiled for.
Intel was stuck supporting a very popular CPU with an instruction set that they knew was outdated, and they even tried replacements for it that failed to gain acceptance. The reason the Opteron caught on was that it was backwards compatible with x86, not that it was the first thing to try to break out of the mold. And the 386 was designed to be compatible with the 286, which was designed to be compatible with the 8086, which was designed to be source-compatible with the 8085, which is compatible with the 8080, which was itself largely source-compatible with the 8008, which followed the 4004, the first commercially available microprocessor... (and all of those retain the original accumulator A register)

x32 ABI by Chirs · 2013-04-22 11:49 · Score: 5, Informative

And for those that want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, faster syscall instruction... ) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.

They're working on porting Linux to the new ABI...kernel and compiler support is there, not sure about all the userspace stuff.

Re:x32 ABI by KiloByte · 2013-04-22 12:14 · Score: 3, Informative

kernel and compiler support is there, not sure about all the userspace stuff.
Just debootstrap it from Daniel Schepler's repository. Most of the work has since moved to the official second-class repositories (AKA debian-ports), but because of the freeze, you want both. So after debootstrapping, echo "deb http://ftp.debian-ports.org/debian unstable main" >>/etc/apt/sources.list and you're set.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.

Re:x32 ABI by KiloByte · 2013-04-22 12:40 · Score: 2

except the ability to access more than 4GB of RAM
3GB typically. That limit applies only per process, and it's pretty rare for a typical user to have a single process that big.
Then, you have netbooks and/or vserver hosting where the entire [virtual] machine doesn't have that much physical memory.
x32 is also noticeably faster: faster than i386 for anything that wants registers, and faster than amd64 for anything whose pointer-heavy data doesn't fit in the CPU's cache. Benchmarks vary wildly, but figures of around 7% faster than amd64 are typical.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.

Re:640k.... by Anonymous Coward · 2013-04-22 12:11 · Score: 2

Uh, no. He never said anything like that. But hey, don't let the facts stop you... just keep repeating that retarded meme.

Re:Did it really work? by MightyYar · 2013-04-22 12:28 · Score: 2

The program is written in C#. Only MS knows what is going on there.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Re:Did it really work? by vswee · 2013-04-22 12:31 · Score: 2

I just want all of you to know that you're making this thread very hard for me to moderate, because I'm not sure exactly who's right and who isn't. That is all. You may carry on now.

Not just for the extra memory. by Ecuador · 2013-04-22 12:46 · Score: 4, Interesting

In our algorithms lab there were programs that would gain more than 2x when compiled for 64 bit.
A more "real-world" example is from when I started at my current company in 2005. The engineers had 6-month-old P4s @ 3.2 or 3.4GHz, running 32-bit Linux. For a project they used Visual Studio on VMware, and it took over a minute to compile the project. The company allowed engineers to choose their hardware, so I built an Athlon 64 @ 2.2 or 2.4GHz and had it run 64-bit SuSE. I remember the shock and awe the first time I compiled the project under VMware: a little more than 10 secs. The engineer next to me had his jaw drop. Of course, most of the engineers immediately requested to switch to 64-bit machines. I am not sure why it made such a difference in that application; perhaps the 16 general-purpose registers come in really handy in this scenario? Of course it didn't help that the P4 was slower at everything (funny how at the time very few reviews really clarified this), but not an order of magnitude slower...

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS

Re:Did it really work? by Burning1 · 2013-04-22 13:34 · Score: 4, Interesting

If you understood how truly horrifying PAE is, you would have no doubt at all that 64-bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.

x86_64 also slipped in a few much-needed enhancements to the IA-32 architecture, including some extra general-purpose registers.

http://en.wikipedia.org/wiki/X86-64

Re:Did it really work? by Anonymous Coward · 2013-04-22 13:47 · Score: 3, Informative

Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache efficiency of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.

You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is nearly worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (which also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of program-counter-relative (RIP-relative) data addressing. 32-bit x86 jumps and calls were already relative, but there was no PC-relative mode for data, so position-independent code had to tie up a register or rely on relocations; in 64-bit mode, software can address data relative to the program counter. This helps a great deal with libraries, since instead of needing large relocation tables, they simply use relative addresses that are valid no matter what address the library is loaded at. On most processors, using 64-bit mode loses performance because more data has to be shuffled around; x86 is about the only one that gains performance.

He forgot to use a hash table by raymorris · 2013-04-22 13:55 · Score: 2

If the OP compares each file with every file, that would be CPU bound. With a well chosen hash table it shouldn't be.

Nobody's said 64 bit Linux 4 years before Windows? by raymorris · 2013-04-22 14:25 · Score: 4, Interesting

Is this still Slashdot? Nobody has mentioned that Linux supported x86-64 in 2001, before the hardware was even released, while Windows was stuck at 32-bit for another four years.

Re:Did it really work? by BitZtream · 2013-04-22 14:43 · Score: 4, Informative

PAE is more or less old-school segmentation. You can't say 'it has a 3% slowdown', because it has zero slowdown if that particular page is already in memory, and if not... it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2/3GB', the relative overhead is a lot higher than if you're mapping out 2GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up out of nothing.

The integer math performance of the processor has nothing to do with it being 64-bit. Most (all now?) x86-64 processors will internally process two 32-bit numbers in the same span as one 64-bit number if the code is properly optimized to send the 32-bit values through together. 64-bit code using less memory than the 32-bit OS maximum is actually slower than 32-bit code, because the larger pointers waste the processor's registers by filling them with 0s.

You really have no idea how processors work. While nothing you said is illogical, it is in fact wrong on every count. Under the hood, processors don't work anything like they do on the surface.

Other processors also do other weird things. I have an 8-bit CPU that can handle 32-bit numbers in a single clock cycle, exactly like it does 8-bit numbers... and the neat thing... it can do two 16-bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developer's perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them to present one view to you while letting them rewire the back end in all sorts of different ways.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

Re:Did it really work? by cheater512 · 2013-04-22 15:08 · Score: 2

My first point was that PAE does have an overhead somewhat larger than the 3% the parent mentioned.
And that overhead increases with the amount of RAM you have. Sure, 32 GB of RAM has very little overhead with PAE, unless of course you actually use all 32 GB, and then it will be constantly swapping memory pages around.
Yes, I know most people don't use that much RAM. My point is still valid.

Also, my second point was that 64-bit processors handle big numbers faster, not small numbers slower.
Yes, different architectures can behave differently. We are talking about x86 though.
Fact: 64-bit x86 will process a 64-bit number faster than 32-bit x86.

Re:"worked out" by Dawn+Keyhotie · 2013-04-22 15:11 · Score: 5, Insightful

WRONG on many levels. Yes, we had to get past the 4GB memory limitation, but there had been, and still were at the time, several other true 64-bit microprocessors around when AMD introduced the Opteron: Alpha, UltraSPARC, MIPS, PowerPC, and yes even IA-64. (not to mention IBM POWER and zSeries.) But they all had the fatal flaw of NOT being compatible with the Intel 32-bit x86 processors and off-the-shelf Windows software. Only Opteron had that, and that compatibility was so critical that Intel was grudgingly forced to adopt the x86-64 instruction set.

So, you may ask, why didn't AMD take the IT world by storm? Because 1) AMD was not Intel, and never could/would be; 2) Intel was paying manufacturers NOT to offer ANY AMD-based systems with marketing kickback agreements; 3) Intel would punish any manufacturer who did offer AMD systems with exorbitant price hikes on the Intel parts they did sell; 4) all this was taking place during the Bush years of federal laissez-faire non-enforcement policy, giving Intel free rein on those practices; 5) prejudice against AMD in the IT industry was widespread, and still is; 6) few people saw or acknowledged the need for a flat 64-bit address space; 7) those who did need 64-bit software were forced to spend exorbitant amounts of money on RISC workstations, which motivated them to look down their noses at commodity PCs, even if they were 64-bit; 8) chicken-and-egg syndrome (no volume 64-bit hardware, thus no volume 64-bit software, thus no need for volume 64-bit hardware).

So AMD did not "short themselves on implementation". Their architecture was state of the art, and kicked both 32-bit Pentium and non-compatible IA-64 in the nuts. They had all of today's advanced hardware features years before Intel: x86-64 architecture; Hyper-transport to replace the front-side bus bottleneck and enable point-to-point CPU links; and on-board memory controllers. AMD was not able to block Intel from poaching their features because of the pre-existing patent cross-licensing agreements. And anti-monopoly enforcement was practically non-existent at the time (and not much better today).

Of course, none of this is meant to imply that AMD was not partially or even mostly responsible for their troubles. They were (and still are) horrible at executing their own roadmaps. They were (and still are) horrible at marketing to consumers. They were (and still are) horrible at manufacturer relations. They were (and still are) unable to make a sane strategic decision if their life depended on it. They were (and still are) perceived as the el-cheapo Intel-knockoff copycat instead of pioneering leaders in their field.

So yeah, AMD is a hot mess, but there is plenty of blame to go around.

--
"The only good windmill is a tilted windmill."

Re:Whatever! PowerPC been doing 64-bit by Guy+Harris · 2013-04-22 15:29 · Score: 2

POWER != PowerPC

Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them.

The PowerPC ISA took the POWER ISA, added some stuff such as general-register-based multiply and divide instructions, and removed a few instructions (and didn't add in the ones used in the POWER2 processor).

POWER3 was a 64-bit processor that implemented the union of 64-bit PowerPC and POWER; I don't know whether any subsequent POWERn processors implemented the POWER ISA-only instructions or just the current version of the PowerPC/Power (not all-caps) ISA.

Re:Did it really work? by Guy+Harris · 2013-04-22 15:59 · Score: 3, Informative

but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.

Actually, no, it's a mode that changes the page table format to allow larger physical addresses in page table entries. Nothing to do with segmentation.

Re:Did it really work? by Alioth · 2013-04-22 16:03 · Score: 2

Notwithstanding all of that, amd64 also has more registers, so there's less moving of data to and from memory, and most function calls can pass parameters in registers instead of on the stack. amd64 provides a worthwhile increase in performance just from having twice as many general-purpose registers (actually, more than twice as many usable ones, because 32-bit x86 really has only 4 proper general-purpose registers; amd64 adds 8 more).

--
Oolite: Elite-like game. For Mac, Linux and Windows

Re:Did it really work? by Alioth · 2013-04-22 16:06 · Score: 4, Informative

x64 has twice as many registers. That alone means less having to move stuff in and out of memory, so that will improve the speed when compared to 32 bit applications. 32 bit x86 has only 4 truly general purpose registers. x64 adds another 8 64 bit registers.

--
Oolite: Elite-like game. For Mac, Linux and Windows

Re:Did it really work? by metrix007 · 2013-04-22 16:53 · Score: 2

Why would a 64-bit program be slower when modern processors are optimized for 64-bit programs?

--
If you ignore ACs because they are anonymous - you're an idiot.

Re:Twice as big as it needs to be? by SEE · 2013-04-22 17:06 · Score: 4, Informative

it's an easy choice unless you absolutely need 16-bit support.

The annoying thing being that an x86-64 processor in long mode can, in fact, run 16-bit protected mode code (like essentially all actual Windows 3.x programs) with the same compatibility sub-mode that runs 32-bit code. It's merely that Microsoft decided they didn't want to bother supporting it.

That this can be done is easy enough to prove; take a Win16 app and run it in WINE on 64-bit Linux.

Re:Twice as big as it needs to be? by fnj · 2013-04-22 18:18 · Score: 2

You could both, I don't know, ACTUALLY FIND OUT the answer and present it. The truth is somewhere in between. Some sizes are the same, and some are larger. The first column is 32-bit gcc in current Arch; the second is 64-bit gcc in RHEL 6.4; both with default options.

                      32-bit  64-bit
sizeof (char)              1       1
sizeof (short)             2       2
sizeof (int)               4       4
sizeof (long)              4       8
sizeof (long long)         8       8
sizeof (void *)            4       8
sizeof (size_t)            4       8
sizeof (float)             4       4
sizeof (double)            8       8
sizeof (long double)      12      16

Given that there are quite a few longs, size_ts and pointers in typical C code, the 64-bit code is indeed substantially larger.

Re:Did it really work? by zbobet2012 · 2013-04-22 19:16 · Score: 3, Insightful

It sounds like you were just talking to a very bad functional programmer. You also have the order completely backwards. ANSI Common Lisp was the first ANSI-standardized OO language. But more importantly, most "OO" concepts come from functional languages to start with.

Design patterns for the most part are actually adaptations of pre-existing functional concepts. For example Chain of Responsibility is really just a slightly simplified monad (input must equal output). The first Iterator pattern was (map fn list). Flyweight is a simplified form of Memoization.

Packages and namespaces also appeared in many functional languages first. Encapsulation via lexical closures has been around since Scheme was invented in the 70s. Lambda functions? Those little gems, making their way into every OOP language, were invented with Lisp.

You have missed the entire point, though, if you think OOP is about organizing your programs or something. OOP is largely about encapsulating moving parts into logical pieces. Functional code is largely about minimizing or removing "state" (aka moving parts) from your code, e.g. a given input to a function should always give the same output. These concepts are not incompatible at all.

Coursera 64-bit course by F.+Lynx+Pardinus · 2013-04-22 22:19 · Score: 2

For anyone interested in learning more about x86-64, Coursera, in conjunction with UWashington, just started a "Hardware/Software Interface" course that focuses on 64-bit processors.

Re:How soon till we get 128-bit? by petermgreen · 2013-04-23 00:57 · Score: 3, Informative

A long time.

We don't even have true 64-bit x86-64 processors yet. While programmers are told to* treat pointers as 64-bit, in the current implementation (referred to as a "48-bit implementation") there are only 47 usable bits for user-mode pointers**. That is enough to map 128 terabytes to one process; AFAICT the most RAM you can currently get in a PC-architecture machine is 2 terabytes.

If we assume the largest available memory size doubles every 1.5 years and we want to be able to map all the memory to one process then we have 9 years until the current implementation is used up and another 24 years after that before a "full 64-bit" (with one bit used to distinguish between kernel and user mode) implementation is used up.

* Of course just because programmers are told to do something doesn't mean they will http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642750
** A 48th bit is used to differentiate kernel and user addresses. The number is then sign-extended to produce a 64-bit number.

--
note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register

Slashdot Mirror

39 of 332 comments