64-bit x86 Computing Reaches 10th Anniversary
illiteratehack writes "10 years ago AMD released its first Opteron processor, the first 64-bit x86 processor. The firm's 64-bit 'extensions' allowed the chip to run existing 32-bit x86 code in a bid to avoid the problems faced by Intel's Itanium processor. However AMD suffered from a lack of native 64-bit software support, with Microsoft's Windows XP 64-bit edition severely hampering its adoption in the workstation market."
But it worked out in the end.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Erm that 'smart memory management' (PAE) has a nice big performance hit. Somewhat bigger than a 3% slowdown.
Also 64 bit can handle bigger numbers (over 4.3 billion) an awful lot faster than 32bit can. It doesn't help with small numbers but for the bigger ones 32bit processes them rather inefficiently.
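To make the "bigger numbers" point concrete, here is a minimal sketch of what a 32-bit build has to do for a single 64-bit add: split the value into two halves, add the low halves, then carry into the high halves. The type and function names (`u64_pair`, `split64`, `join64`, `add64_on_32`) are made up for illustration. A 64-bit build does the same work with one native add.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value kept as two 32-bit halves, as a 32-bit build has to hold it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Hypothetical helpers to move between the pair form and a native 64-bit value. */
u64_pair split64(uint64_t v)
{
    u64_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

uint64_t join64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add using only 32-bit arithmetic: add the low halves, then
   propagate the carry into the high halves (the ADD + ADC pattern on x86). */
u64_pair add64_on_32(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;               /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* 1 when the low add overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Usage example: 4294967295 + 1 only comes out right if the carry is handled. */
void example_add(void)
{
    u64_pair sum = add64_on_32(split64(4294967295u), split64(1));
    assert(join64(sum) == 4294967296ull);
}
```

Addition is the cheap case; 64-bit multiplies and divides on a 32-bit target need several instructions or a library call, which is where the gap gets noticeable.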
My 32 GB of RAM, absolutely essential for my work, laughs at your "memory management" bullshit.
Life needs more saving throws.
I call it garage syndrome... The stuff you have expands to fill your garage, no matter the size. It does not need to be bigger, but programmers can get lazy too. Not to mention fitting in the latest new shiny.
If it's such a success, why does 64-bit software generally only run marginally faster than its 32-bit build? 64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Sure, it helps with the 4GB memory space limit, but so can smart memory management and other approaches.
I could see it being useful for super-computing things, but in general, there still just doesn't seem to be a point.
Wow, just wow. Do you actually work in the software field???
AMD may have helped create the x86-64 market, but now it's getting killed by it. Soon Intel will be the only major player. The ARM market is AMD's only hope.
No.
Those were x86-based? The title was "64-bit x86 Computing Reaches 10th Anniversary", not "64-bit Computing Reaches 10th Anniversary".
FC Closer
But it worked out in the end.
Yes, mostly due to the fact that we needed a way to get past the 4GB memory limitation, and not because we gave a damn about whether the processor was native x64 or not. AMD has had some great ideas, but they've almost always shorted themselves on the implementation, leaving the field wide open for Intel to come in with a better offering and take the lion's share of the profit.
#fuckbeta #iamslashdot #dicemustdie
Does 64 bits really mean that every program is twice as big as it needs to be? Every time I hear about an innovation that requires things to be bigger, I question the necessity.
Nope. Doesn't mean that at all.
Maintaining backwards compatibility with 32-bit means that you have to compile it twice, and include both sets of binaries. Actual compiled code that doesn't bother with backwards compatibility isn't significantly larger than 32-bit code.
This!
For all the people who insist "Oh, but I need those big access and my umpteen gajillion giggerbytes of memory", well... no. No you don't, except for the rare computational simulation. What happened is that the programmers who write the fancy software you use decided that everything is easier if they allocate 100 times as much memory as they actually need. Computing has become 99% overhead and laziness and 1% actually doing something useful.
It's like seeing an empty 12 lane highway and deciding that you'd better widen your car until it fills 11 of them, because otherwise they'd be all wasted lanes. The car still travels the same speed and gets the same small payload to its destination, of course.
Do you? For average PC applications (browsing the web, e-mail, office documents), 64-bit gives no advantage. For the above-average applications (multimedia creation/editing, CADD, running multiple VMs), it's very helpful.
Heard x64 was barely faster than 32-bit, wrote this program to find duplicate files on Windows: http://poshcode.org/3377 - it's at least twice as fast in x64 than 32-bit. Naturally it won't apply to everything, but for certain things x64 is really good.
"...I think the Microsoft hatred is a disease." - Linus Torvalds
Depends on how it's coded, for example: 64 bit MAME runs around 30% faster than the 32 bit version: http://www.mameui.info/Bench.htm
MIPS and Alpha ask power pc to get off their lawn.
Do you even lift?
These aren't the 'roids you're looking for.
Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache locality of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
Where 64-bit does become really valuable is working with very, very large amounts of sequential data (want to allocate a 10GB array? Can't do that on 32-bit x86, no way no how). That's hardly a typical requirement right now (although I wrote a program a few weeks ago that needed to do it). However, it's getting closer. Additionally, while clever memory mapping can allow a 32-bit process to access over 4GB of RAM (just not all at the same time), there is a (small) performance impact associated with the need to be constantly re-mapping that memory.
The other area where 64-bit really helps is with security, specifically exploit mitigation. High-entropy ASLR in recent versions of Windows and some other OSes randomly places 64-bit aware executables and their various data regions across their entire 64-bit address space. This not only makes it completely impossible to correctly guess the address of any given bit of code in memory, it also makes spraying (heap spray, JIT spray, etc.) attacks completely infeasible; to cover even a tenth of a percent of the address space, you'd need to spray 16 million gigabytes of data. That's not only quite impractical at modern CPU speeds (even on a blazingly fast CPU and done in parallel, it would take a week or more), it also is far more memory (physical or virtual) than any modern computer will be able to allocate.
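The spray arithmetic is easy to check. The sketch below assumes a flat address space of a given bit width and computes how much data covers a given number of thousandths of it; the function name `spray_mib` is made up for illustration. For the full 64 bits it comes out at roughly 17 million GiB, in line with the "16 million gigabytes" figure above. Current x86-64 systems expose fewer than 64 address bits to user mode, so treat the 64-bit result as an upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* MiB needed to cover `per_mille` thousandths of a flat address space
   that is `bits` bits wide. Valid for 20 <= bits <= 64, per_mille <= 1000. */
uint64_t spray_mib(unsigned bits, unsigned per_mille)
{
    assert(bits >= 20 && bits <= 64);
    assert(per_mille <= 1000);
    uint64_t total_mib = (uint64_t)1 << (bits - 20);   /* 2^bits bytes / 2^20 */
    return total_mib * per_mille / 1000;               /* rounds down */
}

/* Usage example: a tenth of a percent of a 32-bit versus a 64-bit space. */
void example_spray(void)
{
    assert(spray_mib(32, 1) == 4);                 /* about 4 MiB: trivial to spray */
    assert(spray_mib(64, 1) / 1024 == 17179869);   /* about 17 million GiB */
}
```

The contrast is the point: covering 0.1% of a 32-bit space takes about 4 MiB, which is why spraying works so well against 32-bit processes.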
There's no place I could be, since I've found Serenity...
Depends on how you want to look at it, and who you feel like being cynical against. Easing the job of programmers is a good thing. If they can use 10x more RAM and not have to write code to juggle memory as much, they have eliminated a potential source of bugs and a time sink that is probably hard to maintain as well. Memory is cheap; I got 16GB for $90, and though programs are larger, maybe unnecessarily so, nothing comes close to exhausting my memory. That seems like a much better approach than defining some arbitrary limit, stopping all progress, and telling programmers to 'stop being lazy sobs'.
I've seen Firefox run into the 2GB user-mode address space / process limit many times... Chrome and (recent) IE don't have this problem due to per-tab processes, but Firefox definitely hits it when you use as many tabs as I do.
Nah - your primitives are doubled in size, which realistically represents something closer to a 25-33% size increase on average. But between the abilities to manipulate MUCH larger quantities of data at once and addressing >3.5 GB RAM, it's an easy choice unless you absolutely need 16-bit support.
Look, I like bashing straw man lazy programmers as much as anybody. But in scientific computing in the year 2013 - say, where you need to store 50 cubic miles of subsurface 4-dimensional seismic reflector data for 3D visualization and modeling density change over time - you run into the limits of 4 gigabytes very quickly. Never mind large-scale simulations run in TOUGH2-MP... Don't paint with such a large brush. People may piss memory on stuff that ran in less RAM back in 1996, but we're not there any more, and adventurous, relevant, and efficient uses of RAM really do exist.
Most programs still don't need to work with numbers larger than 4 billion on a regular basis, so native 32-bit ints are just as fast as native 64-bit ones.
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Most programs don't need more than one floating point pipeline.
Most programs don't need lots of cache.
The pros outweigh the cons.
Not if by node you mean NUMA node.
Do you? For average PC applications (browsing the web, e-mail, office documents), 64-bit gives no advantage. For the above-average applications (multimedia creation/editing, CADD, running multiple VMs), it's very helpful.
1) Yes, I do.
2) You are so wrong that it's actually funny.
XP x64, Microsoft's ginger step-son of an OS. Ignored and dropped like a hot potato as soon as they could.
You couldn't get drivers for half the stuff, even MS didn't provide their own software and lots of 'free for home, pay for commercial' stuff would detect it as 2003 Server and refuse to run/install.
Somewhat of a shame really as it wasn't a bad OS.
So those 32 extra bits of memory addressing are nice. But don't forget about that 1 extra bit for identifying registers!
Those who fail to understand communication protocols, are doomed to repeat them over port 80.
Only if it's a fat binary, but thankfully these never needed to catch on with the x86 to x86-64 transition.
/* No Comment */
With 32-bit on some systems you get like 2.5-3.7 GB of usable RAM. And yes, video RAM eats from the 4 GB pool.
A 32-bit x86 app has access to 8 32-bit "general purpose" registers - they ain't really all general purpose because two of them are normally tied up as the stack pointer and frame pointer (the program counter is a separate register, not one of the 8).
A 64-bit x86 app has access to 16 64-bit "general purpose" registers. Optimize away the use of the frame pointer (if you can), and your app goes from 6 usable 32-bit registers to 15 usable 64-bit registers.
Of course, when you wrote your app you didn't do stupid brain-dead shit like "gee, size_t is really an unsigned int, so I'll use that to hold this pointer value", now did you?
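For anyone who did stash pointers in integers, a minimal sketch of the difference: `uintptr_t` from `<stdint.h>` is the type defined for holding a pointer's bits, while `unsigned int` only happens to be wide enough on 32-bit builds. The function names here are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a pointer through uintptr_t, the integer type defined for this. */
int roundtrip_uintptr(const void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (const void *)bits == p;
}

/* Reports whether unsigned int could hold every bit of a pointer on this build.
   Expected to be true for a 32-bit x86 build and false for an x86-64 one. */
int unsigned_int_fits_pointer(void)
{
    return sizeof(unsigned int) >= sizeof(void *);
}

/* Usage example: the uintptr_t round trip holds on any build. */
void example_pointer(void)
{
    int local = 0;
    assert(roundtrip_uintptr(&local));
}
```

Code that depends on `unsigned_int_fits_pointer()` being true compiles cleanly as 64-bit and then truncates addresses at run time, which is exactly the class of porting bug the comment is mocking.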
And for those that want the best of both worlds, there is the x32 ABI, which uses all the good stuff from x86-64 (more registers, better floating-point performance, faster position-independent code shared libraries, function parameters passed via registers, faster syscall instruction... ) while using 32-bit pointers and thus avoiding the overhead of 64-bit pointers.
They're working on porting Linux to the new ABI...kernel and compiler support is there, not sure about all the userspace stuff.
Most of those same home users might get by with 512 MB to 1 GB of RAM and a $10 AGP video card; but with millions having multigigabyte machines with vector processor GPUs, the potential for cheap, powerful distributed processing is enormous - if you can convince them to give up a few hours of CPU time occasionally.
Otherwise, well, it's probably just a waste of electricity, although PCs have been pretty darned efficient in the last few years.
Pain is merely failure leaving the body
but his ability to do so might be hampered if the hardware wasn't in general use.
With 32-bit on some systems you get like 2.5-3.7 GB of usable RAM. And yes, video RAM eats from the 4 GB pool.
I'm not sure if you're agreeing with the parent or disagreeing. Per-process you get less than 2 GB, for the system as a whole you get somewhat less than 4GB (depending on how much the system has mapped to something else). PAE can hack around the 4GB system limit somewhat.
He IS too stupid to be real.
Define "widely used". In engineering circles plenty of people ran XP64... roughly the same crowd that ran win2k before.
It was also misnamed, it's Windows 2003 x64 workstation.
And that gets us to driver support... use 2k3 x64 drivers. They work (surprise, it's the bloody same kernel).
Uh, no. He never said anything like that. But hey, don't let the facts stop you... just keep repeating that retarded meme.
The program is written in C#. Only MS knows what is going on there.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I just want all of you to know that you're making this thread very hard for me to score cause I'm not sure who's exactly right and who isn't. that is all. you may carry on now.
In my day, a Beowulf cluster had 128MB ram per node.
Uphill. Both ways. In the snow.
I am very small, utmostly microscopic.
Snow?
My first computer required that you toggle in the boot loader binary code from front panel switches!
That has to be the modern equivalent of hand crank started horseless carriages.
In our algorithms lab there were programs that would gain more than 2x when compiled for 64 bit.
A more "real-world" example is when I started in 2005 at my current company. The engineers had 6-month old P4s @ 3.2 or 3.4GHz, running 32bit linux. For a project they used VisualStudio on VMWare and it took over a minute to compile the project. The company allowed engineers to choose their hardware, so I built an Athlon 64 @ 2.2 or 2.4GHz and I had it run 64bit SuSE. I remember the shock and awe from the first time I tried to compile the project under VMWare - a little more than 10 secs - the engineer next to me had his jaw drop. Of course most of the engineers immediately requested to switch to 64bit machines. I am not sure why it made such a difference in that application - perhaps the 16 general purpose registers come in really handy in this scenario? Of course it didn't help that the P4 was slower in everything (funny how at the time very few reviews really clarified this), but not order of magnitude slower...
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
I never knew it was supposed to be faster
SPARC would like a word with you as well. When the Ultra workstations first hit the market, 32 bit software actually ran slower under 64 bit Solaris.
Only the State obtains its revenue by coercion. - Murray Rothbard
That's why I switched to using 1 bit microprocessors. My programs are really small now. I just wrote a database which I can fit in my pocket.
Do you? For average PC applications (browsing the web, e-mail, office documents), 64-bit gives no advantage. For the above-average applications (multimedia creation/editing, CADD, running multiple VMs), it's very helpful.
On Debian Linux I can push a stupid Zynga game running in Flash past 3GB of RAM. For multimedia creation/editing, you bet your sweet ass 64 bits matters. Then again, Linux doesn't have shit like GCD and quality OpenCL built into the OS, with app suites that can leverage both and welcome 32/64 GB of RAM with open arms. Quality drivers, quality OpenCL/OpenGL, etc., are coming with all the hard work at LLVM/Clang, Mesa and more. When that shit lands, you'd better believe 64-bit matters, and anything from heavy engineering/scientific computing to Blender modeling/rendering damn well loves it. So does GIMP.
I think if you understand how truly horrifying PAE is, you would have no doubt at all that 64 bit platforms were the way to go. There's a lot of memory management cruft in the Linux kernel that x86_64 eliminates.
x86_64 also slipped in a few much needed enhancements to the ia32 architecture, including some extra general purpose registers.
http://en.wikipedia.org/wiki/X86-64
I am constantly hitting the 3.3 GB RAM cap on my 32 bit machine at work just having the applications I need to do my job open at the same time. On top of that, the hard drive is fully encrypted, which makes using it for swap space extremely expensive. I would kill someone for a 64 bit machine at times just for the increased RAM space.
Yeah, he's trolling in real life. This is the email I sent him to refute his garbage. No response. Imagine that.
Are you really serious about having 650 thousand lines in your hosts file? I can't imagine why you'd need that many. It also has a crippling effect on one's computer.
To test this, I created a sample copy of a hosts file with that many entries, using the "0" shorthand for IP address and a randomized hostname of average 32 characters. Total size of this file is 22855 kilobytes, and after an hour the DNS cache had only loaded a third of it in. This is primarily due to the choice of algorithm used by the DNS cache service - it wasn't designed for tens of thousands of hosts file entries to be stored, so it uses a rather inefficient method of growing the storage space that involves copying huge swathes of data around for each new entry. It also blocked any name lookups while loading the file.
So instead of this, I tried with only 65k entries, and made three copies of this file. Each had an identical list of hostnames, but used "0", "0.0.0.0" and "127.0.0.1" respectively. The DNS cache now took 1 minute 55 seconds to load each one; the choice of IP address style didn't make any difference to the loading time as the bulk of the processing was in inserting new entries as described in the paragraph above. Name resolution was at normal speed after that, though. Searching in-cache - even for such a large set of data - added no discernible penalty.
I decided to try with the DNS cache disabled. This isn't a good idea, as it forces uncached name resolution to be done for every single lookup. This is indeed what it did, and the original 650,000 entry hosts files added around 3 seconds onto every single name lookup, the amalgamated effect of which slowed general Internet access down considerably. Unlike the DNS cache loading, this time there was a slight difference in loading times between the different hosts files - this was expected, as it was reading the entire file each time so that became the bottleneck.
Finally, to address your last question: every IPv4 address is stored in the cache using the same size of four bytes. e.g. both "0" and "0.0.0.0" become 00 00 00 00, both "127.1" and "127.0.0.1" become 7F 00 00 01, and so on. This is consistent with the binary format used in the sockets API.
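The shorthand-to-four-bytes behaviour can be seen with the BSD sockets call `inet_aton`. This is a sketch on a POSIX-style system, used here as a stand-in; it is an assumption that the Windows DNS cache parses addresses the same way, and the wrapper name `ipv4_bytes` is made up for illustration.

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare inet_aton under strict -std modes */
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Parses shorthand or dotted IPv4 text into four bytes in network order.
   Returns 1 on success, 0 if the text is not a valid address. */
int ipv4_bytes(const char *text, uint8_t out[4])
{
    struct in_addr addr;
    assert(text != NULL && out != NULL);
    if (inet_aton(text, &addr) == 0)
        return 0;
    memcpy(out, &addr.s_addr, 4);   /* s_addr is already in network byte order */
    return 1;
}

/* Usage example: the short and long spellings produce identical bytes. */
void example_ipv4(void)
{
    uint8_t a[4], b[4];
    assert(ipv4_bytes("127.1", a) && ipv4_bytes("127.0.0.1", b));
    assert(memcmp(a, b, 4) == 0);
    assert(a[0] == 0x7F && a[1] == 0x00 && a[2] == 0x00 && a[3] == 0x01);
}
```

Either spelling costs the same four bytes once parsed, so the shorthand only saves space in the hosts file itself.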
In conclusion, using the hosts file to store tens of thousands of entries has a negative effect on the performance of Windows' name resolution. You should really consider another option to filter all those hostnames.
An awful lot of people run 10 year old computers, and also an awful lot of people run XP on computers that could handle 7 64bit or linux 64bits. So you'd better have a 32bit version of your program (Google Chrome, Google Earth, Firefox, whatever).
Though, it ought to be easier to have a fully 64bit system (a linux distro without Wine might do it, if you're careful to not install 32bit software and if Chromium and/or Firefox are 64bit there). But the only benefit is not storing and running duplicate 32bit libraries.
Software does take advantage of the fact that you can fit twice as many 32-bit values into the standard x86 registers if the registers are 64 bits wide, in the same way that you can stuff two 16-bit ints into EAX on a 32-bit system if you want to. However, the performance gains from doing so end up in conflict with the reduced cache locality of larger binaries (bigger instructions) and possibly larger (less well-packed) data, resulting in more frequent cache misses. That's why the perf gains are typically very modest, although it really depends on the application.
You're arguing on the correct side, but what you wrote here is badly flawed. Packing multiple 32-bit values into a 64-bit register is near worthless; what is valuable is that amd64 gives you twice as many general-purpose registers (that also happen to be 64 bits wide). A far bigger gain for 64-bit on x86 was the addition of full relative addressing. Instead of 32-bit code reaching its data through absolute addresses, in 64-bit mode software can address data relative to the program counter. This helps a great deal with libraries, since instead of needing large relocation tables, they simply use relative references that are valid no matter what address the library is loaded at. With most processors, using 64-bit mode loses performance due to having to shuffle more data around; x86 is about the only one that gains performance.
In my experience most hardware that works with other versions of x64 windows works fine on XP x64. The only two exceptions I ran into were the data translation DT9816 (which worked with some APIs but not others, go figure) and the NI mydaq (for which the software refused to install at all). Remember from a driver point of view XP x64 is basically the same as server 2003 x64 so all the core hardware that is used in both clients and servers is well supported.
As for adoption I know of a few dedicated simulation/number crunching boxes at university running it but I don't know anyone else who uses it as the OS on their main office desktop.
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
Nah - your primitives are doubled in size, which realistically represents something closer to a 25-33% size increase on average.
What makes you think your 'primitives' are doubled in size?
If the OP compares each file with every file, that would be CPU bound. With a well chosen hash table it shouldn't be.
Is this still Slashdot? Nobody mentioned that Linux supported x86 64 in 2001, before it was even released, while Windows was stuck at 32 bit for another four years.
PAE is more or less old school segmentation. You can't say 'it has a 3% slow down' because it has 0 slowdown if that particular page is already in memory, and if not ... it has the same 'slowdown' as any other paging operation plus a fixed number of cycles. So if you're dealing with tiny amounts of 'more than 2/3gb' then the overhead is a lot higher than if you're mapping out 2GB on every window change. PAE is just another form of paging. It is slower, but you're making numbers up from nothingness.
The integer math performance of the processor has nothing to do with it being 64 bit. Most (All now?) x86-64 processors internally will process 2 32 bit numbers in the same span as a 64 bit number if properly optimized by sending the 32 bit values through together. 64 bit code using less than the OS max for 32 bit code is actually slower than 32 bit code due to the increased pointer sizes wasting the processor's registers filling them with 0s.
You really have no idea how processors work. While nothing you said is illogical, it is still in fact wrong in every account. Under the hood, processors don't work anything like they do on the surface.
Other processors also do other weird things. I have an 8 bit CPU that can handle 32 bit numbers in a single clock cycle, exactly like it does 8 bit numbers ... and the neat thing ... it can do 2 16 bit numbers in a single clock cycle! Why? Because the processor as I see it from a software developers perspective isn't anything like the actual hardware doing the work. Processors have translation units in front of them to provide you with one look while allowing themselves to rewire the backend in all sorts of different ways.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Most programs still don't need to map more than 2GB (not 4GB; in fact not even quite 2GB) at once, so there's no pressing need for 64-bit pointers.
Perhaps not most, but a whole awful lot of programs want more than that. I'd say the mean for "large" apps on my laptop is 1.5GB, and the resident size distribution is (to my eye) more or less gaussian. That means that few apps want more than 2GB today, but if the average app grew by 33%, about half of them would be over the 31-bit size limit.
Dewey, what part of this looks like authorities should be involved?
Snow?
My first computer required that you toggle in the boot loader binary code from front panel switches!
That has to be the modern equivalent of hand crank started horseless carriages.
Takes me back to loading those Interdata model 3s with the front buttons so we could load the paper tape. Then we could watch the registers with lights on the front as our code executed. Ah glad those days are over.
Then you did something wrong.
There is no logical reason that an x86-64 processor in 64 bit mode would perform faster than 32 bit mode unless you are memory constrained. Raw operations are not inherently faster in 64 bit mode than they are in 32 bit mode.
If you are not exceeding 32 bit memory limits, your 64 bit version SHOULD be a tiny little bit slower than the 32 bit version.
Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?
Really? PAE is bad? Have you just learned to completely ignore segmentation unless it's named PAE?
Segmentation on x86 is utter tripe as well, but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.
Sure, and heaven forbid any of those "users buying commodity computers with 64-bit CPUs and OSes" could ever use their hardware to begin tinkering with high-resolution video editing, or programming, or playing with the full capabilities of the hardware with which Moore's Law has blessed them. The fact that I do real work that can use more than 4 GB RAM doesn't mean the average user of whom you clearly think very little is incapable of doing so.
It's not cheap, you jackass, you're just passing the bill off to someone else.
On top of that, it's incredibly shitty for the environment.
The fact that they are, maybe? ints become twice as big
No they don't, the size of an int is entirely compiler dependent.
My first point was that PAE does have an overhead somewhat larger than the 3% the parent mentioned.
And that overhead increases with the amount of ram you have. Sure 32gig of ram has very little overhead with PAE. That is of course unless you actually use the 32gig of ram and then it will be constantly swapping memory pages around.
Yes I know most people don't use that much RAM. My point is still valid.
Also my 2nd point was that 64bit processors handle big numbers faster, not small numbers slower.
Yes different architectures can behave differently. We are talking about x86 though.
Fact: 64bit x86 will process a 64bit number faster than 32bit x86.
By your blinkered "thinking", all research that doesn't produce instant results is wasted.
POWER != PowerPC
POWER (all-caps) and PowerPC each refer both to an instruction set architecture and to a brand name used on processors that implemented it.
The PowerPC ISA took the POWER ISA, added some stuff such as general-register-based multiply and divide instructions, and removed a few instructions (and didn't add in the ones used in the POWER2 processor).
POWER3 was a 64-bit processor that implemented the union of 64-bit PowerPC and POWER; I don't know whether any subsequent POWERn processors implemented the POWER ISA-only instructions or just the current version of the PowerPC/Power (not all-caps) ISA.
My experience with moving applications to 64-bit that didn't need the massive single memory space was that I started paging a lot more, since they were allocating words twice as wide (and while I could address every molecule in the computer separately, the same number of them were still memory). Physical memories have since expanded to compensate, but I'd like to see some statistics on the entropy of the upper 32 bits of the average QWORD.
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
The fact that they are, maybe? ints become twice as big,
If you're talking about C-language ints, on very few 64-bit platforms are they 64-bit. Most UN*Xes are LP64, not ILP64, and Windows is LLP64 (they didn't even make long 64-bit, unlike most UN*Xes).
so do pointers.
Only if it's a fat binary, but thankfully these never needed to catch on with the x86 to x86-64 transition.
...although they did anyway, in OS X (even though the vast majority of Macs had x86-64 processors).
I know
If you know then why did you say ints become twice as big when they don't?
I was giving an example from the real world on that platform that people of Slashdot are usually interested about, Linux.
That example doesn't in any way illustrate that primitive types would be twice as big, pointers yes, but not primitive types.
but PAE is nothing but a spec on top of the other mess of bullshit known as segmentation.
Actually, no, it's a mode that changes the page table format to allow larger physical addresses in page table entries. Nothing to do with segmentation.
Notwithstanding all of that, amd64 also has more registers, so there's less having to move stuff to and from memory and you can make most function calls by passing parameters in registers instead of on the stack. amd64 provides a worthwhile increase in performance just due to having twice as many general purpose registers (actually, more than twice as many because there's only really 4 proper general purpose registers on 32 bit x86 - amd64 adds 8 more registers).
Oolite: Elite-like game. For Mac, Linux and Windows
64-bit binaries are larger and might run at 103% of the speed of 32-bit if you're lucky.
Maybe there is a lot of software written in C that uses int or unsigned when it should have typedef'd a size appropriate for its needs.
Software that's written in C, in all of the environments I know of for x86, has 32-bit ints (signed or unsigned) whether compiled 32-bit or 64-bit, so you're presumably not saying that those programs suddenly get 64-bit ints when compiled 64-bit. They will get 64-bit longs on UN*X (but not on Windows), and will get 64-bit pointers in either case.
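The int/long/pointer combinations being argued about have standard names, and a build can report which one it uses. A minimal sketch follows; the function names `model_name` and `this_build_model` are made up for illustration, and only the four common models are recognised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Names the C data model implied by the byte sizes of int, long and pointers. */
const char *model_name(size_t int_sz, size_t long_sz, size_t ptr_sz)
{
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 4) return "ILP32";
    if (int_sz == 4 && long_sz == 8 && ptr_sz == 8) return "LP64";
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 8) return "LLP64";
    if (int_sz == 8 && long_sz == 8 && ptr_sz == 8) return "ILP64";
    return "other";
}

/* Reports the data model of whatever build this file is compiled into. */
const char *this_build_model(void)
{
    return model_name(sizeof(int), sizeof(long), sizeof(void *));
}

/* Usage example: int is 4 bytes in all three models that matter for x86. */
void example_model(void)
{
    assert(strcmp(model_name(4, 4, 4), "ILP32") == 0);   /* 32-bit x86 */
    assert(strcmp(model_name(4, 8, 8), "LP64") == 0);    /* 64-bit UN*X */
    assert(strcmp(model_name(4, 4, 8), "LLP64") == 0);   /* 64-bit Windows */
}
```

Only pointers (and `long` on LP64) grow in a 64-bit build, which is why typical data structures get somewhat bigger rather than doubling.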
x64 has twice as many registers. That alone means less having to move stuff in and out of memory, so that will improve the speed when compared to 32 bit applications. 32 bit x86 has only 4 truly general purpose registers. x64 adds another 8 64 bit registers.
PAE is more or less old school segmentation.
PAE isn't segmentation at all. It's a mode that changes the page table entry format to support more physical memory. Maybe you're thinking of something else as being "PAE", but Intel's (and AMD's and...) idea of PAE is a Physical Address Extension.
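To show what "changes the page table entry format" means in practice, here is a sketch of how a 32-bit linear address is carved up for a 4 KiB page with and without PAE. The linear address stays 32 bits either way; PAE adds a third table level and widens each entry to 64 bits so it can name a larger physical address. The struct and function names are made up for illustration, and large pages are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Non-PAE 32-bit paging: 10-bit directory index, 10-bit table index, 12-bit offset. */
typedef struct { unsigned pd, pt, offset; } legacy_index;

/* PAE paging: 2-bit PDPT index, 9-bit directory index, 9-bit table index, 12-bit offset. */
typedef struct { unsigned pdpt, pd, pt, offset; } pae_index;

legacy_index split_legacy(uint32_t linear)
{
    legacy_index i = { linear >> 22, (linear >> 12) & 0x3FF, linear & 0xFFF };
    return i;
}

pae_index split_pae(uint32_t linear)
{
    pae_index i = { linear >> 30, (linear >> 21) & 0x1FF,
                    (linear >> 12) & 0x1FF, linear & 0xFFF };
    return i;
}

/* Usage example: the same linear address walked under both formats. */
void example_split(void)
{
    uint32_t linear = 0xC0201003u;
    legacy_index l = split_legacy(linear);
    pae_index p = split_pae(linear);
    assert(l.pd == 0x300 && l.pt == 0x201 && l.offset == 0x003);
    assert(p.pdpt == 3 && p.pd == 1 && p.pt == 1 && p.offset == 0x003);
}
```

No segment registers appear anywhere in this walk, which is the point being made: PAE lives entirely in the paging unit.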
I frequently have to run a VM on my laptop. When I had 32-bit Win7 running with 4 gb of RAM it was painfully slow, 8-10 minutes between boot and getting a login. After reinstalling with 64-bit Win7, exact same hardware, same VM, boot time went down to 3-4 minutes. With 8 gb of RAM boot time was around 2 minutes, not even enough time to go get a cup of coffee.
"Think about how stupid the average person is. Now, realise that half of them are dumber than that." - George Carlin
I suppose you consider running virtual machines to be a "rare computational situation". Try running a couple of VMs under your 32-bit OS, and you'll change your tune pretty quickly.
Why would a 64 bit program be slower when modern processors are optimized for 64bit programs?
If you ignore ACs because they are anonymous - you're an idiot.
it's an easy choice unless you absolutely need 16-bit support.
The annoying thing being that an x86-64 processor in long mode can, in fact, run 16-bit protected mode code (like essentially all actual Windows 3.x programs) with the same compatibility sub-mode that runs 32-bit code. It's merely that Microsoft decided they didn't want to bother supporting it.
That this can be done is easy enough to prove; take a Win16 app and run it in WINE on 64-bit Linux.
And how many people actually owned a SPARC, POWER, or Itanic for that matter? Then it really doesn't matter does it as we can play this game all damned day with chips.
What matters is that AMD not only brought 64-bit to the masses, paving the way for large amounts of RAM without bad hacks like PAE or RAM disks, but also paved the way for large RAM sticks for the masses, which is why so many of us have oodles of RAM. Hell, even my netbook has 8GB of RAM, which before 64-bit went mainstream was unheard of outside the enterprise.
So let's hear it for AMD; if it weren't for them you'd be stuck on the Itanic.
ACs don't waste your time replying, your posts are never seen by me.
No Linux fanboys on Slashdot today? That's hard to believe. That's about as likely as not having anyone who just can't admit, after all these years of fail, that their "team", Microsoft, sucks horribly. So bad that not only is Apple spanking them in sales, but a bunch of greasy, toenail-fungus-eating hippies programming in their spare time kicked the crap out of Microsoft for servers, embedded systems, and just about anything that's not attached to a 19" CRT.
32 GB Ram High-Five! Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.
I am not through upgrading until I can virtualize the speed and location of every particle in the universe. Then I'm going to see what exactly this Time dimension actually looks like from a different angle. Maybe. I have a few other ideas, but I probably won't be allowed near a computer this powerful if I announce them all at once. =^_^=
I am John Hurt.
He's never written anything that's tested the limits of computing...
Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it a SSD for virtual memory...
I am John Hurt.
Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world) drive me loony...it's like they missed an entire year's worth of classes where we went over, in detail, the various changes, and why it's faster...and they have the gall to ask for your notebook the night before the final. I mean, it's impressive, that kind of blindness, but they aren't getting the notebook without a pimp slap to go with it (extra baby powder).
It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."
I am John Hurt.
Most programs don't need a GUI...but they tend to function better with one. Most computers don't need a SSD...but they tend to run faster with one, and users tend to agree that you can have your SSD back when you pry it from their cold dead fingers.
You don't have to fly First Class, you're getting there at the same time as the people in Business or Economy class...but it's a lot nicer.
I am John Hurt.
No, but they made it a hell of a lot cheaper for those of us that do. Now there's machines with 64 real cores and 128GB of memory for less than $10k. While GPUs are really nice for some stuff they can't handle much memory, so real CPUs still have a place.
Hmm. Depends. The global economy is a bit too unstable to make much progress for now, and people are still getting used to the 64-bit changeover.
We have multiple cores, but the software kits haven't evolved enough yet to take full advantage of them, or so I'm told.
Personally, I think the next big leap should be optical processing.
I am John Hurt.
And how many people actually owned a SPARC, POWER, or Itanic for that matter?
Well, some of the masses might have had G5 iMacs (PowerPC 970, 64-bit), but, yes, it took AMD to bring 64-bit to most of the masses.
At least one comment claims that the original title of the article was "64-bit Computing Reaches 10th Anniversary", which, if true, means the article came out with a bogus headline (there's more to "Computing" than stuff that runs on a mainstream desktop or laptop machine, and DEC OSF/1 came out in 1993, so it's been at least 20 years); if the original comment was posted before that, I can see his complaint (and the complaint of the person who pointed out that the MIPS R4000 came out before the first 64-bit PowerPC processor). Complaining about "64-bit x86 Computing Reaches 10th Anniversary" neglecting other 64-bit architectures, however, is silly.
You could both, I don't know, ACTUALLY FIND OUT the answer and present it. The truth is somewhere in between. Some sizes are the same, and some are larger. The first column is 32 bit gcc in current Arch; the second is 64 bit gcc in RHEL 6.4; both with default options.
type                   32-bit  64-bit
sizeof (char)               1       1
sizeof (short)              2       2
sizeof (int)                4       4
sizeof (long)               4       8
sizeof (long long)          8       8
sizeof (void *)             4       8
sizeof (size_t)             4       8
sizeof (float)              4       4
sizeof (double)             8       8
sizeof (long double)       12      16
Given that there are quite a few long's, size_t's and pointers in typical C code, the 64 bit code is indeed substantially larger.
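If you want to reproduce a column of that table without a C compiler handy, a minimal Python sketch using the standard ctypes module will do. Note the assumption: ctypes reports the sizes used by the C implementation your Python interpreter was built with, so a 32-bit Python on a 64-bit OS shows the 32-bit column. The names C_TYPES and sizeof_table are mine.

```python
import ctypes

# The types from the table above, mapped to their ctypes equivalents.
C_TYPES = [
    ("char", ctypes.c_char),
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
    ("void *", ctypes.c_void_p),
    ("size_t", ctypes.c_size_t),
    ("float", ctypes.c_float),
    ("double", ctypes.c_double),
    ("long double", ctypes.c_longdouble),
]


def sizeof_table() -> dict:
    """Return {type name: size in bytes} for the interpreter's C implementation."""
    return {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES}


for name, size in sizeof_table().items():
    print(f"sizeof ({name}) {size}")
```

Run it under both a 32-bit and a 64-bit interpreter on the same machine and the long, pointer, and size_t rows are the ones to watch.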
You could both, I don't know, ACTUALLY FIND OUT the answer and present it.
I did, the answer is that it is dependent on the implementation, not the machine architecture. Or are you going to tell me that those are the size of those primitive types on 32bit and 64bit architecture? Because they aren't, they are just the values defined by the implementation you used.
The truth is somewhere in between.
No, the truth is exactly as I said, that - as your post demonstrates - primitives are not doubled in size on 64bit architecture, and the reason why is because the decision is on the size of primitives is not governed by the underlying architecture.
If you look at the C standard for example you will find those values are not defined by the standard, the C99 standard only defines a minimum precision, the actual size is up to the implementation - which again is nothing to do with the underlying architecture.
Did ints become twice as big? No. Could they? Of course. Why? Because it is defined by the implementation, not the machine architecture, it's all right there in the specification.
It sounds like you were just talking to a very bad functional programmer. You also have the order completely backwards. ANSI Common Lisp was the first standardized OO language. But more importantly most "OO" concepts come from functional languages to start with.
Design patterns for the most part are actually adaptations of pre-existing functional concepts. For example Chain of Responsibility is really just a slightly simplified monad (input must equal output). The first Iterator pattern was (map fn list). Flyweight is a simplified form of Memoization.
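A loose Python sketch of two of those correspondences, for the skeptics. It uses the built-in map and functools.lru_cache; the Glyph class and the glyph function are hypothetical names I made up for illustration, and this is one reading of the analogy, not a formal equivalence.

```python
from functools import lru_cache


class Glyph:
    """Hypothetical shared object that would be expensive to build."""

    def __init__(self, char: str) -> None:
        self.char = char
        self.codepoint = ord(char)


# "Iterator" as map: apply a function to each element in turn.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))


# "Flyweight" as memoization: identical requests get the same shared object.
@lru_cache(maxsize=None)
def glyph(char: str) -> Glyph:
    return Glyph(char)


shared = glyph("a") is glyph("a")
print(squares, shared)
```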
Packages and namespaces also first appeared in functional languages. Encapsulation via lexical closures has been around since Scheme was invented in the 70's. Lambda functions? Those little gems, making their way into every OOP language, were invented with Lisp.
You have missed the entire point though if you think OOP is about organizing your programs or something. OOP is largely about encapsulating moving parts into logical pieces. Functional code is largely about minimizing or removing "state" (aka moving parts) from your code. E.g. an input to a function should always give the same output. These concepts are not incompatible at all.
This!
No, not this.
It's just a thinly disguised "the youth of today" argument.
Just stop and think for a minute about how easy it is to eat up 2G.
It's one 40000x40000 image. In other words juuust a bit larger than the maximum practical X11 framebuffer size. Plenty of sources spit out images larger than that now.
It's about 1000 1080p images, which is about 30 seconds of uncompressed video. I'm sure if you're editing streams together, it won't matter having a total of 30 seconds of video cached. Won't be annoying at all.
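The arithmetic behind those two figures, as a quick Python sketch. The pixel depth and frame rate are not stated above, so both are assumptions here: one byte per pixel (which is where the totals land near 2G; at 3 or 4 bytes per pixel the case only gets stronger) and 30 frames per second.

```python
GIB = 2**30

BYTES_PER_PIXEL = 1     # assumption: not stated in the comment
FRAMES_PER_SECOND = 30  # assumption: not stated in the comment


def image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed image in bytes."""
    return width * height * bytes_per_pixel


big_image = image_bytes(40_000, 40_000, BYTES_PER_PIXEL)
video_frames = 1000 * image_bytes(1920, 1080, BYTES_PER_PIXEL)
video_seconds = 1000 / FRAMES_PER_SECOND

print(f"40000x40000 image: {big_image / GIB:.2f} GiB")
print(f"1000 frames of 1080p: {video_frames / GIB:.2f} GiB")
print(f"playback time: {video_seconds:.1f} s")
```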
Now go and apply those to games. If you want to avoid slow loading from disk, you need to cache all those assets.
And as for "all programmers now are crap", it's just rot. The Mels of this world have only ever been 0.01%. Back when C was in vogue, compilers were bad and produced slow code, few people knew how to optimize by hand and memory leaks were rampant.
You know what? I like having lots of memory and a fast CPU. It means several things. When I just need a one-off thing done, I can do it in an appropriate language. When I want speed, I can write C++ and it will go blindingly fast. And it's simpler too since I don't have to muck around with complex caching or overlay schemes or any of that crap.
Or do you believe that 640k really is enough for anyone?
SJW n. One who posts facts.
PAE is more or less old school segmentation.
PAE isn't segmentation at all.
I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.
PC-relative addressing makes position-independent code significantly faster. This is useful for shared libraries, but also for position-independent executables which, in combination with address space randomisation, add some security.
SSE is guaranteed to exist. This alone accounts for most of the speedup, because compiling for x87 is really hard (crazy hybrid of a stack- and a register-based architecture), so generating SSE ops for floating point, even if you're only doing scalar arithmetic, is a lot more efficient.
More GPRs. x86-32 code ends up with a lot of stack spills because it only has a tiny number of general-purpose registers. x86-64 has 16, which makes it a lot easier to work with.
64-bit registers. On x86-32, 64-bit arithmetic is painful, because you need two registers for each of the operands, and you only have 6 registers to use (two of which must be used for the destination in a lot of ops). On x86-64, it's a lot easier to do sequences of 64-bit arithmetic without spills.
I am TheRaven on Soylent News
There is no logical reason that an x86-64 processor in 64 bit mode would perform faster than 32 bit mode unless you are memory constrained
Or you benefit from more registers. Or you benefit from vastly more 64-bit registers. Or you're doing floating point and benefit from the compiler being able to assume SSE is present and never use x87 arithmetic. Or you're using shared libraries so benefit from faster position-independent code. But, apart from that, no logical reason at all...
I am TheRaven on Soylent News
+0, pedantic. Everybody knows all that. Still, the FACT is that with the C compiler used for most open source compiling, 64 bit code is bigger in size, because some of the variables are bigger and none of them are smaller. Are or are not long and pointer both twice as big in 64 bit? Never mind "well it doesn't have to be" and "it's just a choice" and "has nothing to do with number of bits in the CPU".
All the patents required to implement MIPS IV (64-bit) have expired. The ones on SPARCv9 expire this year. Alpha expired last year. There's a reason for caring about the 20-year mark: it makes implementing the architecture a lot safer. We have a research processor that implements the MIPS IV instruction set for this exact reason: we may accidentally infringe some patents, but it is definitely possible to work around them by implementing things however the R4K did.
I am TheRaven on Soylent News
Windows is LLP64 (they didn't even make long 64-bit, unlike most UN*Xes)
And this caused a lot of pain because a huge number of programmers believed (some still do) that the C standard guaranteed that sizeof(long) >= sizeof(void*) and so used long instead of intptr_t. They did this because a lot of their headers used packed structures for things like file headers and used a type that was typedef'd to long, and it was easier for them than fixing all of their headers.
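A small Python sketch of the check those programmers skipped. It uses ctypes to ask the interpreter's C implementation for the widths of int, long, and pointers, names the resulting data model, and tests whether a long can actually hold a pointer. The function names are mine; the three model names (ILP32, LP64, LLP64) are the usual labels.

```python
import ctypes


def data_model() -> str:
    """Name the C data model of the interpreter's C implementation."""
    sizes = (
        ctypes.sizeof(ctypes.c_int) * 8,
        ctypes.sizeof(ctypes.c_long) * 8,
        ctypes.sizeof(ctypes.c_void_p) * 8,
    )
    names = {
        (32, 32, 32): "ILP32",  # 32-bit x86
        (32, 64, 64): "LP64",   # most 64-bit UN*X systems
        (32, 32, 64): "LLP64",  # 64-bit Windows
    }
    return names.get(sizes, f"other {sizes}")


def long_can_hold_pointer() -> bool:
    """True if storing a pointer in a long would not truncate it."""
    return ctypes.sizeof(ctypes.c_long) >= ctypes.sizeof(ctypes.c_void_p)


print(data_model(), long_can_hold_pointer())
```

On an LLP64 build the second value comes back False, which is exactly the case where code that stuffed pointers into long breaks.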
I am TheRaven on Soylent News
Then this raving nutcase / troll can post his mind spool as much as he likes but its impact will be minimal.
So good in that AMD got the contract. It is money, no question about it, and the console market is not small. Better (for them) they should have it than IBM or someone.
So how is it bad? Low, low margins.
Consoles are very cost driven devices. Often sold at a loss initially, and then little to no profit later. The reason is they want to pack as much hardware as they can in for as cheap as they can. Well the other side of that is they lean on suppliers, hard, to offer very low prices. They don't give their suppliers a lot of profit. They don't force them to take a loss or anything (the suppliers wouldn't agree) but it is just this side of it.
So selling 50 million units for consoles is way less profitable than selling 50 million units for laptops, desktops, servers, that kind of thing.
Hence while it is better than having no sales at all, it is not as good as taking a bigger slice of the computer market.
Simula 67 was standardised in 1968. ANSI Common Lisp dates from 1984, and the OO implementation it includes (CLOS) was a relatively recent development at the time. CLOS is also a hack, although Lisp bores try and pretend it isn't by claiming the omissions make it "more powerful".
I agree that 64-bit machines are somewhat niche, but I work in that niche.
If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.
Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB, although this data set will grow (even if it's not "social network" level of growth). But with working code in Java, it's much cheaper and easier to throw a 64-bit OS and another stick of RAM at it.
A shame that my employer is still tragically stuck in the 90s and thinks 32 bits should be enough for anyone..
He's de-duping files with SHA512, from the listing.
That will get a major boost on 64-bit machines just because of the increased word width. I imagine the hashing step is what is consuming most of the CPU time, and making the code CPU bound instead of I/O bound.
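The listing being discussed isn't reproduced here, so this is only a guess at its general shape: a minimal Python sketch that groups byte strings by their SHA-512 digest using the standard hashlib module. The function name and the in-memory example data are hypothetical. SHA-512 is defined over 64-bit words, which is why a 64-bit build can process each word in one register instead of two.

```python
import hashlib
from collections import defaultdict


def group_duplicates(blobs: dict) -> list:
    """Group names whose contents share a SHA-512 digest.

    `blobs` maps a name to its bytes; only groups with more than one
    member (i.e. actual duplicates) are returned.
    """
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        digest = hashlib.sha512(data).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Hypothetical stand-ins for file contents.
example = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"hello"}
print(group_duplicates(example))
```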
And PA-RISC hits the 20th anniversary of its 64-bit version in three years' time. Was always intrigued by that architecture, but only got to play very briefly with an HP9000 server.
who exactly does that? using 8 bytes when you only need 4 is just stupid.
It's more complicated than that.
CPUs move memory around in register-sized chunks ("words"). Therefore a CPU operating in 64-bit mode moves memory around in 64-bit sized words.
You can gain some ground by packing smaller variables together, but there will be some slack for things that don't fit into the chunk size. And it's more efficient to access memory aligned to word boundaries.
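You can watch the slack appear with a short Python sketch. It assumes ctypes lays structures out the way the native C compiler would (that is what the ctypes docs describe), and the two structure names are made up for the example. Same three fields, different order, different size.

```python
import ctypes


class Interleaved(ctypes.Structure):
    """char, int, char: the int forces padding on both sides."""
    _fields_ = [
        ("flag_a", ctypes.c_char),
        ("value", ctypes.c_int),
        ("flag_b", ctypes.c_char),
    ]


class Grouped(ctypes.Structure):
    """int, char, char: the small fields share one aligned slot."""
    _fields_ = [
        ("value", ctypes.c_int),
        ("flag_a", ctypes.c_char),
        ("flag_b", ctypes.c_char),
    ]


# Bytes actually needed by the fields themselves, with no padding.
payload = ctypes.sizeof(ctypes.c_int) + 2 * ctypes.sizeof(ctypes.c_char)

print("payload:", payload)
print("interleaved:", ctypes.sizeof(Interleaved))
print("grouped:", ctypes.sizeof(Grouped))
```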
You may as well say "why use 8 bits when you only need one" - most databases store boolean values as a whole byte, because it's a total pain in the arse to write a single bit then offset the rest of the row by one bit to save 7 bits of space. If you have multiple boolean fields (up to 8 per byte), they get packed together, because it's much cheaper to shift a single byte to the left than it is to shift the rest of the row.
So the answer is, everyone does that, because their compiler takes care of it for them.
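For the boolean case, here is a minimal Python sketch of packing up to eight flags into a single byte and getting them back out. It is an illustration of the idea only; the function names are mine and no particular database's on-disk format is implied.

```python
def pack_flags(flags: list) -> int:
    """Pack up to 8 booleans into one byte, first flag in the lowest bit."""
    if len(flags) > 8:
        raise ValueError("only 8 flags fit in one byte")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position
    return byte


def unpack_flags(byte: int, count: int) -> list:
    """Recover the first `count` booleans from a packed byte."""
    return [bool((byte >> position) & 1) for position in range(count)]


packed = pack_flags([True, False, True])
print(packed, unpack_flags(packed, 3))
```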
64 bitness was never about performance. It was always about larger address space. The fact that in some cases there is a performance increase is just a bonus.
It's kind of like watching the functional programming people slowly reinvent OOP...makes me scream inside. "Dude, we've figured out a new way to organize our methods / fields so that it's easier to keep them straight in our heads..." "Please God, let it not be OOP." "*talks for a bit*" "Damn it."
I find it quite funny when the OO-crowd goes off like this :-)
(In case no one else clues you in: you've got it backward - Functional came first, and gave the world OO. OO now constantly reinvents everything that lisp had, under the guise of "new and improved")
I'm a minority race. Save your vitriol for white people.
For anyone interested in learning more about x86-64, Coursera, in conjunction with UWashington, just started a "Hardware/Software Interface" course that focuses on 64-bit processors.
You had it good. Those clay tablets were a bitch to load,
I am very small, utmostly microscopic.
Seriously, anytime Asus is feeling poor, they can release a Crosshair motherboard that takes 64 GB or perhaps 128 GB of RAM.
How much RAM we can put in our desktops is not really up to motherboard manufacturers like ASUS, it's up to the CPU and RAM manufacturers.
Current Intel mainstream desktop CPUs support four DIMMs and current high end desktop CPUs support eight DIMMs. Afaict the largest DIMM of desktop memory* currently available is 8GB. So the current limit is 32GB for mainstream desktop and 64GB for high end desktop. I believe that the high end desktop stuff theoretically supports 128GB but no one makes the DIMMs needed to do it yet.
Workstation/server platforms can take a lot more than that both through supporting more DIMMs and through supporting types of DIMM that come in higher capacities. I've seen systems that claim support for up to 2TB of ram.
* DDR3, unregistered non-ecc.
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Everybody knows all that.
Obviously not, i wouldn't have even been having the discussion otherwise, the reply was directed squarely at the fact that primitives - specifically integers - do not double in size on 64bit architecture. Go back and read the thread, had you done that in the first place you would know that.
Still, the FACT is that with the C compiler used for most open source compiling, 64 bit code is bigger in size
I never disputed that, again read the thread before you reply so you know the context of the discussion before you interject with an irrelevant comment.
Are or are not long and pointer both twice as big in 64 bit?
They are in that instance, which is fine and was never in dispute, why do you think that was in dispute?
Never mind "well it doesn't have to be" and "it's just a choice" and "has nothing to do with number of bits in the CPU".
That is the topic of the discussion and had you bothered to read you would know that, read here.
Every 64-bit platform i'm aware of still has a 32-bit int. There may be some software that will waste memory on unix like systems when it uses "long" (which is typically 64-bit on 64-bit unix like systems) where a 32-bit value is fine but I doubt that is significant in the grand scheme of things.
The code itself is usually slightly bigger on x86-64 than on x86, which is probably what the GP was referring to, but in the grand scheme of things code is usually pretty small, and the greater efficiency of position-independent code offsets this by reducing the chance of multiple copies of the same code being loaded at once due to load-time relocations.
The real problem is pointer heavy code. If a program uses data structures that are mostly made up of pointers (or integers that could potentially contain a typecasted pointer and therefore need to be pointer-sized) then those data structures will nearly double in size on x64.
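A back-of-the-envelope Python sketch of that effect, using a hypothetical binary tree node with two child pointers and one 32-bit key. The node_bytes helper assumes the structure is padded to a multiple of the pointer size, which is how mainstream x86 and x86-64 compilers behave but is an assumption here; the ctypes structure shows what the running interpreter's own C implementation does.

```python
import ctypes


class TreeNode(ctypes.Structure):
    """Hypothetical pointer-heavy node: two child pointers and a 32-bit key."""
    _fields_ = [
        ("left", ctypes.c_void_p),
        ("right", ctypes.c_void_p),
        ("key", ctypes.c_int),
    ]


def node_bytes(pointer_size: int, key_size: int = 4) -> int:
    """Node size, assuming padding up to a multiple of the pointer size."""
    raw = 2 * pointer_size + key_size
    return -(-raw // pointer_size) * pointer_size  # round up


POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)
node_size = ctypes.sizeof(TreeNode)

print("this interpreter:", node_size, "bytes per node")
print("32-bit estimate:", node_bytes(4), "64-bit estimate:", node_bytes(8))
```

With these assumptions the node goes from 12 to 24 bytes even though the key itself did not grow.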
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
If you do anything serious with Java, on Windows, because of the memory layout and the insistence of the HotSpot VM on being allocated contiguous stretches of address space, you're limited to about 1.2GB of heap space. When you have a domain that has object counts in the 3 - 5 million region, that fills up rapidly. This is for a big graph of objects and the queries for them involve lots of graph traversal. The code in question can do set queries in about 0.5s that an RDBMS takes over 5 mins to do, so there's a real value to caching all the objects on the heap.
Yes, I could use another language that doesn't have a stupid VM and have ample overhead in 4GB...
Actually, I think you're mistaken about how much more heap space you could get outside of Java. On Windows you can tune the amount of your address space taken by the kernel down and thus, possibly, start with as much as 3GB for user space available to your app. On Linux, you can't even do that and are stuck with 2GB to start. Load your libraries and runtime for *any* language and tool set, and you're not going to have all that much more than 1.2GB. Sure, maybe 1.5GB and on Windows maybe even a bit more. But until you go to 64-bit, you're stuck starting out with 1-2GB.
Also, address offsets and immediates are larger when 64-bit.
Mitigating this somewhat are new 64-bit instructions which can do in a single operation what you might need several operations to do on a 32-bit system. Adding two 64 bit integers, for example, is a single add instead of one add and one add with carry. With multiplication or division of 64-bit values, you're talking big savings.
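To see why the single 64-bit add matters, here is a rough Python sketch that imitates what a 32-bit machine has to do: add the low halves, then add the high halves plus the carry. The function name and the masking are my own illustration (Python integers are unbounded, so the masks stand in for 32-bit registers); this is not compiler output.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_with_32bit_halves(a: int, b: int) -> int:
    """Add two 64-bit values using only 32-bit-wide steps."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: plain add of the low halves; bit 32 of the result is the carry.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Step 2: add-with-carry of the high halves, wrapping at 32 bits.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


print(hex(add64_with_32bit_halves(0xFFFFFFFF, 1)))
```

On a 64-bit machine the whole function body collapses to one add, which in this emulation would be `(a + b) & MASK64`.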
He's never written anything that's tested the limits of computing...
And he's making invalid assumptions about what ordinary users might need. It's not that 32-bit is enough because most users don't need anything that requires 64-bit, it's that most users have never been offered things that require 64-bit, because they didn't have it. You know what needs more heap than you'll get in 32-bit to work well? According to IBM researchers: voice recognition. Yep, their research finds that being able to keep around 2GB+ enables huge, qualitative improvements. In my own work, I've hit the limit trying to do something simpler: provide really good auto-complete suggestions based on the individual user's corpus of work. Then of course there's all sorts of other searching algorithms, there's a huge difference in usability when you can provide a user a "browsable" interface that responds literally at the speed of thought vs requiring the user to compose the entire query and then wait a few seconds.
Meanwhile, I need only load up my badly coded evolutionary program to see my machine scream at the ~12 GB hit to the RAM. I say badly coded because I have found a few tricks to help get some additional memory savings out of it...also on topic, the aggression level was kind of low, so I imagine future tests might break the 32 GB barrier easily. Currently thinking of giving it a SSD for virtual memory...
So is 64-bit with 64GB in your computer not an option for you?
But 64 is twice as big as 32, thus it must be twice as good!
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
Current Intel mainstream desktop CPUs support four DIMMs and current high end desktop CPUs support eight DIMMs
Not exactly. Memory controllers support ranks, not DIMMs. One rank is one fully populated bus width. Standard DDR memory controllers are 64-bits wide, and memory modules are typically 8-bits, meaning you have eight modules to a rank. The memory controllers on desktop CPUs typically support two ranks per channel at full speed, and four ranks at reduced speed, so two ranks per double-sided DIMM, and two DIMMs per channel. On the other hand, if you get high density quad-rank DIMMs, then you can only add one per channel.
> Afaict the largest DIMM of desktop memory* currently available is 8GB.
> * DDR3, unregistered non-ecc.
Depends how you define "desktop memory"
16 GB, and 32 GB sticks are "available" in extremely limited supplies
$360 Kingston 16GB 240-Pin DDR3 SDRAM DDR3 1333 Desktop Memory Model KVR13LR9D4L/16
http://www.newegg.com/Product/Product.aspx?Item=N82E16820239525
$1400 HP 627814-B21 32GB DDR3 SDRAM Memory Module
http://www.newegg.com/Product/Product.aspx?Item=N82E16820326202
Not sure if this counts as desktop memory ... (technically NewEgg lists it as Server Memory)
$1400 IBM 32GB DDR3 ECC Registered DDR3 1066 (PC3 8500)
http://www.newegg.com/Product/Product.aspx?Item=N82E16820135081
> I've seen systems that claim support for up to 2TB of ram.
The HP ProLiant servers support up to 2 TB with 64 DIMM slots. Only $10K for the mobo, the RAM will only cost you $90K :-)
http://h10010.www1.hp.com/wwpc/us/en/sm/WF04a/15351-15351-3328412-241644-3328422.html?dnr=1
But yeah, looks like we have to wait another 10 - 20 years before we start seeing "normal" desktop motherboards support more than 128 GB. The 4 or 8 DIMM sockets will be "good enough" for a long time.
you're limited to about 1.2GB of heap space
That always pissed me off when trying to load large datasets. I remember buying a pricey, brand new dual-core Opteron and 2GB of memory back in 2005, so I could work on some things at home, and having to reboot into Linux to actually make use of it. Even on XP64, 32-bit applications still fell under the same restriction.
Actually, the x87 FPU has always been 80-bit precision, even on old 32-bit processors. There was no significant improvement in floating point performance between K7 and K8, besides clock rate. 32-bit versus 64-bit only holds relevance for integer math and pointers.
Thank you. The people spouting nonsense about 32-bit programming, and how they can't understand why 64-bit computing would be faster (in the x86 world) drive me loony...
To be fair, increased register space is completely independent of 32-bit versus 64-bit processors. It was a much needed architectural improvement that just happened to coincide with the transition. The only direct computational improvement of a 64-bit CPU is when doing 64-bit integer math.
Because various things suddenly take up twice as much memory, and thus require more memory bandwidth to operate at the same speed. In reality, the performance hit is slight, and more than accounted for by the increased register space available to applications properly compiled for x86-64.
Beyond me why they are having this argument at all. Is it even possible to buy a 32-bit PC any more? I have had 64-bit PCs for the last 5 years, running 64-bit software. I don't "need" 64-bit over 32-bit and I am sure that some of what I do, like editing, could be on 8-bit. It's not for any performance gain, 64-bit is just the current standard as far as I am concerned.
Yet there are millions of people out there running 32-bit OS's on 64 bit PCs - why?
Actually, you get however much of that memory is split off to userspace. The default on Windows is a 2GB/2GB split. Linux defaults to a 3GB/1GB split, offering more available to the application. In both cases, that is a user-configurable option.
There is no "31-bit size limit". It's a 32-bit computer, able to access 32-bits, or 4GB, of memory.
A long time.
We don't even have true 64-bit x86-64 processors yet. While programmers are told to* treat pointers as 64-bit, in the current implementation (referred to as a "48-bit implementation") there are only 47 usable bits for user-mode pointers**. That is enough to map 128 terabytes to one process; afaict the most RAM you can currently get in a PC architecture machine is 2 terabytes.
If we assume the largest available memory size doubles every 1.5 years and we want to be able to map all the memory to one process then we have 9 years until the current implementation is used up and another 24 years after that before a "full 64-bit" (with one bit used to distinguish between kernel and user mode) implementation is used up.
* Of course just because programmers are told to do something doesn't mean they will http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=642750
** A 48th bit is used to differentiate kernel and user addresses. The number is then sign-extended to produce a 64-bit number.
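The 9 and 24 year figures can be checked with a few lines of Python. The sketch just counts doublings between the limits named above, using the same 1.5 years per doubling; years_until is a name I made up, and terabytes are taken as powers of two.

```python
import math

TIB = 2**40
YEARS_PER_DOUBLING = 1.5  # the growth rate assumed in the comment above


def years_until(current_bytes: int, limit_bytes: int) -> float:
    """Years for memory to grow from current_bytes to limit_bytes."""
    doublings = math.log2(limit_bytes / current_bytes)
    return doublings * YEARS_PER_DOUBLING


current_max_ram = 2 * TIB      # largest machine mentioned above
user_limit_48bit = 2**47       # 47 usable user-mode bits = 128 TiB
user_limit_full = 2**63        # full 64-bit, one bit for kernel/user

first_wall = years_until(current_max_ram, user_limit_48bit)
second_wall = years_until(user_limit_48bit, user_limit_full)

print(f"until the 48-bit implementation is used up: {first_wall} years")
print(f"from there until full 64-bit is used up: {second_wall} years")
```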
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
That 3.3GB cap is likely because you have 512MB towards your video memory, and another couple hundred MB consumed elsewhere. All accessible memory gets lumped into the same 4GB cap. Why not make a separate swap partition independent of your encrypted system partition?
I'm not sure what in my post you think you're referring to. My point about SSE is that all x86-64 chips support it, only some x86-32 chips do. It is part of the basic ISA, not an extension. This means that ABIs use it (for example, the SysV ABI uses SSE registers for parameter passing and value returning). And, because it's always there, the compiler can use it. It is vastly easier to generate code for a machine that has 16 registers, any of which can be used as operands for any instruction, than one where you have 8 registers and most operations can only use 2 and a lot have the side effect of moving all values up or down one register. You often end up with a lot of spills in x87 calculations because register allocation and instruction selection are really hard.
I am TheRaven on Soylent News
Despite the fact that two of those modules are listed in the "desktop memory" section they are all listed as being "registered". Afaict at least in the intel world "registered" memory can only be used with "server" platforms.
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
I'm confused. If the first chip supporting x86 64 didn't come out until ten years ago, how did Linux support it two years prior?
You are not alone. This is not normal. None of this is normal.
Did WINE ever support Real Mode Win16 applications? There were so few Windows 1.x/2.x apps out there that I highly doubt it. Some apps written for Windows 2.1 and up seem to be dual mode, supporting both Real and Standard (286) Mode of Windows.
I didn't know that! Some time soon I'm gonna have to get some very old games up and running in Linux soon... Thank you!
Only 256MB are mapped to the vid card even if you have more vid memory (possibly 256MB are reserved no matter what even with a 128MB or 64MB one?). Only having multiple video cards will make you waste more memory.
I suppose it's the slashdot equivalent of being Rick-Rolled.
With BOINC and "@Home" projects, who knows : home users will eventually get there.
Microsoft dropped the ball on 64-bit. Linux did too; however, because most Linux tools are open source they can just be recompiled, so it isn't as big an issue.
However, compared to the Solaris and Apple transitions from 32-bit to 64-bit, the PC transition is very sloppy.
We have Windows 32-bit and 64-bit. You would expect that if you have a 64-bit computer, getting the 64-bit OS would be the best choice. No, not really: there are too many 32-bit apps out there (not most, but a lot of them) that just will not work, or if you need to have them talk across each other you get more issues. .NET could have helped resolve the issue 10 years ago. Why else would we have a development platform that compiles to run as slow as Java but only works on Windows? I figured it was for an easy transition to 64-bit systems. No, .NET doesn't even do that too well.
I though
Sure the old 16bit apps for Windows 3.1 have finally died, I can get over that, but if you have Office 2007 and Office 2010 apps installed on the same system, you can get into trouble with some other tools that integrate with them.
Working with Solaris during this transition a few years earlier, it was seamless: apps worked as designed and we weren't fighting 32bit vs 64bit. Apple too made it transparent. But Microsoft really dropped the ball. They could have allowed the move to 64bit to happen much earlier, but they were too busy fighting Linux and Apple and Google instead of trying to make this migration easier.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
"Let me guess, you ran it in 32 bit mode, then ran it again immediately after in 64 bit mode ... and then ignored the disk cache completely?"
Nope, I did dozens of runs for each, ignoring the first result that was obvious disk I/o bound (because it was much longer). As others have explained, and I said, some code benefits greatly from x64.
"...I think the Microsoft hatred is a disease." - Linus Torvalds
AMD wouldn't launch a chip they had never booted, so it couldn't really be publicly released BEFORE it had any OS support. So, you need an OS before you release.
In fact, they really wanted to test the instruction set and design before spending big bucks fabbing the chips. Linux had already been 64 bit for five years on Alpha, so 64 bit Linux was well proven. So they created an emulator for the new instruction set and Linux was ported, running on the emulator. Therefore, Linux supported x86_64 before the processor physically existed.
Opteron ought to be compared to POWER, not PowerPC, given the target markets. Also, in the Windows world itself, you had 64-bit MIPS and Alpha based workstations that ran NT - too bad it was badly supported by Microsoft and never caught on - long before AMD extended the x86 instruction set to 64-bit. While that was imaginative, it was tragic, as it ensured that the ultimate CISC to RISC migration that everyone - including Intel - hoped for, never happened.
64-bit x86-64 code only suffers a slight size penalty in real-world applications; I've seen it range anywhere from zero to 10%. The longer pointers cost you but there aren't that many of them in typical code, and you get some code size back by gaining access to more registers.
This led to an interesting phenomenon in the early days of x86-64: programs recompiled for 64-bit architectures typically had a 20% speed advantage on Athlon 64 systems but no advantage at all or a slight slowdown on Pentium 4 systems. The AMD systems were execution-unit bound, and doing fewer but larger instructions was a win. The P4 was instruction-fetch bound (the design's memory bandwidth for instruction fetch was lacking) and so the fact that the programs were bigger hurt the P4's performance. AMD was also helped by the fact that the code optimization in compilers at the time was tuned for AMD processors as they had gotten to market earlier.
I did some contract work for a large international company about a year ago. For reasons I still cannot fathom, the company was still standardized on WinXP. (If things have gone according to plan, they have upgraded to Win7 by now).
One of the tasks I was assigned was setting up workstations for some engineers who were feeling the pain of the 3.5 GB memory limit, and getting all their software to run.
So here we are, 9 or so years since the release of XP64, and even with all updates applied it was still not a usable OS. Never mind the challenges of getting legacy 32-bit stuff to run, I got that handled. The deal breaker was that the only available driver that would even work with the very common nic couldn't seem to get it to run over 1 mbit.... Also, IE (a corporate requirement) really loved to crash for no discernible reason. In the end, it turned out that the only way to even come close to a usable system was to run 32-bit software, and it had to be set to "Run as Administrator". (Which, of course, defeats the entire purpose of running a 64-bit OS)
Maybe there's some magician out there that could have got it running well. I'm primarily a Linux guy, the last time I qualified as a Windows guru we still had "Program Manager" and hadn't yet heard of the Start button. But I do know a few things, including how to make use of Google, and I did a LOT of research on how to solve these nagging, idiotic problems. The conclusion I was left with is that 64-bit XP is simply an unfinished product. They slapped it together in a rush, and moved on to working on Vista before they could be troubled with fixing any of the broken shit.
That's my story and I'm sticking to it.
On Windows you can tune the amount of your address space taken by the kernel down
Link? I'm interested to see how that's done.
On Linux, you can't even do that and are stuck with 2GB to start.
Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.
Did you perhaps swap "Windows" and "Linux" in your comment?
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Oh, yeah, that driver that managed to get a gigabit nic to run at a megabit, that was a Vista driver which explicitly did NOT support XP. And out of the 30 or so I downloaded and tried, it was the only one that gave even a sniff of network connectivity....
So anybody could implement any of these CPUs independently of the companies that started them, and not worry about any patent violations? By MIPS IV, you mean the R8000, right, or is it R5000? I'm curious about whether Oracle would then come out w/ a SPARC v10 in that case, although it's not that there are others aside from them who would be interested in such a CPU.
I'd love to see some of these CPUs, such as the MIPS, get resurrected and used in newer IPv6 routers and other networking gear. With more open hardware, it would be easier to manage.
Sure, I'll try that. Under my 32-bit Linux OS. Works just fine, since 32-bit Linux has access to more than 4GB RAM, it just can't give all of it to one process. It wouldn't be a notable success on Windows, where the 32-bit version has 4GB of address space, period.
That said, I'd still choose to run the 64-bit version of Linux in that scenario.
I think a good project would be to port OpenVMS to a still surviving RISC such as MIPS, POWER or even SPARC. Both MIPS & POWER have open consortiums, so making a 'VAX' platform based on either, and then porting OVMS to it would elongate its lifetime for those who must have it. Even better would be to re-implement the Alpha 21364 architecture - the netlists and whatever - to today's lithographies w/o changing a thing about frequencies, or anything else. You'll get a lot cooler CPU which runs the same OVMS software w/o any changes, and can then just extend the lives of existing Alphaservers indefinitely. Certainly better than nervously following HP and Itanium to goodness knows where.
Love that idea. Send it here: mailto:feedback@slashdot.org.
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
I suppose it depends on how you look at it. If you view the page directory as a bank select then it is a sort of segmentation.
If you view the page directory as a bank select then any form of paging with more than two levels of page table is a sort of segmentation, including x86 paging without PAE (all the way back to the 80386), and the form of paging on just about every modern processor.
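To make the "bank select" view concrete, here is a minimal sketch of how a 32-bit virtual address is carved up by x86 paging with 4 KiB pages, first in the classic 10/10/12 layout and then in the PAE 2/9/9/12 layout. The function names and the sample address are just illustrative picks, not anything from a real kernel.

```python
def split_classic(vaddr):
    """Non-PAE x86 paging, 4 KiB pages: 10-bit directory index,
    10-bit table index, 12-bit page offset."""
    return {
        "dir": (vaddr >> 22) & 0x3FF,
        "table": (vaddr >> 12) & 0x3FF,
        "offset": vaddr & 0xFFF,
    }


def split_pae(vaddr):
    """PAE paging, 4 KiB pages: 2-bit PDPT index, 9-bit directory index,
    9-bit table index, 12-bit page offset."""
    return {
        "pdpt": (vaddr >> 30) & 0x3,
        "dir": (vaddr >> 21) & 0x1FF,
        "table": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,
    }


if __name__ == "__main__":
    addr = 0xC0101234  # arbitrary example address
    print(split_classic(addr))
    print(split_pae(addr))
```

Either way the top bits just pick which lower-level table to walk, which is the sense in which every multi-level page table looks like a bank select.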
...typically using a signed int for pointers so that you don't have to have separate code paths for adding and subtracting from an address. The host itself can address 4GB but the OS may not let individual processes go over 2GB. This is the case in Windows x86, for example.
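A rough illustration of why signed pointer arithmetic ties in with a 2GB per-process limit: once an address has the top bit set, reinterpreting it as a signed 32-bit int makes it negative, so ordering comparisons across the 2GB line go wrong. This is only a sketch; `as_signed32` and `buffer_looks_valid` are made-up helpers for the demonstration.

```python
def as_signed32(addr):
    """Reinterpret an unsigned 32-bit address as a signed 32-bit integer."""
    addr &= 0xFFFFFFFF
    return addr - (1 << 32) if addr & 0x80000000 else addr


def buffer_looks_valid(start, end):
    """A naive 'start < end' sanity check done with signed arithmetic."""
    return as_signed32(start) < as_signed32(end)


if __name__ == "__main__":
    # Entirely below 2 GiB: the check behaves as expected.
    print(buffer_looks_valid(0x10000000, 0x10001000))
    # Straddles the 2 GiB line: the end address turns negative.
    print(buffer_looks_valid(0x7FFFF000, 0x80001000))
```

Keeping user addresses below 0x80000000 sidesteps that whole class of bug for code that treats pointers as signed.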
Dewey, what part of this looks like authorities should be involved?
Nothing new here really. Some early 32-bit processors couldn't actually address 4GB of memory, and it was a long time before anybody produced a motherboard for a 32-bit processor that could hold 4GB. (It never happened for the 386 or 486 although those CPUs had 32 address lines.) No Alpha CPU ever had 64 real address lines, and I doubt that any SPARC or Power CPU has to date.
Seeing as the PowerPC 970 ("G5") was the first 64-bit PowerPC (not POWER, though) processor, even the Opteron came out before the first 64-bit PowerPC.
As I said in an earlier post in this thread, "Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them."
If "the first 64-bit PowerPC" refers to PowerPC-the-brand, yes, the first one (other than the 620, which wasn't made in large quantities) was the 970.
If it refers to PowerPC-the-instruction-set, the first one was the POWER3 - it implemented the full PowerPC instruction set (as well as the POWER2 version of the POWER instruction set).
Opteron ought to be compared to POWER, not PowerPC, given the target markets.
Presumably referring to, as per the distinction I drew in "Both POWER (all-caps) and PowerPC refer both to instruction set architectures and brand names used on processors that implemented them.", POWER and PowerPC the brand names used on processors, not POWER and PowerPC the instruction set architectures.
Also, in the Windows world itself, you had 64-bit MIPS and Alpha based workstations that ran NT...
...which was a 32-bit OS, so it didn't provide 64-bit computing on those 64-bit processors.
64-bit XP never got any acceptance in consumer PCs and rightly so, because the drivers needed for many consumer devices were never written. It wasn't just sound cards as another comment implied, it was also all the other stuff that people connect to their PCs. Good luck getting that cheap printer or scanner, TV tuner, etc. to work on the 64-bit version.
64-bit XP did get used in engineering settings. At the time I told people to avoid it unless they had a specific need for its support for large applications.
The story changed with Vista. You needed new drivers anyway and most devices got both 32 and 64-bit driver support. (The fact that you had to offer both to get permission to use the "Designed for Windows" logo didn't hurt.) But the 64-bit version didn't really go mainstream until Windows 7 came along; by then ordinary desktop users were buying PCs with enough memory to need it.
Disclaimer: I'm a Mac guy working in OS X (which is different yet), so I'm only generally familiar with Windows & Linux VM details. Anyway:
On Windows you can tune the amount of your address space taken by the kernel down
Link? I'm interested to see how that's done.
It's a /3GB switch in boot.ini. Took me a while to find any decent info about it (because in the Windows world there seem to be a bunch of hosers with blogs who don't know the difference between kernel address space and the page file, and google doesn't know that their posts are tripe). The normal configuration is an even split of 2GB each to kernel and user space; this switch makes it 1GB to kernel and 3GB to user space.
On Linux, you can't even do that and are stuck with 2GB to start.
Huh? The default 32-bit Linux memory split is 3/1; 3 GiB for userspace, 1 GiB for kernel space. If you compile a custom kernel you can configure this differently.
Did you perhaps swap "Windows" and "Linux" in your comment?
No, I was simply mistaken about the default split on Linux. (And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...)
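For anyone skimming, the splits mentioned in this subthread boil down to simple arithmetic on a 4 GiB virtual address space. A small sketch follows; the labels and the `split` helper are only for illustration, and the numbers are the ones quoted above.

```python
GIB = 1 << 30
ADDRESS_SPACE = 1 << 32  # total virtual address space of a 32-bit process


def split(user_gib):
    """Return (user_bytes, kernel_bytes) for a given user-space share in GiB."""
    user = user_gib * GIB
    return user, ADDRESS_SPACE - user


# User-space share in GiB for each configuration discussed in the thread.
SPLITS = {
    "Windows x86, default": 2,
    "Windows x86, /3GB in boot.ini": 3,
    "Linux x86, default": 3,
}

if __name__ == "__main__":
    for name, user_gib in SPLITS.items():
        user, kernel = split(user_gib)
        print(f"{name}: user {user // GIB} GiB, kernel {kernel // GIB} GiB")
```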
Yes, this is a bit of an oversight on my part. I really should have been discussing the way PAE is implemented on ia32 processors. I had a little bit of difficulty finding information about it online, I'll have to consult my architecture books at home, and will expand on the original post. Here's a bit of the info I could find about PAE weirdness on IA32.
https://www.kernel.org/doc/gorman/html/understand/understand005.html#sec: High Memory
The key quote:
That is NOT the case for Windows x86. That is the case for the default configuration of Windows x86. It could be modified to the users' preference using a boot flag.
How does 32-bit Linux handle video cards with a gig or more of VRAM? Honest curiosity; the last time I ran 32-bit Linux I think my video card had only 256MB. On Windows, I think it would just fail to load the video driver, although I haven't checked (been a long time since I ran 32-bit Windows on raw hardware too).
There's no place I could be, since I've found Serenity...
Aha, thanks for the clarification. I know a few assembly languages, including x86, but had never really read up on the x64 extensions. Doubling the register count *and* doubling the width is definitely a huge improvement, as is being able to do relative jumps with large offsets.
There's no place I could be, since I've found Serenity...
But WinNT was 32bit so we are right back to the hair splitting again.
What MATTERS is why you and I can go out and buy 4GB and even 8GB of RAM on a single stick without having to take out a loan. That was all AMD; before that there really wasn't a point in large RAM sticks because the market was too niche, but with XP X64 and Athlon X2 and Pentium D you finally had a way for anybody to run more than 4GB of RAM and thus RAM sizes exploded.
If it weren't for Win 7 I'd still be running XP X64 BTW, that was a truly great OS and if I wanted to set up a system that needed every cycle for the program like Folding I'd probably go XP X64 over Win 7, it was insanely low resource which made programs just fly on it. Hell I was running it on a Pentium D805 which was a shitty chip and it was just zippy, damned good workstation OS.
ACs don't waste your time replying, your posts are never seen by me.
maybe you should consider using dosbox...
http://www.dosbox.com/
http://www.sierrahelp.com/Utilities/Emulators/DOSBox/DOSBox.html
This is a UDP joke, I don't care if you get it or not...
Yeah, that (NT being 32-bit on Alpha & MIPS) was another pity - Microsoft could have had then on those platforms what it has today on the x64, and had the OS readily 64-bit much earlier than it eventually did. And yes, I was referring to the processors, rather than instruction sets: POWER was used in the RS/6000 line of workstations from IBM, while PowerPC was used in Macs. Similarly, Opterons were used in servers, whereas Athlons were used in PCs. Therefore, PowerPC:POWER::Athlon:Opteron
No, I was simply mistaken about the default split on Linux
Ah, okay.
And my comment about "can't do that" referred to user-accessible configuration, not building your own kernel...
On Linux building your own kernel is user-accessible configuration :-)
Seriously, on Debian/Ubuntu, it takes a maximum of three commands (including installing all of the required tools), and it may be doable without touching the command line at all. It definitely doesn't require editing any files. It's more time-consuming (due to the time required to download tools and build) but may be easier than finding and modifying boot.ini.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I have yet to see a PC that doesn't feel slow rendering a full HD movie shot.
Mexico: 100% conservative's America now!
Thing is, I'm positive there were multiple copycats, and with the content Kristopeit usually wrote it was trivial to duplicate. In a sense it made the whole Kristopeit troll even more epic because not only would he (or it?) be writing troll posts but copycats too and it just spiraled into madness.
With APK it's almost impossible for a normal, rational individual to accurately duplicate the unique style of a genuine APK rant. That long copy pasta that's been making the rounds is a pretty good approximation though.
Celebrity worship is a poor substitute for Deity worship and costs more to boot.
So anybody could implement any of these CPUs independently of the companies that started them, and not worry about any patent violations?
Well, not quite. It's definitely possible to implement a CPU that is compatible with a 20-year-old CPU without infringing any patents, because any patents that were used in the original have now expired. That doesn't mean that the implementation will be patent free, however. For example, our branch predictor is cleverer than any shipping CPU 20 years ago, but I've not done a patent search so I don't know if it's covered by more recent patents. I intentionally avoided some techniques I know to be patented, but some of the older ones might have been patented around 1996-2000ish. That said, if we have to change the branch predictor to avoid patents then it's not a big deal - it doesn't change the ISA (and our CPU runs in an FPGA, so deploying a new version takes a few hours).
By MIPS IV, you mean the R8000, right, or is it R5000?
R8000 was the first MIPS IV chip, yes.
I'm curious about whether Oracle would then come out w/ a SPARC v10 in that case, although it's not that there are others aside from them who would be interested in such a CPU.
Well, Sun did something similar with the UltraSPARC spec. SPARCv9 doesn't specify the privileged mode instruction set, and this used to vary. The UltraSPARC spec that they released along with the T1 specified this and was intended to provide a stable interface for operating systems. Some parts of that may have been difficult to implement without trampling on patents. There are also likely to be issues with things like SIMD extensions.
I'd love to see some of these CPUs, such as the MIPS, get resurrected and used in newer IPv6 routers and other networking gear.
All Juniper routers run a tweaked FreeBSD on 64-bit MIPS and a lot of low-end routers use 32-bit MIPS chips.
With more open hardware, it would be easier to manage
The problem is always competing with companies that have large volumes. Intel is basically a process generation ahead of everyone else, and they have economies of scale that only the big ARM SoC vendors can match. There isn't really a business case for using a reimplementation of an old CPU architecture when you can get a Cortex A15 that will outperform it for less money. The only reason you'd want to is so you can add some custom instruction set extensions that massively speed up your workload, and even then they'd need to give an order of magnitude speedup for it to be a net win. In Juniper's case, they go with in-order MIPS cores with large SMP and SMT support, so that they can do a lot of relatively simple packet processing tasks in parallel. There isn't much branching or floating point in packet filtering (it's mostly just integer arithmetic and a large amount of data shuffling and a lot of data dependencies that completely kill performance on superscalar or out-of-order chips), so general-purpose CPUs (and GPUs) are optimised in all of the wrong places. Two simple in-order cores for them can be faster than one complex out-of-order superscalar core.
I am TheRaven on Soylent News
I thought that MIPS, like the Alpha, was particularly strong in floating point. Are the versions of the CPUs you're describing integer-only CPUs, the ones that are used in routers? Also, for router like applications, I'd imagine that such CPUs are really more IO intensive than anything else, and it's the data transfer instructions there that would need the most optimization?
Transmeta's (remember them?) Crusoe processor was internally a 128-bit VLIW CPU. Their Efficeon was internally a 256-bit CPU. So if one could salvage those and access the native instruction set, one has something to work with. Not to mention it being a low power consumption CPU.
Being strong on floating point is an aspect of the implementation more than the ISA. SGI's MIPS chips devoted a lot of designer effort and silicon to floating point, because that's what their customers wanted. In MIPS, however, there is a generic coprocessor interface supporting 4 coprocessors. CP0 is the system management coprocessor, which does all of the things like TLB management. CP1 is traditionally the FPU, and CP3 is sometimes the SIMD unit. CP2 is usually some manufacturer-specific extension. Cavium's Octeons, for example, put some network processing acceleration functions into CP2, but I don't think they implement CP1, or if they do it's likely a single floating point pipeline shared between cores. With a multithreaded CPU and a well-designed memory controller, you can have enough threads blocking on reads that you can handle one read and one write every cycle and completely saturate the bus, which is exactly what you want for network processing.
I am TheRaven on Soylent News
I need to offer you credit; you are right. The issue isn't really PAE, it's how the kernel manages memory on 32 bit x86 architectures with more than 1GB of memory installed. PAE simply exacerbates the problem. Here's an explanation of the complaint:
On ia_32 systems, the kernel splits memory into 3 zones; DMA, NORMAL, and HIGHMEM.
ZONE_DMA is the first 16MB of memory, and is generally avoided unless needed (due to lack of available higher memory, or for DMA mappings.) The kernel tries to reserve this address range for devices that use DMA mapping.
ZONE_NORMAL is an address space that is directly accessible to the kernel, and extends from 16MB to 896MB. Kernel data structures are stored in this space, including the kernel page tables. Memory mappings start to consume a lot of memory in ZONE_NORMAL, and thus PAE on ia_32 with a lot of installed memory can cause out of memory issues, even when there is a lot of available physical memory. User data can be allocated into ZONE_NORMAL, but is preferred to be placed in ZONE_HIGHMEM to free ZONE_NORMAL for kernel data structures.
ZONE_HIGHMEM is memory above the 896MB barrier. This address range is not directly accessible to the kernel. In order for the kernel to access anything in this zone, a temporary map must be made into ZONE_NORMAL. These mappings consume pages of ZONE_NORMAL, and suffer a performance hit. User space processes can access these pages directly (handled by the virtual memory manager system, of course.)
Generally, memory will be allocated to ZONE_HIGHMEM, ZONE_NORMAL, or finally ZONE_DMA in that order of preference.
The x86_64 architecture eliminates the need for ZONE_HIGHMEM. ZONE_NORMAL extends all the way from 16MB to the end of physical memory. This approach simplifies memory management, improves performance, and is generally more flexible.
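As a rough sketch of the zone layout just described, the following maps a physical address to its zone using the 16MB and 896MB boundaries from this post. It follows the simplified picture above (real x86_64 kernels also have a ZONE_DMA32, which is ignored here), and `zone_for` is a hypothetical helper, not a kernel function.

```python
MIB = 1 << 20
DMA_END = 16 * MIB           # ZONE_DMA covers the first 16MB
NORMAL_END_IA32 = 896 * MIB  # on 32-bit x86, ZONE_NORMAL stops at 896MB


def zone_for(phys_addr, arch="ia32"):
    """Classify a physical address into the zone it would fall in,
    per the simplified description above."""
    if phys_addr < DMA_END:
        return "ZONE_DMA"
    # On x86_64 everything above 16MB is treated as ZONE_NORMAL here.
    if arch == "x86_64" or phys_addr < NORMAL_END_IA32:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"


if __name__ == "__main__":
    for mb in (8, 512, 2048):
        addr = mb * MIB
        print(mb, "MB:", zone_for(addr, "ia32"), "/", zone_for(addr, "x86_64"))
```

The interesting row is the 2048MB one: HIGHMEM on ia32 (needs a temporary kernel mapping), plain NORMAL on x86_64.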
You're correct that there was a major issue with my original post... My memory of the kernel architecture had garbled HIGHMEM with PAE, and I was thinking that PAE required mapping pages above 4GB into lower memory. This would of course cause a huge performance penalty for any process consuming memory above 4GB. I deserve downmods for the technical inaccuracy.
Here's a very brief summary of the problems with HIGHMEM:
http://linux-mm.org/HighMemory
Here's a bunch of links used to refresh my memory:
http://www.makelinux.net/ldd3/chp-15-sect-1
https://www.kernel.org/doc/gorman/html/understand/understand005.html
http://unix.stackexchange.com/questions/5143/zone-normal-and-its-association-with-kernel-user-pages
Yep, you're right. I corrected myself in another post.
Thanks for the suggestion, I did just that.
If they implement that, I'm going to have to figure out a way to give you a virtual hug.
"Murphy was an optimist" - O'Toole's commentary on Murphy's Law
No. The x86-64 does not support 16-bit real mode code when in long mode, whether directly or through a virtual 8086. Nor does it support unreal mode.
However, the x86 instruction set didn't directly jump from the 8086 to the 80386; the 16-bit 286 supported a 16-bit protected mode. And in long mode, an x86-64 processor can execute such code.