Intel: No Rush to 64-bit Desktop
An anonymous reader writes "Advanced Micro Devices and Apple Computer will likely tout that they can deliver 64-bit computing to desktops this year, but Intel is in no hurry. Two of the company's top researchers said that a lack of applications, existing circumstances in the memory market, and the inherent challenges in getting the industry and consumers to migrate to new chips will likely keep Intel from coming out with a 64-bit chip--similar to those found in high-end servers and workstations--for PCs for years."
There exist two versions of the PowerPC instruction set, one 32-bit and one 64-bit. The processors currently in use are all 32-bit, and the new 64-bit ones will be a superset of the 32-bit ones (and can execute 32-bit code natively).
What's the big difference between 32-bit processors and 64-bit processors?
A 64-bit machine can address more than 4 GB of memory without funky segmented addressing kludges. This has applications in scientific simulation and database managers.
A 64-bit machine can also handle 64-bit integers as a native data type. This is important for encryption, number theory, financial applications that track amounts over roughly $40 million in cents (an unsigned 32-bit count of cents tops out at about $42.9 million), etc.
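For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic behind the 4 GB and $40 million figures. The function names are made up for illustration, and the dollar figure assumes money is stored as an unsigned count of cents, which is only one of several ways an application might do it.

```python
# Limits implied by pointer and integer width. Plain arithmetic only.
# Function names are illustrative and not taken from any library.

GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses a flat pointer of `bits` bits can name."""
    return 2 ** bits


def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of `bits` bits can hold."""
    return 2 ** bits - 1


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(64) // GIB)  # 17179869184

# Assumption: money kept as an unsigned 32-bit count of cents.
cents = max_unsigned(32)
print(f"${cents // 100:,}.{cents % 100:02d}")  # $42,949,672.95
```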
The Intel answer (PAE, Physical Address Extension) allows a chip to have more than 4 GB of physical memory in much the same way the old LIM EMS boards allowed an 8086 to have more than 1 MB of memory - it is a form of bank switching.
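To make the bank-switching idea concrete, here is a toy model. It is not how EMS or PAE is actually implemented; the class name, the method names, and the sizes are all invented for illustration. The point is only that the program sees one small window at a time and has to switch banks explicitly to reach the rest.

```python
class BankedMemory:
    """Toy model: a fixed-size window onto a larger store, one bank visible at a time."""

    def __init__(self, bank_size: int, bank_count: int) -> None:
        self.bank_size = bank_size
        self.banks = [bytearray(bank_size) for _ in range(bank_count)]
        self.current = 0  # index of the bank currently mapped into the window

    def select_bank(self, index: int) -> None:
        """Map a different bank into the window (the 'switch')."""
        if not 0 <= index < len(self.banks):
            raise IndexError("no such bank")
        self.current = index

    def read(self, offset: int) -> int:
        """Read one byte at `offset` inside the currently selected bank."""
        return self.banks[self.current][offset]

    def write(self, offset: int, value: int) -> None:
        """Write one byte at `offset` inside the currently selected bank."""
        self.banks[self.current][offset] = value


mem = BankedMemory(bank_size=16, bank_count=4)
mem.select_bank(0)
mem.write(3, 0xAA)
mem.select_bank(2)
mem.write(3, 0xBB)  # same window offset, different underlying storage
```

The same offset 3 holds different values depending on which bank is selected, which is why a single program cannot treat the whole store as one contiguous array.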
True, you could have a PIII with 10G of memory on it (in theory, anyway), but this would not help you for the common applications for which you need these quantities of memory - databases, video editing and so on.
In those tasks, you have ONE program that needs lots of memory. You ideally want to be able to take a multi-gigabyte file, and mmap() it so that it appears to your program to be just a stretch of memory. Then you can access the file with a simple pointer, and moving within the file is nothing more than pointer manipulation. You don't have to worry about paging the file in and out - that is the OS's virtual memory manager's problem.
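As a rough sketch of the mmap() approach, here is the same idea using Python's standard `mmap` module. A small temporary file stands in for the multi-gigabyte one, and `read_via_mmap` is a made-up helper name. On a 32-bit process the whole mapping would still have to fit in the address space, which is exactly the limit being discussed.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and slice it like an in-memory buffer."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file"; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


# Small stand-in for a large capture file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 4096 + b"needle" + b"B" * 4096)
    demo_path = tmp.name

demo_result = read_via_mmap(demo_path, 4096, 6)
os.remove(demo_path)
print(demo_result)  # b'needle'
```

Moving around in the file is just a change of slice offsets; there are no explicit seek or read calls in the caller.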
PAE won't help you in those cases. At best, you can back some of the buffer cache with the PAE memory, creating in effect a glorified RAM disk.
PAE is great if you have a machine running hundreds of processes, each of which takes 100M of space. But this usually is NOT the case.
Just as machines with more than 1M of memory started out as the province of the high-end user and slowly moved down, 64 bit address space on the desktop will start out as the province of the high-end folks, then move down as it becomes more common.
I would guess the likely sequence will be something like:
1) We *nix folks had it first - I was running 64 bits on my Alpha years ago. But we are not "the masses", and so will be ignored by the mainstream.
2) The Macs will be next - Apple will port MacOS X to the newer 64 bit Power chips. This will greatly simplify video editing - one of Apple's favorite areas to compete in. A 64 bit OS will make the Mac the chosen platform for video editing of large files (NOTE: a 40 minute capture from my Firewire camcorder is a couple of gig - so already the home consumer is getting close to needing this.)
3) Windows will finally release a 64 bit OS (also note: they could have done this YEARS ago under Alpha, but didn't - Windows NT under Alpha only could access a 32 bit address space.) Microsoft will hail this as a revolutionary breakthrough - "Windows AYCABTU is the first 64 bit OS for the home user!" *nix and Apple users will scratch their heads in puzzlement.
A little bit of computer engineering here for you...
RISC and CISC are the two main forms of processors out there these days. In a RISC chip, instructions are typically a fixed length, so the opcode and its operands are encoded together in a single instruction word. In a CISC chip, instructions tend to be variable length: the opcode is fetched first, and the operands follow in the next few bytes.
My CMPT 150 course (introduction to Computer Design) was done entirely with a Motorola HC11 Processor emulator, which is a CISC processor.
The advantage to RISC processing is that it is easy to put in "Pipelining", which basically means a buffer for the data at each of several levels throughout the CPU. A single chunk of opcode/operand still takes x clock cycles to process (x being the number of levels in your pipeline), but the processor can now do multiple things at once: after the first instruction goes through to the last buffer, there's another one waiting right behind it for the next clock cycle. So a pipelined RISC processor can finish a new CPU instruction with every single clock cycle.
Confused yet? Let me put it this way...
Pretend that your CPU is a plumbing system, with water streaming through hot and cold pipes to deliver a preferred temperature for the water. Now, the water temperatures are your CPU data (signals, bits, whatever...) and your pipes are your CPU circuitry.
Now, you want to send a big chunk of hot water down to the bottom of your pipe system using a bunch of intermediary valves (or/and/not/xor gates) and a specific pathway (Let's not ask why, let's just assume you want to do that). Now, say right after that you want to send a bunch of cold water down a similar path, but not necessarily the same path, however you will want to use some of the same pipes.
Now, with a CISC processor, what you would do is send down the hot water, occasionally storing it in some pipes whilst you send down the cold water. The sheer design of the system would keep the hot and cold waters separate, and you would be able to output your hot water, and then output your cold water, once they have gone through their systematic storages and movements around.
The annoying thing about this is you need a sophisticated CPU to do it. And you need a bunch of clock cycles to open and close the valves and whatnot and finally get your desired output.
Now, a RISC processor does something a bit smarter.... It throws your hot water in (first clock cycle) and just lets it trickle through the valves to the bottom, and then, on the second clock cycle, sends the cold water down. The downside of this is that your single clock cycle is really slow, which means you have a big lineup of people requesting hot and cold water and they have to wait for it to come out (lag, for those taking notes in computer-world).
So, we instate pipelining.
Pipelining is a bunch of basins (let's say 4) that appear at different levels of the pipe system.
So, you dump your hot water in the top basin. (First clock cycle)
Then, you unlock the basin and let it dump into the second basin. Once it's done that, once again, seal the basin and dump your cold water in. Now, (second clock cycle) open the plugs for both basins, and your hot water goes down the tubes (magically) before the cold water shows up and you can re-plug your basin. Now you have room for more water in the top basin.
Every move into a new basin is a clock cycle, so it takes 4 clock cycles for the water to finally reach the bottom so you can do whatever the hell it is you would want to do with hot or cold water. However, these are relatively quick clock cycles compared to the clock cycle you had in your non-pipelined RISC architecture. And, ultimately, once the first output reaches the bottom, you only have to wait a single clock cycle for the input right after it, rather than waiting the many more clock cycles you would have in your CISC architecture.
Did that make sense to anybody? I hope it did.
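If the basins are easier to follow as arithmetic, here is an idealized cycle count. It assumes every stage takes exactly one clock cycle and that nothing ever stalls, which real pipelines do not guarantee; the function names are made up for illustration.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """The first instruction needs `stages` cycles to fill the pipe;
    after that, one instruction completes per cycle (ideal case, no stalls)."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


# Four basins (stages), ten buckets of water (instructions).
print(cycles_unpipelined(10, 4))  # 40
print(cycles_pipelined(10, 4))    # 13
```

The latency of any one instruction is still 4 cycles; only the throughput improves.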
It depends. They aren't *only* talking about address lines, sure. But I think it is very subjective to say that the register size is much more important than the addressing.
Scientific applications have been using 64-bit computing for quite a while. What they usually use is floating point for calculations, and double-precision floating point (64-bit) has been around for a long time. So has loading and storing a 64-bit (sometimes 80-bit) FPU stack register with a single instruction, even though that may require multiple bus transactions, and manipulating it with single instructions. Scientific applications frequently have very large datasets as well - several GB not being uncommon. For performance reasons, you frequently want to load all this data into memory rather than process it in chunks that fit into memory (chunking is an option, but it is bad for some types of data access and reuse patterns). The data types of scientific applications (IEEE double-precision floating point, for example) can typically be handled by 32-bit CPUs today with no problems, and those FP registers can be loaded from L1 or L2 64 bits wide 'in one go'. They can even be loaded from and stored to memory quickly (memory reads typically operate a cache line at a time, and writes can be tuned more precisely). What calls for 64 bits is the amount of data being handled.
Video - I admit, I'm not an expert in this area, but I would imagine that the Altivec/SSE/Whatever are being used heavily here - although these aren't *really* that much different from what the 32-bit CPUs can do already, they just do several at the same time (SIMD). What matters here are very large datastreams (multiple GB) that have to be manipulated. I'm not exactly sure what would need to be done other than having a 64-bit file system though, and that can be (and is) done using 32-bit CPUs today. Maybe simply the ability to pull the entire image into one chunk of memory is what is desired - similar to the scientific computing issues where block read schemes are inefficient because of data access problems and data reuse. If the video files are over ~3GB, then you have a problem on 32-bit systems.
Databases - this is getting the most attention. Here, 64-bit integer manipulation becomes important (not SIMD types either) - Index/address calculations, large trees of data, etc. The other important thing is caching of data so you don't have to hit the disks. For this you want all the memory you can get.
Also... remember that just because a 64-bit CPU will typically have the ability to manipulate and use 64-bit addresses, that does not mean that all 64 address lines will be brought out of the package. For example, I would imagine that more like 40 address lines will be brought out for cost reasons, limiting the amount of physical memory that the CPU can actually use to, in this case, 1TB (2^40 bytes). However, the virtual address space isn't affected by that and will be 64-bit regardless. Of course, over time, more and more address lines may be brought out.
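The relationship between address lines and the physical memory ceiling is just a power of two. A quick sketch, with a made-up function name and a handful of line counts picked purely for illustration:

```python
GIB = 2 ** 30  # bytes in one gibibyte


def physical_limit_bytes(address_lines: int) -> int:
    """Bytes of physical memory reachable with this many address lines,
    assuming byte addressing and one address per byte."""
    return 2 ** address_lines


for lines in (32, 36, 38, 40):
    print(lines, "lines ->", physical_limit_bytes(lines) // GIB, "GiB")
# 32 lines -> 4 GiB
# 36 lines -> 64 GiB
# 38 lines -> 256 GiB
# 40 lines -> 1024 GiB
```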
Intel didn't want to make the jump to 32 bit, so they introduced "segment registers".
Um.... no. Segment registers have been in Intel's products from the beginning (at least since the 8088). They weren't a band-aid to stall adoption of 32-bit processors as you imply with the above comment.
The current 32-bit processors also have segment registers and you can use them with the "flat" address space. Some OSes (like Linux) just set all the registers to the same segment and never change them. But you could have separate segments for the stack, data, code, etc.
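For reference, a sketch of how the two schemes form an address. The real-mode formula (segment shifted left 4 bits, plus offset, on a 20-bit bus) is the 8086/8088 behavior; the "flat" function assumes a segment base of 0, as described above. Function names are made up for illustration.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086/8088 real mode: 16-bit segment * 16 + 16-bit offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def flat_linear_address(offset: int, base: int = 0) -> int:
    """32-bit protected mode: segment base + offset. With every base set to 0
    (the 'flat' model), the offset is the linear address."""
    return (base + offset) & 0xFFFFFFFF


# Two different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350

# In the flat model the pointer value passes straight through.
print(hex(flat_linear_address(0xDEADBEEF)))    # 0xdeadbeef
```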
But not in a single contiguous chunk. You get to page in 4GB chunks (and this only works on Xeons).
The ordering is: byte, kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte. After yottabyte comes 'ohmygodijustcameabyte'.