Intel: No Rush to 64-bit Desktop
An anonymous reader writes "Advanced Micro Devices and Apple Computer will likely tout that they can deliver 64-bit computing to desktops this year, but Intel is in no hurry. Two of the company's top researchers said that a lack of applications, existing circumstances in the memory market, and the inherent challenges in getting the industry and consumers to migrate to new chips will likely keep Intel from coming out with a 64-bit chip--similar to those found in high-end servers and workstations--for PCs for years."
There exist two versions of the PowerPC instruction set, one 32-bit and one 64-bit. The processors currently in use are all 32-bit, and the new 64-bit ones will be a superset of the 32-bit ones (and can execute 32-bit code natively).
No, you make your memory bus twice as wide. That way you can get twice as much data (instructions included) in per clock cycle of the bus. In fact, some machines make their buses much wider than the native word width of the CPU. Some machines (big iron) have as many as 576 bits on their buses. That's why they scale so well to many processors and big workloads, compared to little PCs which may only have 64- or 128-bit buses out to RAM.
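To put rough numbers on why width matters: peak theoretical bandwidth is just bus width in bytes times transfers per second. This is a back-of-the-envelope sketch; the 133 million transfers/sec clock is an assumption picked for illustration, real buses never hit their peak, and on some wide buses a chunk of those bits may carry ECC rather than data.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak theoretical bandwidth in MB/s (1 MB = 10**6 bytes), ignoring all overheads."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e6


# Assumed for illustration only: 133 million bus transfers per second.
TRANSFERS_PER_SEC = 133e6

for width in (64, 128, 576):
    mb_s = peak_bandwidth_mb_s(width, TRANSFERS_PER_SEC)
    print(f"{width:4d}-bit bus: {mb_s:6.0f} MB/s peak")
```

At a fixed clock, doubling the width doubles the peak, which is the whole argument.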
they should just cut the crap and bring out 1024-bit CPUs, that way they won't have to worry about upping to 128-bit CPUs however many years down the line.
What's the big difference between 32-bit processors and 64-bit processors?
A 64-bit machine can address more than 4 GB of memory without funky segmented addressing kludges. This has applications in scientific simulation and database managers.
A 64-bit machine can also handle 64-bit integers as a native data type. This is important for encryption, number theory, financial applications dealing with amounts over about $40 million (an unsigned 32-bit count of cents tops out at about $42.9 million), etc.
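A quick worked check of both limits, as a sketch that assumes money is stored as a whole number of cents (a common convention, not the only one). The `dollars` and `add_u32` helpers are made up for illustration; `add_u32` mimics how a 32-bit unsigned register wraps on overflow.

```python
GIB = 2**30


def dollars(cents: int) -> str:
    """Format an integer number of cents as a dollar string, e.g. 123456 -> '$1,234.56'."""
    return f"${cents // 100:,}.{cents % 100:02d}"


def add_u32(a: int, b: int) -> int:
    """Add as a 32-bit unsigned register would: the result wraps modulo 2**32."""
    return (a + b) & 0xFFFFFFFF


# A flat 32-bit pointer can name 2**32 distinct bytes.
print(2**32 // GIB, "GiB addressable with 32-bit pointers")

# Largest amounts representable as whole cents.
print("unsigned 32-bit:", dollars(2**32 - 1))   # $42,949,672.95
print("signed 32-bit:  ", dollars(2**31 - 1))   # $21,474,836.47
print("signed 64-bit:  ", dollars(2**63 - 1))   # $92,233,720,368,547,758.07

# $40,000,000.00 + $5,000,000.00, counted in cents, silently wraps in 32 bits.
print(dollars(add_u32(4_000_000_000, 500_000_000)))  # $2,050,327.04
```

Note that a signed 32-bit cent count runs out at about half that, roughly $21.5 million.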
The Intel answer allows for a chip to have more than 4G of physical memory in much the same way the old LIM EMS boards allowed an 8086 to have more than 1M of memory: it is a form of bank switching.
True, you could have a PIII with 10G of memory on it (in theory, anyway), but this would not help you for the common applications for which you need these quantities of memory - databases, video editing and so on.
In those tasks, you have ONE program that needs lots of memory. You ideally want to be able to take a multi-gigabyte file, and mmap() it so that it appears to your program to be just a stretch of memory. Then you can access the file with a simple pointer, and moving within the file is nothing more than pointer manipulation. You don't have to worry about paging the file in and out - that is the OS's virtual memory manager's problem.
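Here's a minimal sketch of the mmap() approach, using Python's standard `mmap` module as a stand-in for the C call. The temporary file and the `byte_at` helper are made up for illustration. The point is that once the file is mapped, a file offset is just an index. On a 32-bit build, mapping a file much larger than the free address space will typically fail, which is exactly the limit being described.

```python
import mmap
import os
import tempfile


def byte_at(path: str, offset: int) -> int:
    """Map the whole file read-only and index it like an array of bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in only as it is touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset]


# Demo on a small temporary file: 4 KiB of the byte pattern 0, 1, ..., 255 repeated.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)
    sample_path = tmp.name

try:
    value = byte_at(sample_path, 1000)
    print(value)  # 1000 % 256 == 232
finally:
    os.remove(sample_path)
```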
PAE won't help you in those cases. At best, you can back some of the buffer cache with the PAE memory, creating in effect a glorified RAM disk.
PAE is great if you have a machine running hundreds of processes, each of which takes 100M of space. But this usually is NOT the case.
Just as machines with more than 1M of memory started out the province of the high-end user and slowly moved down, 64 bit address space on the desktop will start out the province of the high-end folks first, then will move down as it becomes more common.
I would guess the likely sequence will be something like:
1) We *nix folks had it first - I was running 64 bits on my Alpha years ago. But we are not "the masses", and so will be ignored by the mainstream.
2) The Macs will be next - Apple will port MacOS X to the newer 64 bit Power chips. This will greatly simplify video editing, one of Apple's favorite areas to compete in. 64 bit Apple will make the Mac the chosen platform for video editing of large files (NOTE: a 40 minute capture from my FireWire camcorder is several gigabytes, so already the home consumer is getting close to needing this.)
3) Windows will finally get a 64 bit OS (also note: they could have done this YEARS ago on Alpha, but didn't; Windows NT on Alpha could only access a 32 bit address space.) Microsoft will hail this as a revolutionary breakthrough - "Windows AYCABTU is the first 64 bit OS for the home user!" *nix and Apple users will scratch their heads in puzzlement.
It keeps being tried every few years and keeps being rejected by corporations.
These guys seem to be having no problem with being rejected. I put together my school's lab for about the cost of two serious desktops, networking included. In fact, Jim McQuillan seems to be making a reasonable living out of selling such systems. It all depends on where you sit, and what you need, I guess.
Just because a 64 bit processor can handle 64 bit integers doesn't mean that it can *only* deal with 64 bit quantities or that its instructions are necessarily 64 bits long.
As an example, take PPC-64. Its instructions are still 32 bits long and are basically identical to PPC-32 except for those instructions dealing with 64 bit quantities, which PPC-32 doesn't have. All pointers (memory addresses) are 64 bit but you may use any size integer you wish, from 8 bit to 64 bit, depending on what you need.
A little bit of computer engineering here for you...
RISC and CISC are the two main forms of processors out there these days. In a RISC chip, each instruction is a single fixed-size word with both the opcode and the operands (usually register numbers) encoded in it. A CISC chip uses variable-length instructions: the opcode tends to be fetched first, and the operands come from the next few bytes fetched after it.
My CMPT 150 course (introduction to Computer Design) was done entirely with a Motorola HC11 Processor emulator, which is a CISC processor.
The advantage of RISC processing is that you can put in "pipelining", which basically means placing buffers between the different stages of the CPU. This means that a single chunk of opcode/operand takes x clock cycles to process (x being the number of stages in your pipeline), but it also allows the processor to work on several instructions at once: right behind the first instruction there's another one waiting, one clock cycle later. So once the first instruction reaches the last buffer, a RISC processor can finish a new instruction on every single clock cycle.
Confused yet? Let me put it this way...
Pretend that your CPU is a plumbing system, with water streaming through hot and cold pipes to deliver a preferred temperature for the water. The water temperatures are your CPU data (signals, bits, whatever...) and your pipes are your CPU circuitry.
Now, you want to send a big chunk of hot water down to the bottom of your pipe system using a bunch of intermediary valves (or/and/not/xor gates) and a specific pathway (Let's not ask why, let's just assume you want to do that). Now, say right after that you want to send a bunch of cold water down a similar path, but not necessarily the same path, however you will want to use some of the same pipes.
Now, with a CISC processor, what you would do is send down the hot water, occasionally storing it in some pipes while you send down the cold water. The sheer design of the system would keep the hot and cold water separate, and you would be able to output your hot water, and then your cold water, once they have gone through all their storing and moving around.
The annoying thing about this is you need a sophisticated CPU to do it. And you need a bunch of clock cycles to open and close the valves and whatnot and finally get your desired output.
Now, a RISC processor does something a bit smarter... It throws your hot water in (first clock cycle), just lets it trickle through the valves to the bottom, and then, on the second clock cycle, sends the cold water down. The downside is that your single clock cycle is really slow, which means you have a big lineup of people requesting hot and cold water who have to wait for it to come out (lag, for those taking notes in computer-world).
So, we introduce pipelining.
Pipelining is a bunch of basins (let's say 4) that appear at different levels of the pipe system.
So, you dump your hot water in the top basin. (First clock cycle)
Then, you unlock the basin and let it dump into the second basin. Once it's done that, once again, seal the basin and dump your cold water in. Now, (second clock cycle) open the plugs for both basins, and your hot water goes down the tubes (magically) before the cold water shows up and you can re-plug your basin. Now you have room for more water in the top basin.
Every move into a new basin is a clock cycle, so it takes 4 clock cycles for the water to finally reach the bottom so you can do whatever the hell it is you would want to do with hot or cold water. However, these are relatively quick clock cycles compared to the clock cycle you had in your non-pipelined RISC architecture. And, ultimately, once the first output reaches the bottom, you only have to wait a single clock cycle for the input right after it, rather than the however-many clock cycles you would have waited in your CISC architecture.
Did that make sense to anybody? I hope it did.
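For anyone who'd rather count than picture basins, here's a sketch of the cycle arithmetic. It assumes an ideal pipeline (no stalls or hazards, which real CPUs do have) and that one slow non-pipelined cycle takes as long as all 4 short basin cycles. The function names are made up for illustration.

```python
def cycles_unpipelined(n_instructions: int, stages: int) -> int:
    """Each instruction passes through every stage before the next one may start."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int) -> int:
    """Ideal pipeline: `stages` cycles to fill, then one instruction finishes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


def basin_schedule(n_instructions: int, stages: int) -> list:
    """For each cycle, which instruction (by index) sits in each basin, or None if empty."""
    table = []
    for cycle in range(cycles_pipelined(n_instructions, stages)):
        row = []
        for stage in range(stages):
            i = cycle - stage
            row.append(i if 0 <= i < n_instructions else None)
        table.append(row)
    return table


names = ["hot", "cold"]
for cycle, row in enumerate(basin_schedule(2, 4), start=1):
    print(f"cycle {cycle}:", [names[i] if i is not None else "-" for i in row])

print(cycles_unpipelined(100, 4), "short cycles without pipelining")
print(cycles_pipelined(100, 4), "short cycles with a 4-basin pipeline")
```

The hot water still needs 4 cycles to reach the bottom, but 100 batches take 103 short cycles instead of 400.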
Intel spent way too many resources on Itanium and it doesn't want to admit that it was a mistake.
Now Intel and HP are downplaying Alpha's superior speed.
""If HP still believed the Alpha chip was worth the candle, rather than being cosy with its friends at the Intel Corporation, and marketed it properly, it might render all other server platforms into carbonised bread, otherwise known as toast".
But that will never happen. My sources claim that HP realises the EV7 is a fantastic chip and wants to stop potential buyers of the HP Itanium servers from buying EV7 instead.
And, we understand, the HP suits have now laid down a diktat saying that not one Alpha benchmark will be released until the Itanium platform(s) is/are faster"
Intel did not delay 32-bit. It was introduced in 1985, with the 80386, but it was not used seriously until Windows 95 (which was released in 1995, if anyone is in doubt). That's ten years from when the CPU was ready until the software was ready to handle it.
It depends. They aren't *only* talking about address lines, sure. But I think it is very subjective to say that the register size is much more important than the addressing.
Scientific applications have been using 64-bit computing for quite a while. What they usually use is floating point for calculations, and double precision floating point (64-bit) has been around for a long time. Loading and storing the 64-bit (sometimes 80-bit) FPU stack registers with single instructions, even when that takes multiple bus transactions, and manipulating them with single instructions has existed for a long time too. Scientific applications frequently have very large datasets as well; several GB is not uncommon. For performance reasons, you frequently want to load all this data into memory rather than processing it in chunks that fit into memory (which is an option, but a bad one for some data access and reuse patterns). The data types of scientific applications (IEEE double precision floating point, for example) can typically be handled by today's 32-bit CPUs with no problems, and those FP registers can be loaded 64 bits wide 'in one go' from L1 or L2. They can even be loaded from and stored to memory quickly (memory typically reads a cache line at a time and can be tuned more precisely for writing). The real problem is the amount of data being handled.
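To make "the amount of data" concrete, here's a rough sketch checking whether a dense matrix of doubles fits in one 32-bit process. The matrix sizes are hypothetical examples, and the 3 GiB of usable user address space is an assumption (a common split on 32-bit OSes, not a universal rule).

```python
GIB = 2**30
DOUBLE_BYTES = 8  # one IEEE 754 double-precision value

# Assumption: roughly 3 GiB of user address space per process on a 32-bit OS
# (a common default split, not a universal rule).
USABLE_32BIT = 3 * GIB


def matrix_bytes(n: int) -> int:
    """Memory for a dense n x n matrix of doubles, ignoring any overhead."""
    return n * n * DOUBLE_BYTES


for n in (10_000, 20_000, 25_000):
    size = matrix_bytes(n)
    fits = size <= USABLE_32BIT
    print(f"{n} x {n} doubles: {size / GIB:5.2f} GiB, fits in a 32-bit process: {fits}")
```

A 20,000 x 20,000 matrix just squeaks in; 25,000 x 25,000 needs more than 4 GiB, so no 32-bit pointer can span it.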
Video - I admit, I'm not an expert in this area, but I would imagine that AltiVec/SSE/whatever are being used heavily here. These aren't *really* that much different from what the 32-bit CPUs can do already; they just do several operations at the same time (SIMD). What matters here are very large datastreams (multiple GB) that have to be manipulated. I'm not exactly sure what would need to be done other than having a 64-bit file system, though, and that can be (and is) done using 32-bit CPUs today. Maybe simply the ability to pull the entire image into one chunk of memory is what is desired, similar to the scientific computing issues where block read schemes are inefficient because of data access problems and data reuse. If the video files are over ~3GB, then you have a problem on 32-bit systems.
Databases - this is getting the most attention. Here, 64-bit integer manipulation becomes important (not SIMD types either) - Index/address calculations, large trees of data, etc. The other important thing is caching of data so you don't have to hit the disks. For this you want all the memory you can get.
Also... remember that just because a 64-bit CPU will typically be able to manipulate and use 64-bit addresses, that does not mean that all 64 address lines will be brought out of the package. For example, I would imagine that more like 40 address lines will be brought out for cost reasons, limiting the physical memory the CPU can actually use to, in this case, 1TB (2^40 bytes). However, the virtual address space isn't affected by that and will be 64-bit regardless. Of course, over time, more and more address lines may be brought out.
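A quick sketch of how physical address width maps to maximum memory. It assumes byte-granular addresses; real packages often leave off the lowest address pins (the bus moves whole words and uses byte enables), so read "address lines" here as "physical address bits".

```python
GIB = 2**30


def max_physical_bytes(address_bits: int) -> int:
    """Bytes reachable with byte-granular physical addresses of the given width."""
    return 2**address_bits


for bits in (32, 36, 38, 40, 44):
    print(f"{bits} address bits -> {max_physical_bytes(bits) // GIB:6d} GiB")
```

32 bits gives the familiar 4 GiB, 36 bits (PAE) gives 64 GiB, 38 bits gives 256 GiB, and 40 bits gives 1 TiB.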
If you need big processing, you still buy the big iron. Next time you're at the airport and the ticket agent is checking you in, sneak a peek at the logos on the terminals they're using. Oh sure they'd love to upgrade to a spiffy new-fangled GUI based dingus, just no one's figured out quite how to do that.
When I signed on with IBM back in 1994 they were trying to replace their big iron with PCs. "By end of year 1995," they promised us, "all the mainframes will be gone and all our applications will run on Lotus Notes." Well here it is nearly a decade later and they still haven't replaced that big iron, and they'll never get rid of their RETAIN technical support database. No one can figure out how to deliver RETAIN's performance on any other platform.
Sure, today a mainframe might consist of over a thousand high-end desktop processors working in unison, but look how many processors they had to slap in there to deliver the performance the customers expect from that big iron. And those are all wired together and working closely, unlike that (much smaller) network cluster your latest clueless technical manager just suggested.
So what Intel is really saying here is their marketing department just realized that they will never deliver that kind of performance in a desktop or even in a 4 to 8 way "server" machine. The customers they're targeting will continue to purchase the big iron when they need that kind of processing power, and the "toy" shops are happy with the 32 bit processing power. By the way, Google essentially just built themselves a mainframe. I wonder how the cost of their solution would stack up against the biggest iron IBM currently provides...
Being a 64-bit CPU generally refers to the size of the general purpose integer registers, how many bits wide the ALUs are, how much data can be shipped to/from the register in one data movement, and how many bits of address are used in a virtual address.
The Pentium line is close, but fails the 'test' in the general purpose register department as well as the ALU width department. Also, remember that although an MMX register may be very wide (compared to the general purpose registers), it is treated as if it were some number of smaller registers tacked onto each other. For example, a 64-bit wide MMX register is actually treated (depending on the operation desired) as eight 8-bit registers, four 16-bit registers, or two 32-bit registers. So if A, B, C, and D are all 32-bit values, two 64-bit MMX registers may hold:
MMXreg1: A:B
MMXreg2: C:D
and if you perform a 32-bit MMX addition you get:
MMXreg3: A+C:B+D
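Here's a small simulation of that lane-wise add, as a sketch in plain Python integers rather than real MMX code. The helpers are made up for illustration, and the packed add models what MMX's PADDD does: each 32-bit lane wraps on its own, and no carry crosses from B+D into A+C. Compare that with a single ordinary 64-bit add of the same bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack2x32(hi: int, lo: int) -> int:
    """Place two 32-bit values side by side in one 64-bit word, written hi:lo."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def packed_add_2x32(x: int, y: int) -> int:
    """Lane-wise 32-bit add (what MMX PADDD does): each lane wraps on its own."""
    lo = ((x & MASK32) + (y & MASK32)) & MASK32
    hi = (((x >> 32) & MASK32) + ((y >> 32) & MASK32)) & MASK32
    return (hi << 32) | lo


A, B, C, D = 1, 0xFFFFFFFF, 2, 1
reg1 = pack2x32(A, B)   # A:B
reg2 = pack2x32(C, D)   # C:D

packed = packed_add_2x32(reg1, reg2)   # (A+C):(B+D), the carry out of B+D is dropped
scalar = (reg1 + reg2) & MASK64        # one 64-bit add: the carry ripples into the high half
print(hex(packed))  # 0x300000000
print(hex(scalar))  # 0x400000000
```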
Intel didn't want to make the jump to 32 bit, so they introduced "segment registers".
Um.... no. Segment registers have been in Intel's products from the beginning (at least since the 8088). It wasn't a band-aid to stall adoption of 32-bit processors as you imply with the above comment.
The current 32-bit processors also have segment registers, and you can still use them with a "flat" address space. Some OSes (like Linux) just load all the segment registers with segments covering the same flat address range and never change them. But you could have separate segments for the stack, data, code, etc.
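For reference, here's how the original 8086/8088 combined a segment register with a 16-bit offset. This is a sketch of real mode only; in protected mode the segment register selects a descriptor whose base and limit the OS sets up, and the flat model simply uses base 0 for everything. The wraparound at 1 MiB is 8086 behavior; later chips with more address lines only wrap if the A20 line is masked.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086/8088 real mode: physical = segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF  # the 8086 has 20 address lines (1 MiB)


print(hex(real_mode_address(0xB800, 0x0000)))  # 0xb8000
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0xFFFF, 0x0010)))  # 0x0, wraps around past 1 MiB
```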
Er, bullshit.
Hara = stomach
Kiri = gerund of kiru (=to cut)
Literally, 'stomach-cutting'.
It's the vernacular for seppuku (which, by the way, is written using the same characters - setsu is kiru, fuku is hara).
But not in a single contiguous chunk. You get to page in 4GB chunks (and this only works on Xeons).
The ordering is: byte, kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte. After yottabyte comes 'ohmygodijustcameabyte'.
Make that 4 billion
That's exactly what yerricde said, 4 billion cents.
It surprises me that no one (at least at the top level) has mentioned this, but for the short term, what excites me the most about AMD's 64-bit implementation is the addition of new registers that comes with AMD finally designing the ISA themselves.
Here are some general specs on x86-64:
64-bit addressing
8 Additional GPRs (for a total of 16)
GPR width increased to 64-bits
8 128-bit SSE registers (for a total of 16)
64-bit instruction pointer and relative addressing
Flat address space (code, data, stack)
--Ace's hardware (http://www.aceshardware.com/read_news.jsp?id=100
The fact that x86 has only had 8 General Purpose Registers has been the bane of its existence for quite a while... I think that this will be the main source of speed improvement over existing 32-bit apps when compiled for the x86-64 architecture, not the fact that the system can handle more precise numbers.
As far as selling these things goes, having worked in video game retail, I can say the consumer is already very conscious of the idea of an n-bit processor from all the old console hype, where the bit width of the CPU was marketed as the primary "performance number" the way MHz is on desktop PCs.
From the Windows XP 64-Bit Edition feature list (these points should in essence apply to any other OS too):
Large Memory Support
Windows XP 64-Bit Edition supports up to 16 GB of RAM and 16 TB of virtual memory, enabling applications to run faster when working with large data sets. Applications can preload substantially more data into virtual memory, allowing rapid access by the Intel Itanium processor.
Optimized for the Intel Itanium processor family
Windows XP 64-Bit Edition has been optimized specifically for the Intel Itanium processor and benefits from its key features, such as the Explicitly Parallel Instruction Computing (EPIC) design and increased floating-point performance.
Multiprocessing
Windows XP 64-Bit Edition is designed to support multiprocessing capabilities for maximum performance and scalability, supporting up to two symmetric Intel Itanium processors.
Interoperability
Windows XP 64-Bit Edition provides a rich platform to integrate both 64-bit technical applications and 32-bit business applications using the Windows on Windows 64 (WOW64) x86 emulation layer.
Same programming model
Developers with 32-bit skills will be comfortable and quickly productive in the Windows on Itanium environment, finding it virtually identical to the development environment for 32-bit Windows.
- 18446744073709551616B
- 18014398509481984kB
- 17592186044416MB
- 17179869184GB
- 16777216TB
- 16384PB
- 16EB
In comparison, IPv6 has 128 bit addresses, so it can address 340282366920938463463374607431768211456 hosts. Boy, I can't wait till we have 4096 bit computing! Yes folks, you could address: 9498661542358172543497427893422576585907920607927…
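The list above is easy to regenerate, as a small sketch. It follows the post's convention of power-of-1024 units with the SI-style names (strictly speaking these would be KiB, MiB, and so on), truncating toward zero.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]


def in_each_unit(n_bytes: int) -> list:
    """Express n_bytes in each power-of-1024 unit, truncating toward zero."""
    return [f"{n_bytes // 1024**i}{unit}" for i, unit in enumerate(UNITS)]


for entry in in_each_unit(2**64):
    print("-", entry)

print(2**128)  # number of distinct 128-bit (IPv6) addresses
```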
Not anymore. With iMacs coming with decent video-editing tools, and consumer versions (only $300) of Final Cut, and other tools, Joe User is getting interested in this stuff.
Not to mention students in film school, etc. 64-bit procs sure could be useful to them in the near future.
I dunno though, I guess 4 GB is still enough for most Joe Users for now... But just wait for Windows XP 2004 3.1!
Err...video encoding? After all, aren't iMovie and Windows Movie Maker aimed at the consumer market?
Many companies in the entertainment as well as computer chip design industries use rooms full of cheap x86 machines to perform the bulk of their batch processing. _That's_ where they're hitting the 4GB-per-process problem. We're running Linux on hundreds of Pentium III/4s, and with kernel tweaking are getting around 3.2GB per process. But even that's not enough for many job types...
My long-term goal is to be running a fully-realistic, totally customizable and scalable universe with believability surpassing anything depicted in The Matrix - in other words my own play universe. I already calculated that to run a sufficiently realistic emulated 'Earth' with everything simulated, people, plants, trees, mountains, rocks in realistic detail would take at least 10^30 ops/sec. I would certainly need several zettabytes of memory to run it effectively.
So quite frankly, there is no such thing as too much RAM, Storage or Speed that I could need, assuming the software was developed to utilize it.
You don't seem to understand what VMware is. It's a virtualizer, not an emulator. That means that 99%+ of instructions run directly on the hardware. Whenever an instruction that requires special privileges runs (as in drivers, which is why VMware suggests you use its custom VMware-aware drivers for drivers that trap a lot, like the display driver), VMware traps out and emulates that event. Plex86 is the same idea. If actual emulation were occurring, you would get at most maybe 33% of the CPU's maximum speed (enough to read, compare, then load the appropriate instruction(s)). The Itanium 1 (I think the 2 might be different) does the same basic thing, emulating an x86, but in hardware. And even *then*, it's still horridly slow compared to a real x86. This is the major reason why AMD is probably going with an x86-compatible design: the OS/system only has to trap out into 64-bit mode when special instructions are reached while running 32-bit code.
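The trap-and-emulate idea can be sketched as a toy loop. This is not how VMware is actually built: the instruction names are simplified, `load_page_table` is a made-up mnemonic, and a real monitor on x86 also has to catch "sensitive" instructions that don't trap on their own (POPF, for example), which this toy ignores. It just shows the split between running ordinary instructions directly and handing privileged ones to the monitor.

```python
# Hypothetical, simplified split: a real VMM must also catch x86 "sensitive"
# instructions that do not trap on their own (e.g. POPF), which this toy ignores.
PRIVILEGED = {"cli", "sti", "in", "out", "load_page_table"}


def run_guest(program: list, emulate) -> tuple:
    """Count instructions run directly vs. trapped out to the monitor's emulate()."""
    direct = trapped = 0
    for op in program:
        if op in PRIVILEGED:
            emulate(op)   # trap: the monitor emulates this instruction's effect
            trapped += 1
        else:
            direct += 1   # stands in for native execution on the real CPU
    return direct, trapped


emulated = []
program = ["mov", "add", "out", "mov", "cli", "add"]
direct, trapped = run_guest(program, emulated.append)
print(direct, trapped, emulated)   # 4 2 ['out', 'cli']
```

In real code the privileged instructions are rare, which is why the 99%+ figure is plausible.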
Yeah... and you could have every single gaming company write drivers for different video cards, and you get pissed when your $600 Turtle Beach card doesn't play sounds in Doom III. You also get to reset your computer every single time you want to play.
Do people multitask while playing games like Doom III? HELL YEAH! I can't remember how many times I've 'windowed' UT or TO:AoT to tweak my TeamSpeak settings. Or how often I take a break while working (I work at home) to let off some steam lobbing grenades or rushing SF with my trusty AK.
Besides, Doom III is as much a proof of concept as it is a game. By developing the engine in a console-like environment you're limiting its 'real world' parameters; you're not letting it get tested. Let's not kid ourselves, in 2-3 years' time there are going to be a *lot* of games toting the Doom III Engine badge.
Anyway, we've been in this situation before - praying your game can detect your video and sound card. This is why DirectX and OpenGL are popular: they provide a much needed interface and abstraction layer for your sound and graphics. This is one of the promises of a modern OS - set up an interface to different devices. Configure it once and you're set! The lack of this was one of the worst things about DOS, and I don't really want to go back there.
The 68020 was truly a computing milestone (one of the first fully 32-bit microprocessors, after all) and it had excellent features such as support for a fully functional MMU and an available FPU, not to mention it came in speeds up to 16 MHz originally and eventually up to 33 MHz. I used to have a Sun 3/260, which I later upgraded to a 4/260.
Either you are Linus Torvalds, read linux-kernel, or have nearly exactly the same opinion as linus.
Show that to people here in this thread, that should be enough namedropping for slashdot.
Btw., this lkml thread is really informative.
Nah, Intel got rich off of IBM's decision to adopt their chip in the early PC architecture, at a time when businesses were saying "You can't get fired for buying blue." Intel's monopoly in a niche market that broadened out gave them a huge economy of scale that other vendors could not match (e.g. Motorola 68K, MIPS, Sun's SPARC and DEC Alpha never managed to break in, although they had some apparent technical advantages over their Intel counterparts). Had those vendors enjoyed a larger share of the marketplace, they could have been cost-competitive with Intel. AMD is vulnerable due to timing issues, but Intel stays big because they got big, and they got big because they were in the right place at the right time.
From the programming side (and I mean C), the only really fundamental difference between a 32-bit and a 64-bit address space is the size of a pointer. Right now on most platforms, a pointer is 4 bytes, same as an int, so if you want to do dirty pointer math tricks, you don't even have to think about truncation or anything. Under a 64-bit system, pointers are 8 bytes, but the regular default int type might still be 4 bytes, so you have to be careful about how you treat those numbers.
If you are never screwing around with the way you store and dereference pointers, then (given that all the other libraries already exist under the target 64-bit platform) compiling for 64-bit is just as easy as anything else. In fact, you can develop under 32-bit, and then once you get access to 64-bit, you just recompile and hope that you haven't forgotten anything.
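Here's a sketch of the truncation trap, using Python's `ctypes` to query the platform's C type sizes and to mimic stuffing a pointer into a 32-bit int. The address is made up. In real C, converting an out-of-range value to a signed int is implementation-defined; the low-32-bits result shown here is what common two's-complement platforms do. The portable fix in C99 is to use `intptr_t`/`uintptr_t` from `<stdint.h>` when a pointer must live in an integer.

```python
import ctypes

POINTER_BYTES = ctypes.sizeof(ctypes.c_void_p)  # 4 on a 32-bit build, 8 on a 64-bit one
INT_BYTES = ctypes.sizeof(ctypes.c_int)          # 4 on the usual 32- and 64-bit platforms


def squeeze_into_int32(address: int) -> int:
    """Mimic stuffing a pointer into a 32-bit int: ctypes keeps only the low 32 bits."""
    return ctypes.c_int32(address).value


addr = 0x00007F0012345678  # a made-up address above 4 GiB
print(POINTER_BYTES, INT_BYTES)
print(hex(squeeze_into_int32(addr)))  # 0x12345678: the high bits are gone
```

Code that never round-trips pointers through int never hits this, which is why "just recompile" usually works.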
Then there are cross-compilers and emulation too...
Actually, I studied OS design (and yes, I do read linux-kernel), which is the reason I am not as hostile to x86 as some people. You can say what you will about the elegance of the architecture, but in certain cases, it's just plain friendlier than others to the OS programmer. The VM model is relatively simple, it doesn't do weird stuff with memory-mapped I/O, it jumps through tons of hoops internally to keep interrupt semantics simple, etc. Once you wrap your head around segmentation (which is set it and forget it nowadays) it's pretty smooth sailing.
Even without having 4GB of memory installed, it is still very useful to have a 64-bit address space. Imagine being able to mmap() your entire hard drive at once! The filesystem could simply treat the entire disk as a big data structure in virtual memory, copying when needed, instead of having to issue read and write calls to the disk. This could provide a big performance increase.
AGP and PCI cards, especially newer video cards, are also getting big. These need to have address space allocated to them. Even with a 64-bit PCI card, Linux still surprisingly allocates address space in 32-bit memory (the lower 4GB). If 4GB of RAM is installed, Linux must create a "hole" for PCI cards and such, as there isn't enough address space for all the RAM plus the PCI cards. This reminds me of the bad old days of ISA, where the expansion cards had to sit between 640K and 1M, creating a hole between the first 1M and all later memory. This hole still exists!
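Some rough arithmetic on that hole, as a sketch. The 512 MiB hole size is an assumption for illustration; the real size depends on the chipset and the cards installed. RAM displaced by the hole is simply lost unless the chipset can remap it above 4 GiB, and a 32-bit CPU then needs PAE to reach it.

```python
MIB = 2**20
GIB = 2**30


def ram_below_4g(installed: int, mmio_hole: int) -> int:
    """RAM still addressable below 4 GiB once `mmio_hole` bytes there go to devices."""
    return min(installed, 4 * GIB - mmio_hole)


installed = 4 * GIB
hole = 512 * MIB  # assumed hole size; the real size depends on the chipset and cards
visible = ram_below_4g(installed, hole)
print(visible // MIB, "MiB usable below 4 GiB")                  # 3584
print((installed - visible) // MIB, "MiB displaced by the hole")  # 512
```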
And finally, there's lots of good reasons to have a huge address space that provides room enough for everything on the system at once. No need to decode multiple memory maps and translate between them. It would be a boon to things involving virtual memory, multiple programs, data transfer between programs, and so on.
BTW, I use a machine at work with 4GB of memory installed. It's running Linux 2.4. Even with HIGHMEM enabled, it is still a mess, because we need that memory to be available to the kernel and PCI devices, and not just in user space. Linux is very good at doing page table tricks with PAE (Physical Address Extensions) for user programs, but this isn't true in kernel space. I'm looking forward to real 64-bit machines!