Alpha 21364 EV7 Specs Released
Jon Carroll writes "HP has revealed their Alpha roadmap today at RDF and the schedule goes as previously planned. Alpha 21364 (EV7) is based on 0.18 micron, to be shipped by the end of this year, and EV79, based on 0.13 micron SOI, will be up next. EV7 will be at 1.2GHz while EV79 will be at 1.6GHz. The Alpha 21364 EV7 chip will have 152M transistors, 1.75MB of integrated on-die L2 cache, 32GB/s of network bandwidth, and an integrated RDRAM memory controller with 8 channels for up to 12.8GB/s of memory bandwidth."
I used to use Alphas but left the platform 3 years ago because of the lack of progress in the development of the Alpha. Especially now that Compaq is dead too, the Alpha is a sitting duck. HP already has PA-RISC and a very good relationship with Intel and their Itanium chip. Too bad!
It is an EIGHT channel RDRAM controller though. Compare to the TWO channel RDRAM controller of the i850 for example. That gives the Alpha 4x the memory bandwidth of the i850. RAMBUS and DDR both have their advantages and disadvantages. I doubt that RDRAM would have been used without a good reason - most likely the need for high memory bandwidth. Graham
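The 4x figure is easy to check with a back-of-the-envelope calculation. This is only a rough sketch: the per-channel numbers (16-bit channel, 800 million transfers per second, i.e. PC800 RDRAM) are my assumption and are not stated in the summary, though they do reproduce its 12.8GB/s figure. The function name is made up for illustration.

```python
# Assumed per-channel figures for PC800 RDRAM (not given in the thread).
CHANNEL_WIDTH_BITS = 16          # width of one RDRAM channel
TRANSFERS_PER_SEC = 800_000_000  # 800 MT/s per channel


def peak_bandwidth_gb_s(channels: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given number of channels."""
    bytes_per_transfer = CHANNEL_WIDTH_BITS // 8
    return channels * bytes_per_transfer * TRANSFERS_PER_SEC / 1e9


ev7 = peak_bandwidth_gb_s(8)   # eight-channel controller on the EV7
i850 = peak_bandwidth_gb_s(2)  # two-channel controller on the i850

print(f"EV7:  {ev7:.1f} GB/s")
print(f"i850: {i850:.1f} GB/s")
print(f"ratio: {ev7 / i850:.1f}x")
```

These are peak numbers only; they say nothing about latency or sustained throughput.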
Your sketch was more or less right on. When Compaq sold ALPHA to Intel, they said there would only be one more ALPHA chip. Damn them to hell anyway. ALPHA was the best.
Sure, RDRAM is 'slow' when used on the PC architecture; however, on an Alpha, which has a VERY WIDE memory bus, it can actually use all that memory bandwidth. The latency doesn't matter anymore. As for cost: if you are buying one of these, you probably had to get the job done 'yesterday' :-)
Peter
www.alphalinux.org
After HP's announcement that Alpha is a dead end, this is of no relevance... SADDDLY!!
http://www.hp.com/hpinfo/newsroom/press/07may02b.htm
They are dropping Alpha and PA-RISC for Itanium... baaadddd move!!
the latency on it sucks balls
It does in a PC, where they only put two 16-bit channels so you need two accesses to each bank to fetch the 64-bit bus-width (it's serialization).
In Alpha, there's no serialization. You've got an eight-channel (16 bit each, unless they use the newer 32-bit wide?) configuration. That means that they are 128 bits wide. In order to get the same performance from DDR, you'd need to have a bus that's 1024-bit wide or something like that, which is not practical...
I don't like RAMBUS at all, but the industry has to come up with something faster because it's clearly the fastest on platforms where it's used correctly (I don't include the current PC in that category).
Opus: the Swiss army knife of audio codec
They have been available for the compaq testdrive project for a couple of weeks
cpu Alpha
cpu model EV7
system variation Marvel/EV7
cycle frequency 800000000
BogoMIPS 2140.20
platform string Compaq AlphaServer ES80 7/800
cpus detected 2
cpus active 2
This has been restructured a bit to pass through the junk filter as well as condense it to the most important info.
Ordo Militum Unix.
You must be buying cheap servers. RDRAM is used in more expensive servers, in part due to the high bandwidth it provides (and also, in part due to engineering decisions made years ago.) 8 channels of RDRAM yields 12.8 GB/sec of memory bandwidth which is certainly more than you get with PCs these days, even PC servers. Then again, the 21364 isn't shipping yet. But I don't think Intel plans on shipping that sort of CPU bandwidth by the end of the year.
And back to your point about the economics of RDRAM, there is money out there that will pay a premium for performance scalability (at least when combined with reliability). About 11 percent of all servers command as much as 60 percent of all server revenue.
I just wonder how it'll stack up performance-wise on this chart versus Power4 and Itanium2.
But the main reason I suspect one would buy one of these is because you want binary compatibility with all your old high-performance Alpha code that you invested so many man-years in.
--LP
This is not loop unrolling, it's a technique called tiling. The idea is that accesses to your rectangular array are performed in small square sections. This optimizes cache usage during the transform, where sequential access in 1 of the 2 dimensions would otherwise be cache-unfriendly.
No, this isn't loop unrolling at all. This library (and not the compiler, note) is using this scheme to maintain cache locality. A general rule of optimization is to aggressively utilize the memory hierarchy, be it at the L1/L2 cache level, VM, etc. This means maintaining good data locality in the algorithm's access patterns at the relevant scales (i.e. cache, VM pages, etc). Failure to manage this (for this example) means a performance hit due to greatly increased cache misses, often in the form of unnecessary loading, dirtying, flushing, reloading and redirtying of cache lines continuously during the course of processing. Ideally, one wants to load the cache line once, do all work in the cache, then flush/write back and move on to other data.
This principle can be seen in how the GIMP stores image data in tiles for rapid processing, in matrix math libraries, in the design of FFTW (The Fastest Fourier Transform in the West, www.fftw.org), and many other systems.
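For anyone who wants to see what tiling looks like in code, here is a minimal sketch using a matrix transpose as a stand-in. It is not taken from the library being discussed; the function name and the tile size of 32 are made up for illustration. The outer two loops walk the array in small square blocks, and the inner two loops do all the work inside one block before moving on.

```python
def transpose_tiled(src, rows, cols, tile=32):
    """Transpose a flat row-major rows x cols array, one square tile at a time.

    Returns a new flat row-major cols x rows list.
    The tile size is a hypothetical tuning knob, not a recommended value.
    """
    dst = [0] * (rows * cols)
    for i0 in range(0, rows, tile):          # top edge of the current tile
        for j0 in range(0, cols, tile):      # left edge of the current tile
            # Finish every element of this tile before touching the next one,
            # so reads and writes both stay within a small memory footprint.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    dst[j * rows + i] = src[i * cols + j]
    return dst


# Small usage example: a 3 x 5 array with a deliberately tiny tile size.
matrix = list(range(15))
transposed = transpose_tiled(matrix, rows=3, cols=5, tile=2)
print(transposed)
```

In pure Python the interpreter overhead hides any cache effect, so this only shows the access pattern. The same loop structure in C or Fortran on a large array is where the benefit would show up.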
In reality NT does have some VMS-like features in the kernel, but it is *not* VMS. If it were, it would be a little slower and a BSOD would be strictly mythological.