Alpha 21364 EV7 Specs Released
Jon Carroll writes "HP revealed its Alpha roadmap today at RDF, and the schedule is proceeding as previously planned. The Alpha 21364 (EV7), built on a 0.18 micron process, is to ship by the end of this year, with the EV79 on 0.13 micron SOI up next. EV7 will run at 1.2GHz and EV79 at 1.6GHz. The Alpha 21364 EV7 chip will have 152M transistors, 1.75MB of integrated on-die L2 cache, 32GB/s of network bandwidth, and an integrated RDRAM memory controller with 8 channels providing up to 12.8GB/s of memory bandwidth."
After HP's announcement that Alpha is a dead end, this is of no relevance... SADDDLY!!
http://www.hp.com/hpinfo/newsroom/press/07may02b.htm
They are dropping Alpha and PA-RISC for Itanium... baaadddd move!!
the latency on it sucks balls
It does in a PC, where they only put two 16-bit channels, so you need two accesses to each bank to fill the 64-bit bus width (it's serialization).
In Alpha, there's no serialization. You've got an eight-channel configuration (16 bits each, unless they use the newer 32-bit-wide parts?). That means the combined memory interface is 128 bits wide. To get the same performance from DDR, you'd need a bus several hundred bits wide, which is not practical...
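For anyone who wants to check the numbers, here's a rough back-of-the-envelope sketch. The channel count and 16-bit width come from the posts above; the transfer rates (PC800 RDRAM at 800 MT/s, DDR266 at 266 MT/s) and the 64-bit DDR channel width are my assumptions, so treat the result as a ballpark only.

```python
import math

# From the story and the comment above.
RDRAM_CHANNELS = 8
RDRAM_CHANNEL_BITS = 16

# Assumptions (not stated in the thread): PC800 RDRAM and DDR266 parts.
RDRAM_MT_PER_S = 800      # mega-transfers per second per pin
DDR_MT_PER_S = 266
DDR_CHANNEL_BITS = 64     # one standard DDR channel


def peak_mb_per_s(width_bits, mt_per_s):
    """Peak bandwidth in MB/s: bytes per transfer times mega-transfers/s."""
    return width_bits * mt_per_s // 8


rdram_width = RDRAM_CHANNELS * RDRAM_CHANNEL_BITS
rdram_bw = peak_mb_per_s(rdram_width, RDRAM_MT_PER_S)

# How many whole 64-bit DDR channels are needed to match that peak figure?
ddr_channel_bw = peak_mb_per_s(DDR_CHANNEL_BITS, DDR_MT_PER_S)
ddr_channels = math.ceil(rdram_bw / ddr_channel_bw)
ddr_width = ddr_channels * DDR_CHANNEL_BITS

print(f"RDRAM: {rdram_width} bits wide, {rdram_bw} MB/s peak")
print(f"DDR266 to match: {ddr_channels} channels, {ddr_width} bits wide")
```

Under those assumptions the RDRAM side works out to the 12.8GB/s quoted in the story, and the matching DDR bus comes out a few hundred bits wide. This is peak bandwidth only and says nothing about latency.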
I don't like RAMBUS at all, but the industry has to come up with something faster, because RDRAM is clearly the fastest option on platforms where it's used correctly (I don't include the current PC in that category).
Opus: the Swiss army knife of audio codec
No, this isn't loop unrolling at all. This library (and not the compiler, note) is using this scheme to maintain cache locality. A general rule of optimization is to aggressively exploit the memory hierarchy, be it at the L1/L2 cache level, VM, etc. This means maintaining good data locality in the algorithm's access patterns at the relevant scales (i.e. cache lines, VM pages, etc). Failure to manage this (for this example) means a performance hit due to greatly increased cache misses, often in the form of unnecessarily loading, dirtying, flushing, reloading and redirtying cache lines continuously during the course of processing. Ideally, one wants to load the cache line once, do all the work in the cache, then flush/write back and move on to other data.
This principle can be seen in how the GIMP stores image data in tiles for rapid processing, in matrix math libraries, in the design of FFTW (The Fastest Fourier Transform in the West, www.fftw.org), and in many other systems.
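To make the idea concrete, here's a minimal sketch of tiling (blocking) applied to a matrix transpose. It is not the code of the library under discussion, nor of GIMP or FFTW; the function names and the tile size are made up for illustration. Python won't show the speedup itself, but the loop structure is the same one you'd write in C: finish all the work inside one small tile before moving on, so the lines you've loaded get reused before they're evicted.

```python
def transpose_naive(src, n):
    """Transpose an n x n row-major matrix stored in a flat list.

    Reads walk along rows, but writes stride through dst by n elements,
    so on a large matrix nearly every write touches a different cache line.
    """
    dst = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            dst[j * n + i] = src[i * n + j]
    return dst


def transpose_tiled(src, n, tile=4):
    """Same result, but processed one tile x tile block at a time.

    Within a block, both the rows being read and the rows being written
    stay within a small working set that can remain in cache.
    The tile size of 4 is arbitrary; real code would tune it to the cache.
    """
    dst = [0] * (n * n)
    for ii in range(0, n, tile):            # top-left corner of each block
        for jj in range(0, n, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    dst[j * n + i] = src[i * n + j]
    return dst


n = 6
matrix = list(range(n * n))
assert transpose_tiled(matrix, n) == transpose_naive(matrix, n)
```

Both versions do exactly the same amount of arithmetic; the only difference is the order in which memory is touched, which is the whole point.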