Russian E2k CPU at 135 SPECint95 / 350 SPECfp95 ???
"I get MPR. I've got about 7 minutes before I have to catch a bus, but, from the MPR issue itself:
The processor uses EPIC. The Elbrus team has been together for 40 years, originally designing supercomputers for the Soviet defense establishment. "They've developed computers based on superscalar, shared-memory multiprocessing and EPIC techniques long before papers on those subjects appeared in the West". MPR claims that the lack of a good semiconductor fab is what has been holding them back. MPR says that the claims would be unbelievable except for the credibility of the team.
The X86 and IA64 compatibility relies on binary compilation assisted by emulation hooks, similar to what Transmeta is apparently doing. Supposedly Dave Ditzel spent several years while at Sun working with the Elbrus team.
The processor exists only as an executable Verilog database. However, the E2K design is based on the Elbrus-3 processor that was fabricated in 1991. The Elbrus-3 was built in an "ancient process", used 15 million transistors in about 3000 LSI and MSI chips, and delivered twice the performance of a Cray Y-MP."
Some more he sent later:
" It is actually quite a long article... 6 pages plus the cover, I'm about two thirds through it. The architecture is in fact pretty stunning, and very similar to the Merced and the SPARC in several ways. It has a 64K, 4-way instruction cache: one i-cache only. It has two identical, synchronously-loaded 8K L1 data caches, and a 256K, 2-way, 4-bank L2 data cache. In addition, it has a 4K array pre-fetch buffer for use in loop overlapping. There are two regions, each with an L1 data cache, a 256-entry register file, and three ALUs. The regions are symmetric except that only one region has a divide function.
A great deal in this processor is left to the compiler, a fact demonstrated by the single 64K i-cache, which will only work if the compiler does its job. Much also depends on the compiler's ability to identify instructions that can be executed in parallel. With an optimal instruction load, the multi-ported caches can provide a potential operand bandwidth of 288 Gbytes/sec at a processor clock of 1.2GHz. Much effort is expended to avoid branching: extensive branch prediction support is provided, and in some cases the processor will simply execute both sides of a branch to avoid taking the branch at all. With so many parallel execution paths, the cost of doing so is much lower than the cost of branching.
When loops are identified, an effort is made to overlap the loop execution, taking advantage of the same mechanism as used for the sliding register windows. The 4K FIFO array prefetch buffer helps to feed data to the overlapped loop. In loop mode, for perfectly optimal code, the processor can reach rates as high as 23 operations per cycle.
Much of the processor is designed in standard static CMOS gates, but some of the critical paths through the processor use self-reset gates, which do not have a clock but rather are triggered by the completion of cycles in previous gates. According to MPR, these are estimated by Elbrus to run 10-15% faster than static CMOS gates.
Just a couple more facts about Elbrus: The Elbrus-1 computer was a "...superscalar, RISC, processor with out-of-order execution, speculative execution, and register renaming..." This machine was designed and built... between 1973 and 1979!! They dumped superscalar designs because they were too complex for the payoff. The Elbrus-3, built between 1985 and 1991, used "an EPIC-based VLIW CPU", implemented as a "16-processor shared memory system".
They started working on the E2K in 1994, and it is now at Verilog RTL stage, with compilers and binary-compilation software written. MPR expresses great doubt that a home will ever be found to build the processor, what with the Russian economy as bad as it is, and most capable semiconductor houses already in the midst of implementing their own designs or just not wanting to compete with Intel."
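For a sense of scale, the 288 Gbytes/sec peak operand bandwidth at 1.2GHz quoted above can be converted into bytes per clock cycle. The snippet below is only a back-of-the-envelope sketch: it assumes 1 Gbyte = 10^9 bytes, and the 8-byte (64-bit) operand size is an assumption for illustration, not a figure from the MPR summary. The function names are made up for this example.

```python
# Figures taken from the quoted summary.
PEAK_BANDWIDTH_BYTES_PER_SEC = 288 * 10**9  # 288 Gbytes/sec (assuming 1 Gbyte = 10**9 bytes)
CLOCK_HZ = 12 * 10**8                       # 1.2 GHz

# Assumption for illustration only: each operand is 64 bits wide.
ASSUMED_OPERAND_BYTES = 8


def bytes_per_cycle(bandwidth_bytes_per_sec, clock_hz):
    """Peak bytes the caches would have to deliver in one clock cycle."""
    return bandwidth_bytes_per_sec / clock_hz


def operands_per_cycle(bandwidth_bytes_per_sec, clock_hz,
                       operand_bytes=ASSUMED_OPERAND_BYTES):
    """Peak operands per cycle, given an assumed operand size."""
    return bytes_per_cycle(bandwidth_bytes_per_sec, clock_hz) / operand_bytes


if __name__ == "__main__":
    print(bytes_per_cycle(PEAK_BANDWIDTH_BYTES_PER_SEC, CLOCK_HZ))     # 240.0
    print(operands_per_cycle(PEAK_BANDWIDTH_BYTES_PER_SEC, CLOCK_HZ))  # 30.0
```

So the quoted peak works out to 240 bytes per cycle, or 30 operands per cycle if the operands are 64 bits wide.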