Princeton Researchers Announce Open Source 25-Core Processor (pcworld.com)
An anonymous reader writes: Researchers at Princeton announced at Hot Chips this week their 25-core Piton Processor. The processor was designed specifically to increase data center efficiency with novel architecture features enabling over 8,000 of these processors to be connected together to build a system with over 200,000 cores. Fabricated on IBM's 32nm process and with over 460 million transistors, Piton is one of the largest and most complex academic processors ever built. The Princeton team has opened their design up and released all of the chip source code, tests, and infrastructure as open source in the OpenPiton project, enabling others to build scalable, manycore processors with potentially thousands of cores.
Relax. Between its architectural basis and its relatively low performance, it's insignificant: a few hundred million transistors for a 25-core chip, in an era when a stock chip has billions of transistors.
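For scale, the announcement's own figures work out as follows. The per-core number is a simple average that lumps caches, interconnect, and I/O into each core's share; that split is our simplification, not a figure from the Piton team:

```latex
\begin{aligned}
25~\text{cores/chip} \times 8{,}000~\text{chips} &= 200{,}000~\text{cores} \\
\frac{460 \times 10^{6}~\text{transistors}}{25~\text{cores}} &\approx 18.4 \times 10^{6}~\text{transistors per core}
\end{aligned}
```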
The type of cores:
Some of OpenPiton® features are listed below:
OpenSPARC T1 Cores (SPARC V9)
Written in Verilog HDL
Scalable up to 1/2 Billion Cores
Large Test Suite (>8000 tests)
Single Tile FPGA (Xilinx ML605) Prototype
The bit that may put some people off:
This work was partially supported by the NSF under Grants No. CCF-1217553, CCF-1453112, and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors.
So interesting and possibly FPGA-synthesizable test processor it may be. Trustworthy computer core it may *NOT* be. (You would have to compare it to the original T1 cores, and have had those independently audited to ensure no nefarious timing attacks, etc., were in place.)
Now, having said that, if this interconnect is even a fraction as good as they claim, it could make for an AWESOME libre SPARC implementation competitive with Intel/AMD for non-Wintel computing uses. Bonus points for someone taping out an AM3+ socket chip (or AM4, if all the signed firmware is on the SoC side and not on the motherboard/southbridge side) that can be initialized on a commercially available board with standard expansion hardware. AM3/3+ would offer both IGP and discrete graphics options if a chip could be spun out by the middle of 2017, and if AMD were convinced to continue manufacturing their AM3 chipset lines, we could have 'libreboot/os' systems for everything except certain hardware initialization blobs. IOMMUv1 support on the 9x0 chipsets (except the 960) could handle most of the untrustworthy hardware in a sandbox as well, although you would lose out on HSA/Xeon Phi support due to the lack of 64-bit BARs and memory ranges.
With a multiuser, multitasking OS you can have 25 different unrelated processes running on something with 25 cores. Or you could have 25 threads in a dataflow arrangement where each is a consumer of what the last just produced. Or you could go over the members of an array or matrix 25 members at a time with the same transformation. Some things are serial, but there are plenty of ways more cores can actually be used.
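Two of the patterns in that comment, the dataflow pipeline (each stage consumes what the previous one produced) and the "same transformation over many elements" style, can be sketched with Python's standard library. This is a minimal illustration, not anything from OpenPiton; the helper names `stage`, `run_pipeline`, and `data_parallel` are hypothetical. One caveat: in CPython, threads share a single interpreter lock, so CPU-bound work would need processes (for example `concurrent.futures.ProcessPoolExecutor`, which offers the same `map` interface) to actually occupy separate cores. Threads just keep the sketch short and portable.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = object()  # unique end-of-stream marker (cannot collide with real data)


def stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One pipeline stage: consume what the previous stage produced, apply fn, pass it on."""
    while (item := inbox.get()) is not SENTINEL:
        outbox.put(fn(item))
    outbox.put(SENTINEL)  # propagate end-of-stream to the next stage


def run_pipeline(items, fns):
    """Dataflow arrangement: one thread per stage, stages connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)
    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for t in threads:
        t.join()
    return results


def data_parallel(fn, items, workers: int = 25):
    """Apply the same transformation to every element using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))  # map preserves input order
```

For example, `run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2])` pushes each number through an "add one" stage and then a "double" stage, returning `[2, 4, 6, 8, 10]`. The first pattern in the comment, 25 unrelated processes, needs no code at all: the OS scheduler spreads them across cores.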
Nope. You'll generally hit the wall at around 16-20 cores using shared memory. You need distinct processors with dedicated memory to make multiprocessing scale beyond 20 or so processors. Those huge servers with 32 cores apiece hit their point of diminishing returns per processor after around 20 cores.
First, for the reason you aren't going to be doing multithreading/shared memory on any known computer architecture, read this.
Secondly, let's say you aren't multithreading, so you don't run into the problems in the link I posted above. Let's assume you run 25 separate tasks. You still run into the same problem, but at a lower level. The shared memory is the throttle, because the memory only has a single bus. Say you have 1000 cores. Each time an instruction has to be fetched[1] for one of those processors, it needs exclusive access to the address lines that go to the memory. The odds of a core getting access to memory are roughly 1/n (n = number of cores/processors).
On an 8-core machine, a processor will be placed into a wait queue roughly 7 out of 8 times that it needs access. Further, the expected fraction of its time spent in the queue is (1 - 1/8). This is, of course, for an 8-core system. Adding more cores pushes that fraction asymptotically toward 1, and the waiting time per access grows without bound.
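The 1/n argument can be checked with a small simulation. The model is deliberately crude and is our assumption, not a description of any real memory system: every core always has a request pending, and a single bus arbiter grants exactly one core per cycle, chosen uniformly at random. Under that model the fraction of time a core spends waiting is exactly 1 - 1/n, and the mean wait per access is about n - 1 cycles (a geometric distribution with success probability 1/n). Real hardware has caches, multiple memory channels and banks, and smarter arbitration, so treat this only as an illustration of the comment's reasoning. The function name `simulate_bus` is hypothetical.

```python
import random


def simulate_bus(n_cores: int, cycles: int, seed: int = 0) -> tuple[float, float]:
    """Return (fraction of core-cycles spent waiting, mean wait per access in cycles).

    Assumed model: every core always has a request pending, and one shared bus
    grants exactly one request per cycle, picked uniformly at random.
    """
    rng = random.Random(seed)
    waiting = [0] * n_cores  # cycles each core's current request has waited so far
    total_wait_cycles = 0    # core-cycles spent being denied the bus
    completed_waits = []     # wait length of every request that was granted
    for _ in range(cycles):
        winner = rng.randrange(n_cores)  # arbiter grants the bus to one core
        for core in range(n_cores):
            if core == winner:
                completed_waits.append(waiting[core])
                waiting[core] = 0  # served; the core immediately issues its next request
            else:
                waiting[core] += 1
                total_wait_cycles += 1
    fraction_waiting = total_wait_cycles / (n_cores * cycles)
    mean_wait = sum(completed_waits) / len(completed_waits)
    return fraction_waiting, mean_wait
```

`simulate_bus(8, 20_000)` returns a waiting fraction of exactly 0.875 (seven of eight cores are denied every cycle) and a mean wait close to 7 cycles; with 32 cores the mean wait is close to 31 cycles, which is the unbounded growth the comment describes.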
So, no. More cores sharing the same memory is not the answer. More cores with private memory is the answer, but we don't have any operating system that can actually take advantage of that.
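The "private memory" alternative is usually programmed by message passing: each node works only on data it owns and shares results through explicit messages rather than a common address space (the style MPI programs use). This is a hypothetical single-machine simulation; the names `node` and `distributed_sum` are ours. Python threads actually share one address space, so the isolation here holds only by convention; on real distributed-memory hardware, separate address spaces or separate machines would enforce it.

```python
import queue
import threading


def node(node_id: int, local_data: list[int], to_root: queue.Queue) -> None:
    """A simulated node: reads only its own partition and reports via one message."""
    partial = sum(local_data)        # computed entirely from node-private data
    to_root.put((node_id, partial))  # the only communication is an explicit message


def distributed_sum(data: list[int], n_nodes: int) -> int:
    """Owner-computes reduction: partition the data, sum locally, gather partials."""
    # Each node owns a disjoint slice of the data (its "private memory").
    partitions = [data[i::n_nodes] for i in range(n_nodes)]
    inbox: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=node, args=(i, part, inbox))
        for i, part in enumerate(partitions)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The root receives exactly one message per node and reduces them.
    return sum(inbox.get()[1] for _ in range(n_nodes))
```

The design point is that nothing but the small `(node_id, partial)` messages ever crosses between nodes, so no shared bus is contended for the bulk of the data.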
A project that I am eyeing for next year is putting together a system that can effectively spread the operating system out over multiple physical memories. While I do not think that this is feasible, it's actually not a bad hobby to tinker with :-)
[1] Even though they'd be fetched in blocks, they still need to be fetched; a single incorrect speculative path will invalidate the entire cache.
I'm a minority race. Save your vitriol for white people.
No, it's ok. You have to shit *and* piss his pants. It's two-factor authorization.
I've fallen off your lawn, and I can't get up.