Intel Details Upcoming Gulftown Six-Core Processor
MojoKid writes "With the International Solid-State Circuits Conference less than a week away, Intel has released additional details on its upcoming hexa-core desktop CPU, next-gen mobile, and dual-core Westmere processors. Much of the dual-core data was revealed last month when Intel unveiled their Clarkdale architecture. However, when Intel set its internal goals for what it's calling Westmere 6C, the company aimed to boost both core and cache count by 50 percent without increasing the processor's thermal envelope. Westmere 6C (codename Gulftown) is a native six-core chip. Intel has crammed 1.17 billion transistors into a die that's approximately 240mm sq. The new chip carries 12MB of L3 (up from Nehalem's 8MB) and a TDP of 130W at 3.33GHz. In addition, Intel has built in AES encryption instruction decode support as well as a number of improvements to Gulftown's power consumption, especially in idle sleep states."
Can most programmes really be written to take advantage of so many cores?
Yup.
Got a Core i7-920 running at 3.2GHz at home - OS is 64-bit Kubuntu 9.10.
Yesterday I had five two-hour videos I wanted to render to DVD5 format - four were .avi and one was .mp4.
Launched five instances of DeVeDe to render the video and create the DVD file structure and did all five at the same time - then left for work. Took an hour and twelve minutes and the machine didn't melt, explode or let any of the magic smoke out of the box.
Even if an application isn't multithreaded, the OS is - so even when you're running a single task, a multicore processor will give you a performance boost.
A Core i7 has four cores that'll run two threads each - presents as eight processor cores to the OS. I have no problem using them all ;-)
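For anyone who wants to script the "launch several independent jobs at once" approach instead of clicking through five windows, here is a minimal sketch. The job command is a hypothetical stand-in (a short CPU-bound Python loop), not DeVeDe, and the function name is my own; the OS scheduler decides which core each process lands on.

```python
import os
import subprocess
import sys

def run_jobs_in_parallel(commands):
    """Start every command as its own OS process, then wait for all of them."""
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all start immediately
    return [p.wait() for p in procs]                     # collect exit codes

# Hypothetical stand-in for one rendering job: a short CPU-bound loop.
JOB = [sys.executable, "-c", "sum(i * i for i in range(200_000))"]

exit_codes = run_jobs_in_parallel([JOB] * 5)

# Number of logical processors the OS reports (cores x threads per core).
logical_cpus = os.cpu_count()
```

Each job is a separate process, so none of them needs to be multithreaded for all five to run at the same time.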
we see things not as they are, but as we are.
-- anais nin
Porting libdispatch requires a generic event delivery framework, where the userspace process can wait for a variety of different types of event (signals, I/O, timers). On Darwin, Apple used the kqueue() mechanism that was ported from FreeBSD, so it's quite easy to port the code to FreeBSD (just #ifdef the bits that deal with Mach messages appearing on the queue). Kqueue is also ported to NetBSD and OpenBSD, so porting it to these systems will be easy too.
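To give a feel for the "one blocking call, several event sources" idea, here is a rough sketch using Python's standard selectors module, which sits on top of kqueue on the BSDs and Mac OS X and on epoll on Linux. This is not libdispatch code, and it only covers I/O readiness plus a timeout; signals and the other event types kqueue can deliver are left out. The helper name wait_once and the labels are made up for illustration.

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # picks kqueue, epoll, etc. per platform
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, data="socket readable")

def wait_once(timeout):
    """Block until a registered source fires or the timeout elapses."""
    ready = sel.select(timeout=timeout)
    return [key.data for key, _mask in ready] or ["timer expired"]

writer.send(b"ping")
first = wait_once(1.0)     # the socket has data, so this returns at once
reader.recv(4)             # drain the pending bytes
second = wait_once(0.05)   # nothing pending, so the timeout fires

sel.unregister(reader)
sel.close()
reader.close()
writer.close()
```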
Solaris and Windows both have completion ports, which provide the same functionality but with different interfaces. Porting to Solaris would require replacing the kqueue stuff with completion port stuff. Porting to Windows would ideally also require replacing the pthread stuff with win32 thread calls. Even Symbian has a nice event delivery framework that could be used, although I'm not sure what the pthread implementation is like in the Symbian POSIX layer.
Linux is the odd system out. All different types of kernel events are delivered to userspace via different mechanisms, so it's really hairy trying to block waiting until the next kernel event. This also makes it harder to write low-power Linux apps, because your app can't spend so long sleeping and so the kernel can't spend so much time with the CPU in standby mode.
If you don't need the event input stuff (which, to be honest, you do; it's really nice), you can use toydispatch, which is a reimplementation that I wrote of the core workqueue model using just portable pthread stuff.
It also adds some pthread extensions for determining the optimal number of threads per workqueue (or workqueues per thread, depending on the number of cores and the load), but these are not required. The FreeBSD 8.0 port doesn't have them; they were added with FreeBSD 8.1.
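For readers who haven't seen the workqueue model, here is a minimal sketch of the idea in Python: a serial queue with one worker thread that runs submitted jobs in FIFO order. It is not toydispatch or libdispatch code; the class and method names are my own, and it skips error handling (a job that raises would kill the worker).

```python
import queue
import threading

class WorkQueue:
    """Serial work queue: one worker thread runs submitted jobs in FIFO order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:          # sentinel: no more work
                break
            func, args = job
            func(*args)

    def dispatch_async(self, func, *args):
        """Queue a job and return immediately; the worker runs it later."""
        self._jobs.put((func, args))

    def shutdown(self):
        """Let already queued jobs finish, then stop the worker."""
        self._jobs.put(None)
        self._worker.join()

results = []
wq = WorkQueue()
for n in range(5):
    wq.dispatch_async(results.append, n * n)
wq.shutdown()
```

Because the queue is serial, the jobs never race with each other, which is much of the appeal: you get concurrency with the rest of the program without adding locks around the queue's own data.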
I am TheRaven on Soylent News
What?
AES acceleration will be useful for VPNs, serving SSL websites, VoIP, full disk encryption ... and so on.
Why put AES on-board?
They're not: they're putting extra instructions on-board which help implement AES more efficiently. They may also allow you to implement other algorithms more efficiently, though I haven't looked at them in enough detail to be sure.
I thought AES was relatively fast as encryption algorithms go.
That still doesn't make it fast at an absolute level. Particularly when you're doing full-disk encryption with user account encryption on top and IPSEC on all your network connections.
Most of the things that you do on a computer will run happily on a 1GHz CPU and still not bring usage over 50% more than occasionally
Speak for yourself.
It's official. Most of you are morons.
Why put AES on-board?
They're not: they're putting extra instructions on-board which help implement AES more efficiently. They may also allow you to implement other algorithms more efficiently, though I haven't looked at them in enough detail to be sure.
The instructions perform a single round of AES (which has 10-14 rounds depending on key size), either encrypting or decrypting. Certain other algorithms such as Lex, Camellia, Fugue and Grostl use AES S-boxes in their core, and can probably benefit from these instructions. However, they will not see nearly as large a speedup as AES itself.
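The 10-14 figure follows from the AES standard: the round count is the key length in 32-bit words plus 6. A small sketch (the function name is mine):

```python
def aes_rounds(key_bits):
    """Number of AES rounds for a given key size, per the AES standard."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits")
    return key_bits // 32 + 6   # key length in 32-bit words, plus 6

rounds = {bits: aes_rounds(bits) for bits in (128, 192, 256)}
```

With one instruction per round, a full block needs roughly that many round instructions, plus the key schedule.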
The AES instructions themselves will approximately double the speed of sequential AES computations. This is very unimpressive; VIA's AES instructions are much faster. They will also make AES implementations resistant to cache-timing attacks without losing speed, which is unimpressive because you can already do this on Penryn and Nehalem. The low speed results from the AES instructions having a latency of 6 cycles; if you can use a parallel mode (GCM, OCB, PMAC, or CBC-decrypt, for example) then the performance should be 10-12x the fastest current libraries. Hopefully, this will cause people to stop using CBC mode, but perhaps I'm too optimistic.
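To show why CBC encryption is stuck being sequential while CBC decryption is not, here is a sketch of the data dependencies. The block function is a toy invertible permutation that I made up for illustration; it is NOT AES and has no security value. Only the chaining structure matters here.

```python
BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key, block):
    # Toy stand-in for a block cipher: add key bytes mod 256, rotate by one byte.
    mixed = bytes((b + k) % 256 for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]

def toy_decrypt_block(key, block):
    # Exact inverse of toy_encrypt_block: undo the rotation, subtract the key.
    unrotated = block[-1:] + block[:-1]
    return bytes((b - k) % 256 for b, k in zip(unrotated, key))

def cbc_encrypt(key, iv, blocks):
    out, prev = [], iv
    for p in blocks:
        # Each step needs the previous ciphertext block, so it cannot start early.
        prev = toy_encrypt_block(key, xor(p, prev))
        out.append(prev)
    return out

def cbc_decrypt(key, iv, blocks):
    # Every input here is known up front, so the block decryptions are
    # independent of each other and could be pipelined or run in any order.
    prevs = [iv] + blocks[:-1]
    return [xor(toy_decrypt_block(key, c), prev) for c, prev in zip(blocks, prevs)]

key = bytes(range(BLOCK))
iv = bytes(BLOCK)
plaintext = [bytes([i]) * BLOCK for i in range(4)]
ciphertext = cbc_encrypt(key, iv, plaintext)
recovered = cbc_decrypt(key, iv, ciphertext)
```

In the encrypt loop the CPU has to wait out the full instruction latency for every block; in the decrypt case several blocks can be in flight at once, which is where the gap between the sequential and parallel figures comes from.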
Intel also added an instruction called PCLMULQDQ which does polynomial multiplication over F_2. If it's fast (I can't find timing numbers, but hopefully it's something like latency 2 and throughput 1) then it will be very useful for cryptography in general, speeding up certain operations by an order of magnitude or more. This is more exciting to me than the AES stuff, because it might enable faster, simpler elliptic-curve crypto and similarly simpler message authentication codes. Unfortunately, these operations are still slow on other processors, so cryptographers will be hesitant to use them until similar instructions become standard. If the guy you're communicating with has to do 10x the work so that you can do half the work... well, I guess it's still a win if you're the server.
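For anyone unfamiliar with polynomial multiplication over F_2, it is ordinary shift-and-add multiplication with XOR in place of addition, so no carries propagate (hence "carry-less multiply"). Here is a bit-at-a-time sketch on Python integers; the function name is mine, and this loop is the slow software path that the instruction replaces for 64-bit operands.

```python
def clmul(a, b):
    """Multiply two polynomials over F_2, each encoded as the bits of an int."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # XOR is addition in F_2, so nothing carries
        a <<= 1           # multiply a by x
        b >>= 1
    return result

square = clmul(0b11, 0b11)     # (x + 1)^2       = x^2 + 1
product = clmul(0b101, 0b11)   # (x^2 + 1)(x + 1) = x^3 + x^2 + x + 1
```

Note that (x + 1)^2 comes out as x^2 + 1, not x^2 + 2x + 1, because the 2x term vanishes mod 2.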
I thought AES was relatively fast as encryption algorithms go.
That still doesn't make it fast at an absolute level. Particularly when you're doing full-disk encryption with user account encryption on top and IPSEC on all your network connections.
AES is fast for a block cipher, but modern stream ciphers such as Salsa20/12, Rabbit, HC and SOSEMANUK are about 3-4x faster. (In other words, they are still faster than AES in a sequential mode on Westmere.) AES is still competitive, though, if you can use OCB mode to encrypt and integrity-protect the data at the same time.
The fastest previous Intel processor with cutting-edge libraries in the most favorable mode could probably encrypt or decrypt 500MB/s/core at 3-3.5GHz. This is fast enough for most purposes, but in real life with ordinary libraries you'd probably get a third of that. So this will significantly improve disk and network encryption if they use a favorable cipher mode.
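Turning those figures into cycles per byte, as a quick check of the arithmetic (assuming MB means 10^6 bytes; the numbers are just the back-of-the-envelope ones above):

```python
MB = 10**6  # assumption: decimal megabytes

def cycles_per_byte(clock_hz, bytes_per_second):
    return clock_hz / bytes_per_second

best_case = 500 * MB                          # bytes per second per core
low = cycles_per_byte(3.0e9, best_case)       # at 3.0 GHz
high = cycles_per_byte(3.5e9, best_case)      # at 3.5 GHz
ordinary = best_case / 3                      # "a third of that"
```

So the best case works out to about 6-7 cycles per byte, and an ordinary library at a third of that speed lands around 167 MB/s per core.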
Cred: I am a cryptographer, and I wrote what is currently the fastest sequential AES library for Penryn and Nehalem processors. But the calculations above are back-of-the-envelope, so don't depend on them.
I hereby place the above post in the public domain.