IBM's Eight-Core, 4-GHz Power7 Chip
pacopico writes "The first details on IBM's upcoming Power7 chip have emerged. The Register is reporting that IBM will ship an eight-core chip running at 4.0 GHz. The chip will support four threads per core and fit into some huge systems. For example, University of Illinois is going to house a 300,000-core machine that can hit 10 petaflops. It'll have 620 TB of memory and support 5 PB/s of memory bandwidth. Optical interconnects anyone?"
In other news, temperatures on the University of Illinois campus have mysteriously risen ten degrees. Scientists are still examining possible causes..
"For example, University of Illinois is going to house a 300,000-core machine that can hit 10 petaflops. It'll have 620 TB of memory and support 5 PB/s of memory bandwidth."
I came.
Yes, but I think you'll still have to disable the aero if you want to get DNF to work right.
Caveat Utilitor
When can I get an iPhone with it?
G
I'd be a lot more excited about these PPC lines if Ubuntu 8.04 would install and run properly on the PS3, whose PPC+6xDSP architecture would be a great entry level platform for coming up with parallel techniques for the bigger and more parallel PPC chips.
--
make install -not war
The applications that are going to be run on this type of machine are designed to be run on this kind of machine.
If your process looks like this:
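#include <stdbool.h>

static volatile bool something = true;  /* placeholder loop condition */
static void doSomething(void) { /* placeholder for CPU-bound work */ }

int main(void)
{
    /* A single thread of execution: only one core is ever busy. */
    while (something)
    {
        doSomething();
    }
    return 0;
}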
It will hit 100% on one core and that's it. It's not multithreaded: one CPU will churn on it forever and the others will sit around waiting for a task from the OS. With 2 cores or 200,000 cores, the results will be the same. These machines are made for tasks that are broken up into lots of smaller jobs and processed individually. It's not magic; more cores won't get a single-threaded process done faster.
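For contrast, here is a rough sketch of what "broken up into lots of smaller jobs" can look like. It is in Python rather than C, and the names split_range, sum_squares and parallel_sum_squares are made up for illustration; the sum of squares is only a stand-in for whatever doSomething() really does.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (lo, hi) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    chunks, lo = [], 0
    for i in range(parts):
        hi = lo + step + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_squares(bounds):
    """CPU-bound stand-in for doSomething(): sum i*i over one chunk."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def parallel_sum_squares(n, workers=4):
    """Run each chunk in its own process so the OS can place them on separate cores."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))
```

Call parallel_sum_squares(2_000_000) from under an `if __name__ == "__main__":` guard so the worker processes can start cleanly. The chunks are independent, which is the whole trick: nothing here helps a loop whose every step depends on the previous one.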
Seriously.
Yeah, right, it has optical connections. They will have to be disabled to play videos; otherwise you might copy them!
What are we going to do tonight Brain?
Chances are IBM will still have a problem supplying them, plus new game consoles will get a priority in shipping in 2010, when that XBox 720 or Playstation 4 comes out.
It is also possible that the eight-core chip will be really expensive, and that in order to keep up with it a PowerMac would cost $4000 or more just to eliminate bottlenecks and use optical technology, like supercomputers use, to be able to use the chip properly. That said, nothing stops Apple from bringing out PowerPC-based Macs in 2010, as Mac OS X already runs on PowerPC code and would only have to be modified to run on the Power7 instruction set, which is very doable. Apple could have Intel Macs as low-cost systems for home and small businesses, and Power7 Macs as high-end workstations and servers for middle to large businesses. I don't see why Apple couldn't bring back PowerMacs and sell them next to Intel Macs, unless IBM starts to have production problems again and can't supply Apple with the number of PowerPC chips that they need.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
So you can get 16 cores in a low-end box, but it still won't have enough I/O slots, so you will have to buy a shelf at $obscene_amount. Seriously, why does IBM put so few I/O slots in the lower-end P series boxes?
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
So still no G5 laptop???? AHHHHHH!!!!!!
It should be noted that previous POWER architectures had 2 threads per core. They also had SMT (Simultaneous Multi-Threading) support, which gave them an "effective" 4 threads per core. I wonder: are all the threads on the POWER7 "true" threads (i.e. 4 execution units, 1 per thread), or is it a 2-thread setup with SMT? On the other hand, if the POWER7 really does have 4 "true" threads, then with SMT you'd get an "effective" *8* threads per core.
jdb2
And the reason that it kind of oscillates between cores is that "Set Affinity" tells the process that it's allowed to use that core, not that it has to or even should. If you want something to use both cores, open up two processes, and set the first to core 1 and the second to core 2. Most of the time that isn't practical, but I recently transcoded my entire music library and set one process to do songs from A-M, and the other from N-Z. It really helped.
It's not that simple. While one single task generally is not coded to take advantage of the entire system (single threaded on a dual system, dual thread on a quad system, whatever), you are able to actually use your computer while said task is underway. Ever encoded a DVD on a single core machine? Not so fun - half the time, you can't even use your mouse. Slap the same task on a dual-core box, and suddenly you can continue to work (or play) while that goes on in the background. Alternately, you can encode two DVDs simultaneously and be done in the speed it would normally take to finish one. Parallelism in its most literal sense.
Of course, many video-related apps these days are multi-threaded, but you get the general idea.
How are sites slashdotted when nobody reads TFAs?
The funny thing is that it teeter-totters back and forth from one core to the other. I wish I knew what made it do that.
The OS runs the process a few milliseconds at a time, then kicks the process off the CPU for another process to run (if there is one, including OS tasks such as I/O routines). When the OS starts up the process again for a few more milliseconds, it may start it up on a different core. That is why both cores will show 50% average utilization.
Now if you set CPU affinity for that process to be on one core, then it will max that core out at 100% and the other core will be idle. This may result in better performance, because you get better cache utilization if the process stays on the same core.
On a related topic, this can also be the case if the app is multithreaded -- sometimes it is more efficient to run multiple threads on the same CPU instead of across CPUs, if each thread is accessing the same region of memory. Otherwise, if the threads are on different CPUs or cores, then the threads are constantly invalidating the cache on the other core, causing more (expensive) reads/writes to main memory.
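If you want to try the affinity experiment from code rather than from Task Manager, a minimal sketch follows. It assumes Linux, since Python only exposes os.sched_setaffinity there; pin_to_one_cpu is a made-up helper name, and it simply picks the lowest-numbered CPU the process is already allowed to use.

```python
import os


def pin_to_one_cpu():
    """Pin the current process to a single allowed CPU and return the new CPU set.

    Returns None where the Linux-only affinity calls are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)      # 0 means "this process"
    target = {min(allowed)}                # pick any one CPU we may already use
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

After the call, a busy loop in that process should show up as one core near 100% instead of the load hopping around. Whether that is actually faster depends on the cache behavior described above, so measure before and after.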
Look at the heatsink in a PS3 and you have your answer.
It'll have 620 TB of memory and support 5 PB/s
Is that kind of memory bandwidth possible? You could access the entire 620 TB in ~120 milliseconds. I guess nothing is ever too fast; it just seems unrealistically fast.
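The arithmetic behind that ~120 ms figure, worked out as a quick sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and treats the 5 PB/s as an aggregate across all 300,000 cores from the summary, which is an assumption on my part; the function names are made up.

```python
TB = 10**12  # decimal terabyte, in bytes (assumed)
PB = 10**15  # decimal petabyte, in bytes (assumed)


def full_sweep_seconds(memory_bytes, bandwidth_bytes_per_s):
    """Time to read every byte of memory once at the given aggregate bandwidth."""
    return memory_bytes / bandwidth_bytes_per_s


def per_core_bandwidth(bandwidth_bytes_per_s, cores):
    """Share of the aggregate bandwidth available to each core, in bytes/s."""
    return bandwidth_bytes_per_s / cores


sweep = full_sweep_seconds(620 * TB, 5 * PB)
share = per_core_bandwidth(5 * PB, 300_000)
print(f"full sweep: {sweep * 1000:.0f} ms")      # 124 ms
print(f"per core:   {share / 10**9:.1f} GB/s")   # 16.7 GB/s
```

Seen per core, the number looks a lot less outrageous: the headline figure is big mostly because there are so many cores, each with its own path to local memory.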
If you could reason with religious people, there would be no religious people
No, each core is running at 4 GHz. That does not total up to 16 GHz of processing power, though, because only multithreaded programs can take advantage of more than one core at once, and they still have to wait if they're sharing data.
Aren't a lot of games and apps single-threaded? Hmmm. I figured that dual/quad-core wasn't all it's cracked up to be. So, essentially, if I have a single-threaded app on a quad core, it'll perform at 1/4th the potential speed.
Yes, although, most high end games and game engines actually are multi-threaded. Few are designed to take advantage of more than 2 cores though, and none that I know of will use 8 or 300,000...
So, essentially, if I have a single-threaded app on a quad core, it'll perform at 1/4th the potential speed.
Not necessarily. If you have 3 women can you make a baby in 3 months instead of 9? Given that it still takes 9 months and 2 of the women are idle, would you say that these women are performing at 1/3rd the potential speed? Same sort of logic applies here. If the task is inherently sequential, having more cores (or ladies) won't make it any faster.
Some things -are- highly parallelizable, like ray-tracing or cutting down all the trees in a forest, and other things are partly parallelizable, like changing tires (a pit crew can change 4 tires at once, but adding more staff to allow you to change 5 tires at once doesn't make your team any faster).
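The usual way to put numbers on this is Amdahl's law, which the comment above does not name but is describing: if a fraction p of the work can be spread over n cores, the best possible speedup is 1 / ((1 - p) + p / n). A small sketch, with amdahl_speedup as an illustrative name:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.0, 0.5, 0.95, 1.0):
    print(f"p={p:.2f}: 4 cores -> {amdahl_speedup(p, 4):.2f}x, "
          f"8 cores -> {amdahl_speedup(p, 8):.2f}x")
```

With p = 0 (the baby, or a purely single-threaded app) the speedup stays at 1x no matter how many cores you add. With p = 0.5, four cores give only 1.6x, which is why the serial part dominates long before you reach 300,000 cores.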
That doesn't leave me with a warm and fuzzy feeling inside.
Yes, in general computing applications, an 8 GHz CPU would be faster than a quad-core 2 GHz. (And even in optimally parallelizable situations, the 2 GHz quad-core would just barely surpass the 8 GHz CPU, due to lower task-switching overhead.) So the faster single CPU is almost always better. The reason we have quad-core 2 GHz CPUs is that they are much, much more practical to actually make, and a lot of the stuff that takes a long time (rendering 3D, encoding movies, etc.) is actually highly parallelizable, so we do see a benefit. And much of the single-threaded sequential stuff we see is waiting on hard drive performance, network bandwidth, or user input, so the CPU isn't the bottleneck there anyway.
The funny thing is that it teeter-totters back and forth from one core to the other. I wish I knew what made it do that.
If you look at Task Manager, there are, what, some 40+ processes running. The OS rotates them on and off the 2 cores based on what they all need in terms of CPU time. So your 'CPU-heavy task' gets pulled off a core to give another task a timeslice, and then once it's off, it can be scheduled back onto either core. Ideally it should stay on one core to maximize level-one cache hits, etc., but if it's been off the core long enough for the other processes to cache all-new memory, it doesn't really matter which one it gets assigned to, and in any case flipping from one to the other every now and then makes an almost immeasurably small performance difference.
btw - the 'set processor affinity' feature tells the OS that you really want this process to run on a given CPU/core, instead of hopping around. But in most cases it's not something one needs to do (or gains any benefit from doing).
Actually, it was the other way around.
Intel chips outperform the PowerPC CPUs without a doubt. PowerPC CPUs were horrible. The first MacBook Pros with Intel chips were 2-3 times faster than the ones before with PowerPC chips. If anything, it was a good move for Apple to start using Intel. I'm not a huge Mac fan. I own one Apple product, a Nano with RockBox currently on it. However, I do hate it when people don't do their fact checking and simply want to troll about a company they hate without justification.
Or, the other thing I like about dual-core and up boxes is that they appear to be more stable. Back on a single-core machine, when a process really wanted to lock up in some mysterious while(1); loop, it could be a real test of patience to kill the app. On a dual-core machine, no worries: you've still got the other core to save yourself with :)
Automation - The Car Company Tycoon Game
I remember something from that time that suggested it was simply a supply issue - AMD weren't big enough to guarantee supply. I remember looking at the figures and being surprised (about the capacity of AMD).
I also remember Jobs saying Intel had shown him _very_ exciting things, hint hint. And they were too.
lemonade was a popular drink and it still is
Surely no one would ever need more...
"Aren't a lot of games and apps single-threaded?"
And that's one more thing we can thank Microsoft for.
Had DOS and the PC clones, crippled by a mono-processor, mono-threading DOS/Windows stack, not become the dominant architecture for most of the 90s, we would have rock-solid, secure, multi-processor, 64-bit RISC boxes running some flavor of Unix on our desktops by now.
Thanks Bill.
http://www.dieblinkenlights.com
That University of Illinois machine sounds like it needs more memory.
Only 620TB? Why not bump it up to 640? That should be enough for anybody.
The opinions expressed here are those of this individual, and may not reflect the policy or practice of the collective
Single threading, as used in old versions of Windows, does have some advantages. It avoids a lot of concurrency-related problems that most programmers are not properly trained to deal with. If everyone follows the rules, it's efficient and performs well.
I was recently reading a paper on multi-core processors and the future of programming. It pointed out that many multi-threaded programs work fine until they are run on a real multi-processor system where multiple threads can actually run simultaneously. At that point, strange timing-related bugs often appear that are very difficult to replicate and diagnose.
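A typical example of such a timing bug is several threads updating one shared counter: "counter += 1" is a read, an add and a write, and two threads can interleave between those steps and lose an update. The sketch below shows only the guarded version, since the unguarded one fails unpredictably by nature; count_with_lock is a made-up name, and this is Python's threading module, not whatever that paper used.

```python
import threading


def count_with_lock(threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarding it with a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Read-modify-write is not atomic; the lock makes it one indivisible step.
            with lock:
                counter += 1

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter
```

With the lock, count_with_lock(4, 10_000) always returns 40000. Remove the "with lock:" line and the result may or may not come up short, depending on the interpreter and the machine, which is exactly the hard-to-replicate behavior described above.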
Mea navis aericumbens anguillis abundat
A couple things:
x86 chips today are 99% RISC-like. (The term RISC is rarely used today, since basically no modern CPUs are "pure RISC" in design, reduced as in not even having a multiply instruction, like older SPARCs.) Sure, the exposed architecture is ugly x86, but that's the compiler's job to worry about, not yours. It doesn't really affect the chip's performance. Don't forget x86 chips are still the fastest out there, despite the weird interface.
Also, for Joe Sixpack, 64-bit is pointless, especially when the 32-bit version works on the same OS! If you recompile a program as 64-bit (and often that is all there is to it: a recompile), you'll notice that the binaries are larger. In fact, most pointers (memory addresses) now take up twice the space, so your program also uses more memory. The benefit? Unless your app uses more than around 3GB of RAM, basically zero. (On x86 there is sometimes a slight performance benefit, not because 64-bit is "faster" or anything, but because AMD added some more registers to the x86-64 spec.)
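To make the pointer-size point concrete, here is a back-of-the-envelope sketch. The node layout (two pointers plus a 4-byte payload, padded to a multiple of the pointer size) is a hypothetical example of a typical C struct, not something from any particular program, and node_size is a made-up helper.

```python
import struct


def node_size(pointer_bytes, pointers=2, payload_bytes=4):
    """Size of a hypothetical list node: `pointers` pointers plus a small payload,
    padded up to a multiple of the pointer size as a typical C ABI would."""
    raw = pointers * pointer_bytes + payload_bytes
    padding = -raw % pointer_bytes
    return raw + padding


print("pointer size of this interpreter:", struct.calcsize("P"), "bytes")
print("node on 32-bit:", node_size(4), "bytes")   # 12
print("node on 64-bit:", node_size(8), "bytes")   # 24
```

Under those assumptions a pointer-heavy structure doubles in size, while a buffer of pixels or audio samples does not grow at all, so how much more memory a 64-bit build uses depends heavily on what the program stores.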
Anyhow, I generally view 64 bits as a waste of address space, UNLESS you're accessing large amounts of memory (>3GB per program!). This will be more of a concern in the next few years, but there isn't any rush. I use 64-bit Vista for development (because I have 4GB of RAM) but otherwise probably wouldn't care. Even Visual Studio (the dev platform for 64-bit code) is mostly a 32-bit app, nor should they change it.
Jeremy
History is absolutely full of people who don't follow the mainstream theory or have financial backing and end up creating the next mainstream theory which receives all of the financial backing.
History is also full of people such as yourself, AC, who pour scorn on non-conformist ways of looking at things and end up looking like fools.
Maybe he has a point, maybe he hasn't, but whether or not he is in the mainstream has little or no bearing on the validity of his thought.
I don't therefore I'm not.
iometer
Properly configured it can stress all the cores on all the nodes in your cluster.
Oh you wanted to do something useful...
Intel released it as open source in 2001. Edit the source for the dynamo so that it does something useful. Compile and install. Done.
Or you could load Vista and play a light game. That ought to peg both cores.
Actually, dual core is what it's cracked up to be. While your single threaded application is grinding away you can still interact with your computer instead of staring at the hourglass like you used to do. Since you like playing with the affinity you can launch several long single threaded tasks and set their affinity for different cores. Transcode a .AVI into a DVD of the family picnic? Render an animation in POVRay? Compute a few billion prime numbers. Fold some proteins. Calculate the propagation of thermal energy through single fibers in a carbon-fiber fabric. Whatever you want.
Soon almost all non-trivial applications will be multithreaded, and then you'll be cursing the hourglass again. Until then enjoy your vacation from its tyranny.
Help stamp out iliturcy.
The reason people troll is that Apple fanboys were telling us PPC was much faster than Intel right up to the switch, at which point they started telling us they were much slower.
It's like Big Brother fanboys telling you that they have always been at war with Eurasia one day and the very next day that Eurasia has always been their ally. This sort of thing invites trolling and/or rocket bombs.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Right, except it's not always just I/O. I'm not much of a Windows fan, but it (XP at least) can be efficient. It is the bad application. I designed a comm. subsystem (queuing, en/decryption, image translation, key management, etc.), tested it on 1-, 2-, 4- and 8-core systems (emulating the application), and could drive all cores to 100% busy, with an almost linear throughput increase. Now add an application on top of that: going from 1 to 2 cores, 20%; from 2 to 4 cores, none; and at 8 cores, -10% throughput?
It took three months to fight the application developers (and they still don't get it?): total misuse of thread pools in C#! And they were supposed to be the C#/.NET specialists; I'm just an OS guy (mainly MVS/Unix/Linux). And I had a very good team writing the services for that subsystem, but no say in the application design.
The problem I see is that Windows makes it so much easier to write bad applications. The subsystem actually (excluding auditing) runs under Wine on Linux and, just for fun, I tested it. Same results, very nearly the same throughput.
And please, if running on Intel, test your hyper-threading. It's not good for everything!
Is there any way to force a process to run over 2 cores at 100%?
Sure there is. Just install Oracle Database Server on it and hit it with some poorly-written queries over an ODBC connection.
Nah. If something gets warmer it is caused by Global Warming and the solution is to eliminate Western industrial civilization.
If something gets colder it is Global Climate Change and the solution is to eliminate Western industrial civilization.
Not a contradiction, even though it seems like one.
Study the bifurcation diagram. As you drive the system harder by turning up R (which may be analogous to global warming - i.e. more available heat energy might be described this way) notice how the system follows R, then suddenly begins oscillating between two extremes. Keep on driving R harder and it breaks into chaos.
The weather IMHO has a lot in common with the logistic map equation. Its present behavior is dependent on its past state, its swings are driven by the energy input to the system, etc.
I know it's a gross oversimplification, but so is a mass falling through a uniform gravitational field with no wind resistance and so on. It's still useful to think about.
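For anyone who would rather poke at it than stare at the diagram, here is a small sketch of the logistic map, x -> r * x * (1 - x). The starting value, warm-up length and rounding are arbitrary choices of mine, and logistic_orbit and distinct_states are made-up names.

```python
def logistic_orbit(r, x0=0.2, warmup=1000, keep=100):
    """Iterate x -> r * x * (1 - x), discard the transient, return the next `keep` values."""
    x = x0
    for _ in range(warmup):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def distinct_states(r):
    """Count distinct long-run values (rounded), i.e. the period if the orbit settles."""
    return len({round(x, 6) for x in logistic_orbit(r)})


for r in (2.5, 3.2, 3.9):
    print(r, distinct_states(r))
```

At r = 2.5 the orbit settles on a single value, at r = 3.2 it flips between two extremes, and at r = 3.9 it never settles into a short cycle at all. That is the "follows R, then oscillates, then breaks into chaos" progression, nothing more; it says nothing about the actual climate.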
Weaselmancer
Ridiculous.
True. The entire purpose of new versions of Windows is to make people buy new computers.
The PowerPC CPUs were not horrible. I've seen benchmarks over the years showing them outperforming Intel CPUs (of the same generation) for some tasks (not all, some). The new architecture from Intel is definitely impressive, and Apple absolutely made the correct choice.
IBM continues to be the king of the hill in server processors like POWER5, 6 and probably 7, but these are targeted at a different market than Apple's customers, and are not the same as the PowerPC CPUs.
A lot of application and games writers are complaining bitterly about the move to multi-core processing, as it does mean you need to change the way the code is written to take advantage of it.
I write stuff that runs on big UNIX boxes that has been necessarily multi-process for a long time. It's just a matter of finding things that can be done independently and then explicitly putting them in their own process.
Ideally languages and compilers will do this at some point but so far mainstream languages do not. Also when you're doing desktop GUI apps it's often tricky to do a good job of multi-processing, and the GUI toolkits don't yet do much to help.
The benefit? Unless your app uses more than around 3GB of RAM, basically zero
Plenty of things quite enjoy being able to perform operations in 64bits at a time, actually. Especially when it comes to media, crypto, compression, and indeed games; on top of having 2-3x the usable number of general purpose registers, which certainly isn't something to sneer at given how awful x86 has traditionally been in this area. Plenty of things you're likely to actually care about the performance of are likely to get a nice boost.
64-bits as a waste of address space, UNLESS you're accessing large amounts of memory (>3GB per program!)
Well, you generally only get 3GB when you've performed tricks to ask the OS to allow that; e.g. /3GB boot flag, fiddling with MAXDSIZ, or recompiling with a different user/kernel space split.
On top of that, it's not all about RAM, it's about address space; if you've only got 32 bits to play with, you need to be very careful about allocating it, since any wastage can lead to exhausting your virtual address space before your physical space; like with filesystems, fragmentation becomes more of a concern the closer you get to your maximum capacity.
Large virtual spaces are also useful when it comes to doing things like mmapping large files; for instance, a database might like to mmap data files to avoid unnecessary copying you get with read()/write(), but mmapping a 1GB file means you need 1GB of address space, even if you don't touch any of it. When it's common to access disk using memory addresses, 3GB starts looking small very fast.
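A minimal sketch of the mmap approach, using Python's mmap module rather than the raw system call. The temporary file stands in for a database data file and read_slice_via_mmap is a made-up helper; a real database would keep the mapping open instead of remapping per read.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, offset, length):
    """Map a file read-only and copy out `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; that reserves address space for all of it,
        # even though only the pages actually touched are read from disk.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[offset:offset + length]


# Hypothetical data file, created here only so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"\0" * 4096 + b"payload")
    demo_path = tmp.name

print(read_slice_via_mmap(demo_path, 6 + 4096, 7))  # b'payload'
os.remove(demo_path)
```

Scale that file up to a gigabyte or more and the mapping alone takes a big bite out of a 32-bit address space, whether or not the pages are ever touched.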
You also very quickly eat into it using modern graphics cards; 512MB is common, having two isn't that uncommon, and things are moving towards 1GB; bang goes your 3GB, all that frame buffer needs addressing too, on top of the kernel's other needs.
Really, 32-bit needs to die screaming, sooner rather than later.
The funny thing is that people actually thought IBM couldn't deliver a 3 GHz or cool-running G5. No, they just chose not to deliver it to Apple. Their focus is enterprise, servers, and massive scientific computing. The early warning came when they sold their superbly prestigious, brand-advertising laptop division to Lenovo.
Just imagine they cancelled this CPU to deliver a 3 GHz G5 to Apple. For what? Apple fans turned into x86 fanatics almost overnight, happily buying Parallels to run Windows applications on OS X and buying overpriced Windows games which are masked as OS X applications.
At least IBM and Apple took the "endian" excuse out of the hands of developers and GPU vendors. They still shamelessly sell 20-30% more expensive graphics cards to Mac users, running Intel, on a standard PCI-X mainboard! The new excuse is... EFI!
If shaved windows is anything like shaved pussy, it'll be a good thing.
Do you even lift?
These aren't the 'roids you're looking for.
Besides the increased number of general purpose registers on x86-64, there's also the change in calling convention -- on 32-bit x86, function arguments are pushed onto the stack, whereas on x86-64, the arguments are passed via register. That's another reason that apps like Photoshop run faster when compiled as 64-bit x86 code.