Octopiler to Ease Use of Cell Processor
Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be a compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."
Wasn't that a James Bond film?
Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.
The higher the technology, the sharper that two-edged sword.
It makes you wonder what the release titles of the PS3 will be like, if they didn't have a decent compiler until now. And 'the PS3 is due out in 2006.'
Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."
ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
what you say?
'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.'
Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.
Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?
Reading this is making me nostalgic for LISP machines and interpreter environments that let programmers really play with the machine instead of abstracting it away. What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.
isn't this a bit of a pipe dream? A compiler that optimizes a program for multiple processors is a nice idea, but how can you foresee worst-case-scenarios that only emerge with human use? Take driving as a very abstract example. You "write" a car. You want it to both accelerate and brake on a dime while still being fuel efficient. Without knowing the driving conditions, city or country, how can you optimize your driving for efficiency?
Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console, due out this year?
Surely it was screaming at them that this isn't something that's meant to be released so soon. I mean, the compiler has 4 tiers of 'optimisation', which are meant for the programmers to set so the compiler doesn't make a mess of their memory-management code if they manage memory correctly, or something like that. What this shows to me is that if IBM can't even get the code behind the compiler to make sense of the Cell's architecture, what chance do we have of programming it?
I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
Microsoft's Todd Proebsting claims that compiler optimization only adds 4% performance per year, based on some back of the envelopes on x86 hardware.
This radical of a change in architecture should at least provide an accelerated growth from introduction through the next several years, which I'm sure will provide added incentive for those involved in compiler optimization -- finally, some real enhancements.
Posit: Parallel processing can solve certain types of problems much faster than serial processing.
Posit: The Cell architecture is highly parallel.
Posit: Most programmers today are good at writing serial, not parallel, code.
Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.
Now comes the research to attempt to validate that hypothesis. Will it succeed? We'll find out in several years. There are likely to be some surprising results, and maybe even a paradigm-shattering breakthrough. Or, this line of research may just peter out. It happens.
All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.
Mere mortals do not write the latest graphics engines. I think there are a lot more tier 1 people running around than /. seems to think. They are just too busy to comment here.
All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.
If you're a game company and you're faced with the choice of either making just another engine OR spending some money on the kind of people that code for supercomputers and getting an engine that will blow the competition out of the water, then it will be a simple choice.
Just because some guy on a website finds it hard doesn't mean nobody can do it.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).
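To make the point about serial dependence concrete, here is a minimal Python sketch (the function names are made up for illustration and are not from the article). The first loop carries a dependency from one iteration to the next, so no tool can simply split it up; the second computation is a plain sum, which can be cut into independent chunks. Note that CPython threads will not actually speed up this arithmetic; the sketch only shows the shape of the two cases.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_checksum(values):
    """Each step needs the previous value of acc: a loop-carried dependency."""
    acc = 0
    for v in values:
        acc = (acc * 31 + v) % 1_000_003
    return acc


def chunked_sum(values, workers=4):
    """A plain sum has no ordering constraint, so chunks can be summed independently."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Partial sums are computed separately, then combined at the end.
        return sum(pool.map(sum, chunks))
```

The checksum could only be parallelised by changing the algorithm itself, which is the kind of decision that has to be made when the high-level code is written.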
Always wondered why there is no cooperation between chip makers and even video card companies to make a compiler like this.
SPARCv8(?) and up have quad precision.
I've also implemented a simple double double (represents numbers as an unevaluated sum of two non-overlapping doubles) arithmetic in CL. It was ~25% as fast as doubles (mostly branchless, each op expands into ~2-8 double precision op). That gives an upper-bound on the slowdown ratio for the emulation of doubles with singles.
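For anyone who hasn't seen the trick, here is a rough Python sketch of double-double addition (this is not the CL code mentioned above, just an illustration; `two_sum` and `dd_add` are names picked for the example). A value is kept as an unevaluated pair (hi, lo), and the branchless two-sum step recovers the rounding error of each addition.

```python
def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b == s + err exactly (branchless)."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalise so the high part again carries the leading bits.
    return two_sum(s, e)


one_plus_tiny = dd_add((1.0, 0.0), (1e-20, 0.0))
difference = dd_add(one_plus_tiny, (-1.0, 0.0))
```

In plain doubles `1.0 + 1e-20` collapses to `1.0`; the pair keeps the small term in the low word, so subtracting 1.0 again gets it back. Each double-double add costs several ordinary adds, which is where the slowdown comes from.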
Try Corewar @ www.koth.org - rec.games.corewar
You were lucky. We had to write our own microinstructions using a 12 bit ALU with no barrel shifter, and then burn them into ROM using a magnifying glass to vaporise the aluminium interconnect. And you had hard disks? We had to hand code on paper tape using a leather punch to make the holes. And we thought we were lucky. Next door, the guys in Alan Turing's department were having to stick together infinite paper tapes for some machine he made in the 30s.
Pining for the fjords
Cell's big programming problem goes right down to each SPE, whose assembler commands cannot actually address main memory! Every time information is read into or out of the 256K "local storage" on each SPE, a DMA command must be issued. Now, while this is Cell's greatest asset (execution continues while seriously slow memory movement occurs), it is also difficult to work with.
Your average C programmer doesn't take architecture into account, and so there's no user indication of whether a variable can be paged to main memory, if code needs to be fetched, and crucially: how far in advance data can be pre-loaded into the local storage, to avoid the SPE hanging on a memory operation.
I'd guess that this new compiler will try to address these issues, which is suggested by the article.
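The usual answer to the pre-loading problem is double buffering: start fetching the next chunk while working on the current one. Below is a toy Python simulation of the idea, not Cell code; `dma_get` and `process_double_buffered` are hypothetical names, a list slice stands in for the DMA transfer, and a one-worker thread pool stands in for the DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(main_memory, start, size):
    """Stand-in for a DMA transfer: copy one chunk into 'local store'."""
    return main_memory[start:start + size]


def process_double_buffered(main_memory, chunk, kernel):
    """Apply kernel to every element, fetching chunk k+1 while computing on chunk k."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, main_memory, 0, chunk)
        for start in range(0, len(main_memory), chunk):
            local = pending.result()  # wait for the transfer issued earlier
            nxt = start + chunk
            if nxt < len(main_memory):
                # Kick off the next transfer before starting the compute step.
                pending = dma.submit(dma_get, main_memory, nxt, chunk)
            results.extend(kernel(v) for v in local)
    return results
```

The hard part for a compiler is working out `chunk` and the fetch-ahead distance automatically from ordinary C code, which is exactly the information the programmer never wrote down.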
Nah, it's there. Download it, if you want ;)
Any technology distinguishable from magic, is insufficiently advanced.
Hmm, that FA was totally devoid of any real details. Granted, I do not develop on Cell processors, and I am not a stickler for the "next big thing", but these things may be interesting. Unfortunately, if they want me to use them I need to know it works for me. I want my existing code to compile with minimal changes so I can test the new platform in the raw. I have the resources to test a few "maybe good, maybe not" systems a year. What I want to know, in short, is if it "could" work well. This means I need to use my existing code base in part (their tier IV). I am happy to optimize in my spare time if need be, once I know it "could be the thing". If the platform passes that test I'll buy a few more units and make a real go of it. I don't think that the Cell processor is at that point yet: too little hardware on sale right now and no software, and there lies the problem. Open source compiler support would be a big plus, but if the platform is "just that good" I can make an exception.
my $0.02
enjoy... :)
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
Somebody had to code this monstrosity of a compiler, and it wasn't you. Isn't that enough of a reason to believe there's a god?
You have tried to support your argument with faulty reasoning! Go directly to jail; do not pass Go, do not collect $200!
...it will be known far and wide as the "Octopile o' Crap."
In Soviet Russia, Chuck Norris will still kick your ass.
If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation.
Also, the division into "expert programmer" and "regular programmer" is silly. Most coding is done by people who aren't experts in the cell architecture (or any other architecture). That's not because people are too stupid to do this sort of thing, it's because it's not worth the investment.
If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose. Hardware designers really need to get over the notion that they can push off all the hard stuff into software. People want hardware that works reliably, predictably,and with a minimum of software complexity.
Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.
keyword: "decent"
according to the article, the compiler's still in early stages of development...
--
I'm glad to see some real progress in the processor world. We are so guided by the enterprise market that we've had to support x86 WAY longer than we should have. The Cell looks like it has a real chance of becoming the next big advancement. For one, IBM is working heavily with the open source community. This is possibly one of the best things they could have done to help the Cell. By doing this, you make open source developers happy and more inclined to port over their applications. One of the hardest things to do in getting a new arch out is getting application support, and they've pretty much guaranteed a modest amount of applications by going open source. The Nokia 770 is a perfect example of this. They've supported open source and made available more than enough tools for quick porting of applications, and look at the huge amount available already in the first few months. The Nokia 770 probably sets records in how many applications were ported in such a short period of time.
Make the developers happy, and they will port their apps. With large amounts of available applications, the consumers will buy. When the consumers buy, you have a successful new arch.
If an officer ever threatens to taze you, say you have a pacemaker.
It makes you wonder what the release titles of the PS3 will be like, if they didn't have a decent compiler until now.
Obviously titles whose programmers earn a hefty salary premium for having Tier II skills (as defined in The Article). The art might not look as "next-gen" as it could because the developer had to reallocate some of the art budget toward programming.
I corresponded with the Sparc designer about this very question, because LabVIEW supports a 128-bit "quad-precision" double for Sparc platforms: I sent some email back and forth with one of the dudes on the Sparc design team, and he said that Sparc's 128-bit quad-precision double is a purely software implementation.
Compare e.g.
That's kind of a weird comparison given the differences in innovation, demonstrated results and company attitudes.
IBM's Cell is a much more radical break from previous chips like Itanium, but the CES demo was reported to be very impressive. IBM has already released the SDK and openly published all specifications. The pace of development has been very rapid and people are predicting the replacement of Intel. The missing piece was a compiler to ease transition. It looks like that's coming along just fine.
The Itanium, on the other hand, was obsolete at its launch. Even HP dumped it after killing their own better-performing 64-bit processor for it and spending billions of dollars and ten years building it.
We can only wonder how things would have been if Intel had opened things up like IBM has, instead of making it so people have to figure things out on their own.
Friends don't help friends install M$ junk.
"If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation."
Do you mean like the Pentium 4? AFAIK it was quite successful.
Is it just me or is it that we went from CISC to RISC and are now going back to CISC again?
I assumed less complex chips with optimizations coming from compile time were more efficient or cost effective?
http://saveie6.com/
What benefit does increasing the precision of floats to 128 bits bring? 64 bits are more than enough for 99.9999% of cases, and the remaining cases can be handled in software emulation. You still cannot solve (without massive growth of the error terms) an equation system described by a Hilbert matrix using Gaussian elimination, no matter how many bits you make the mantissa.
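A rough way to see the sensitivity being described: build an n-by-n Hilbert system whose exact solution is all ones, solve it by Gaussian elimination in 64-bit floats, and watch how far the answer drifts as n grows. This is only an illustrative Python sketch (the helper names are made up); the exact-rational run with `Fraction` is there purely as a reference to check the elimination code itself.

```python
from fractions import Fraction


def hilbert(n, num=float):
    """n-by-n Hilbert matrix with entries 1 / (i + j + 1), built in number type num."""
    return [[num(1) / num(i + j + 1) for j in range(n)] for i in range(n)]


def gauss_solve(a, b):
    """Gaussian elimination with partial pivoting followed by back substitution."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= f * a[k][c]
            b[r] -= f * b[k]
    x = [0] * n
    for k in reversed(range(n)):
        s = b[k] - sum(a[k][c] * x[c] for c in range(k + 1, n))
        x[k] = s / a[k][k]
    return x


def max_error(n, num=float):
    """Largest deviation from the intended all-ones solution."""
    h = hilbert(n, num)
    b = [sum(row) for row in h]  # right-hand side chosen so that x = (1, ..., 1)
    return max(abs(v - 1) for v in gauss_solve(h, b))
```

Comparing `max_error(3)` with `max_error(10)` shows the float error growing by many orders of magnitude for a modest increase in size, which is the ill-conditioning at work rather than a flaw in the hardware format.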
Check out some of Professor Kahan's shiznat at UC-Berkeley:
In particular, look at the pictures of "Borda's Mouthpiece" [page 13] or "Joukowski's Aerofoil" [page 14] in the following PDF document: As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles, performing the calculations in 80 bits' worth of hardware, and then rounding back down to 64 bits to present the final answer. MORAL OF THE STORY: Precision matters. You can never have enough of it.
On the PS2, there are two vector units (vu0 and vu1), which are basically where all the grunt work is done - the mips chip is there for housekeeping and non-time-critical code. Each VU has 2 code-paths (the instruction word is 64-bit, and there are two 32-bit instructions in each word). There are limitations on what you can do in each of the two words simultaneously. Sony have a GUI tool (in their professional kit) which allows the programmer to write essentially sequential code, and have it take full advantage of the vector units. According to Sony, it performs as well as a skilled programmer.
For the linux kit, they only released vcl (a commandline version). It's a bit like a compiler-stage. It takes sequential assembly language for a single VU and re-orders code, inserts wait-states etc. Finally producing another assembly output which is optimised for the dual-issue nature of a VU.
It strikes me that optimising for constraints over 2 code paths in a single unit isn't too far a stretch from optimising for constraints over 8 code paths in 8 units. The differences are mainly to do with locality of reference. On a VU it was up to the programmer to DMA data into scratch-space RAM, and set flags as semaphores on operation. There's no real reason why a computer program can't do that - a basic approach would be to do it on a function-by-function approach, or use #pragma constraints in the code. There's no need to have the all-singing, all-dancing version of the optimiser as version 1...
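As a very rough idea of what that re-ordering stage has to do, here is a toy Python sketch of dual-issue pairing. It is not vcl and knows nothing about real VU rules; the instruction format `(slot, dest, sources)` and the name `pair_dual_issue` are invented for the example. Two neighbouring instructions share a cycle only if they use different slots and neither touches the other's destination.

```python
def pair_dual_issue(instrs):
    """Greedily pack in-order instructions into bundles of at most two.

    instrs: list of (slot, dest, sources), with slot either "upper" or "lower".
    Returns a list of (first, second_or_None) bundles.
    """
    bundles = []
    i = 0
    while i < len(instrs):
        first = instrs[i]
        second = instrs[i + 1] if i + 1 < len(instrs) else None
        can_pair = (
            second is not None
            and second[0] != first[0]        # must occupy different issue slots
            and first[1] not in second[2]    # second must not read first's result
            and first[1] != second[1]        # no two writes to the same register
            and second[1] not in first[2]    # conservative: no overwrite of first's input
        )
        if can_pair:
            bundles.append((first, second))
            i += 2
        else:
            bundles.append((first, None))
            i += 1
    return bundles


program = [
    ("upper", "f1", ("f2", "f3")),
    ("lower", "r1", ("r2",)),
    ("upper", "f4", ("f1",)),
    ("lower", "r3", ("f4",)),
]
bundles = pair_dual_issue(program)
```

Here the first two instructions share a cycle, while the last two cannot because the second reads what the first writes. A real scheduler would also reorder instructions and account for latencies, but the constraint checking has the same flavour.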
Simon.
Physicists get Hadrons!
I recall a common complaint by development houses about Sega consoles was that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business? Speaking of which, is XB360 easier to code for than PS3?
ELOI, ELOI, LAMA SABACHTHANI!?
"All that really matters is wether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan."
Yea, but what's the full power of a system? Prettier graphics?
The "full power" of the PS1 seemed to be that its games became marginally less ugly as time went on, although FF7 was very well done since it didn't use textured polygons for most of it (the shading methods were much sexier). When I think about FF9, I don't like it more because it uses the PS1 at a fuller power level than FF7, I like it better because the story is cuter.
I like PGR2 better than PGR3 because PGR2 has cars I know and love from Initial D and my own experience, whereas PGR3 has super cars I've never driven or seen before.
I don't think Rez taxes the PS2 more than Wild Arms 3, but I like it better than Wild Arms 3. I also like most of the iterations of DDR, and they're not taxing in the slightest.
The full power of a system is not its graphics capability or how easy it is to control or its controller or its games -- it's the entire package. Does the PS3 have a good package? The Xbox 360 sure doesn't -- the controller power-up button is nice, but there is nothing new or interesting; it's a rehash. The PS3 is a rehash too.
The Sega Saturn was a rehash of the 8-bit and 16-bit 2D eras. It died. The PS3 and Xbox 360 are rehashes of the 64-bit and 128-bit 3D gaming eras.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Yes, in fact you are a really tiny proportion of the world's population!
Let me summarize
- take one of the most unsafe, slowest-to-compile, pitfall-ish, unspecified languages in existence (ok, I might be exaggerating on the "unspecified" part)
- add even more #pragmas and other half-specified annotations which are going to change the result of a program near invisibly
- don't provide a debugger
- require even more interactions between the programmer and the profiler, just to understand what's going on with his code
- add unguaranteed and slow static analysis
- ...
- lots of money ?
Am I the only one (with Unreal's Tim Sweeney) who thinks that now might be the right time to let C die, or at least return to its assembly-language niche? I mean, C is a language based on technologies of the 50s and 60s (yes, I know, the language itself only came around in the late 60s), and it shows. Since then, the world has seen... And what are C and C++ programmers stuck with?
So, now, we hear that IBM is trying to maintain C alive, under perfusion. IBM, please stop. Let granddaddy rest in peace. He had his time of glory, but now, he deserves that rest.
Oh, and just for the record. I program in C/C++ quite often as an open-source developer and my field is distributed computing. But I try to keep these subjects as far away from each other as I can.
(well, venting off feels good)
This troll is over. You can now resume a normal activity.
What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.
It's called threads on the one hand and vector data types on the other. Once you have learned how to use those, you're a tier II developer (as defined in The Article) working with a PowerPC based computer connected through low-latency pipes to seven DSPs, and you can just spawn tasks in threads that die when the tasks finish. Trouble is that a lot of development firms that can only afford the lower salaries of tier III and IV programmers don't want to take the time to adapt a 90 percent finished single-threaded PC game to a highly threaded, vectorized environment.
Probably not true. Consider the yelling and screaming that went on in the late 90s as code had to become 'thread-safe'. Now that fight is mostly over, so you're already on the right track. Next step is to take a page from the technical computing market, and generalize 'thread' to 'non-local access', i.e. your thread may be on another proc, with another cache or memory to access. This gets you to dual-core or OpenMP-type systems (SMP). One more step, and you're at NUMA, where that other core could be another entire computer, with a longer latency. Usable techniques are known (after all, somebody is using BlueGene, and there are codes such as NAMD which run segmented across hundreds of machines), so compilers have to be taught how to do as much of this automatically as possible, while programmers will have to be up to speed on multi-threaded, hierarchical memory access patterns.
The key is whether enough processors can be sold to make this investment of time worthwhile. Advances in Windows (quit yelling) have already driven some of those changes, as can be seen if you compare the behaviour of current programs versus those aimed for 3.1/95, but you haven't noticed it much because those changes are incremental. More tasks run asynchronously, dialogues don't lock the entire window manager while waiting for your response, systems wait until idle periods to do heavy I/O. The proposed Cell compiler is just one step beyond multi-threaded, so the transition will, in the end, be less fuss than is currently anticipated.
Dig into the technical docs for Intel's current Fortran versus its ancestral DEC variants, and you'll see compilers are already doing an amazing amount of work in terms of code reorganization, execution order prediction, etc., that their ancestors didn't. The language the programmer sees is almost identical to the one they saw 20 years ago, and only comes with a few more 'gotchas' to avoid. This has to happen, as the Market has decided that it's cheaper to add cores than design faster ones, so this sort of distributed programming is going to become the norm. You'll look back at simple, imperative code some day soon and say, "How quaint". From the programmer's view, all that their new, miraculous Octopiler has to do is take OpenMP statements within a current language, and they can continue working much as they did before.
On that note, it's somewhat heartwarming to envision hordes of recent CS grads, soaked in the latest OO paradigms, being told, "there's great money to be made programming for the Cell, but you're going to do it in High-Performance Fortran."
the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
You engineer programs in a sense similar to cars, yes. But, you interact with your tools on a much higher level than putting in a pedal and a brake pad. I suspect you do in actual car design too: it wouldn't be a huge step to be able to model a car in a 3D app and ask the computer how that shape of car will perform in terms of aerodynamics, gears, engine power and therefore miles per gallon or acceleration etc.
It's similar with programming. Instead of saying, this is a car, and it goes in that world, and we'll see what happens, you also design the world, and the way they interact, and you do it all at as high a level as you can. So, the compiler can see what you're doing at a fairly high level, and ideally, can understand and optimise that. Similarly, if you're doing programming multiple processors/cores with threads, then you use a compiler that understands threads. You tell it when threads can run at full speed, and when they need to stop and catch up with each other. Then, the compiler can hopefully examine what needs to be done and when, and what processors are available to do it on, and optimise accordingly. This is nothing new; lots of compilers/APIs do this sort of thing now in various ways.
What I want to know is... will this just be limited to a single 8-workhorse cell chip, as the name "Octopiler" suggests, or will it use the promised power of Cells, so that a program will spread its workload across all the Cell devices in your home if you have more than one? Somehow I doubt they're there yet.
You neglected "They're - They are."
The greatest thing you'll ever learn is just to love and be loved in return.
As a programmer, there's only so much that can be done in software. Sure you can parallelize things, and you can come up with newer/faster algorithms, but if we didn't get dual-proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorithms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Let's see what the Cell and processors after it can do!
-=JML=-
The compiler may have pragma instructions or linker bindings for parallelism, which would be easily taken advantage of by higher-level libraries, even if end-users don't know how to use it (though, imho, they can learn easily enough).
This is possibly one of the best things they could have done to help the cell. By doing this, you make open source developers happy and more inclined to port over their applications.
It's too bad that the only popular commercial implementation of the Cell processor for several years is going to be in a machine with a lockout chip, a technical measure that prohibits end users from compiling Free software on the machine. Otherwise, game developers could develop a Free engine subsidized by keeping game assets (maps, models, textures, audio, scripts) proprietary, and then sell the resulting game without having to pay Sony per title and per copy. Such an open-source business model would break the console business model, which involves taking a loss on R&D and marketing, breaking roughly even on manufacturing the console, and making up the difference in marketing.
these Octopiler coders are doing their work for the love of coding. If they want a salary for this then they're not worth their weight in salt.
[/kfg mode off]
--- Grow a pair, liberals... stop letting the Republicans bully you!
An octoplier, that is. I think they've had them for years.
This radical of a change in architecture
There's nothing "radical" about it--it's just a bunch of CPUs on a chip. It's about the least radical way in which you can put a bunch of CPUs on a chip, beyond multicore.
Pretty much all modern CPUs need special compilers to give good performance. Unless you can keep track of the number of pipeline stages, the degree of superscalar architecture, etc. you will get sub-optimal code. The P4, for example, can have 140 instructions in-flight at once. Can you keep track of your code over a 140 instruction window and make sure there are no hazards? If not, then you're probably better off using a 'special' compiler.
The days when a compiler could just turn each statement into a fixed instruction sequence are long gone.
Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.
No, actually, you don't. One of the key features of RISC was that instructions took the same time to execute. On a CISC architecture, instruction timings are far from constant. Some instructions (have you looked at the x86 instruction set? It even has string manipulation instructions) can take several times longer to execute than others, which makes generating code very difficult. For example, you might know that it takes n cycles for a load to complete if accessing from memory and m if accessing from cache. How many instructions is that? That's much, much easier to work out on RISC. To prevent pipeline stalls, you need to make sure that you have a minimum of m instructions (and ideally n) between your load and your first operation that depends on that data. Try doing that with a fixed-timing instruction set (RISC), and then with a variable-timing instruction set (CISC), and see which is easier.
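With fixed timings, the bookkeeping really is simple enough to sketch in a few lines. The Python below is a toy model of an in-order, single-issue pipeline (the tuple format and `count_stalls` are invented for the example): loads take `load_latency` cycles, everything else takes one, and an instruction stalls until its inputs are ready.

```python
def count_stalls(program, load_latency):
    """Count stall cycles for a list of (op, dest, sources) instructions."""
    ready_at = {}  # register name -> cycle at which its value becomes available
    cycle = 0
    stalls = 0
    for op, dest, srcs in program:
        earliest = max((ready_at.get(s, 0) for s in srcs), default=0)
        if earliest > cycle:
            stalls += earliest - cycle  # wait for the slowest input
            cycle = earliest
        latency = load_latency if op == "load" else 1
        ready_at[dest] = cycle + latency
        cycle += 1
    return stalls


naive = [
    ("load", "r1", ()),
    ("add", "r2", ("r1",)),   # uses the loaded value immediately
    ("add", "r3", ("r4",)),
    ("add", "r5", ("r6",)),
]
scheduled = [naive[0], naive[2], naive[3], naive[1]]  # independent work moved into the gap
```

With a 3-cycle load, the naive order stalls for two cycles and the reordered one does not stall at all. Once each instruction can take a variable, data-dependent time, the `latency` line stops being a constant and the whole calculation gets much murkier.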
I am TheRaven on Soylent News
Dreamcast is one of the easiest game consoles for programmers.
There's lots of languages out there and it's probably not that difficult to invent new ones. But people just don't abandon their current languages the moment a new one comes along. So inventing a new language is probably not the best way to get a lot of applications written and ported to a new processor architecture.
Sounds like the name of a battling robot that looks like an octopus and piles up its opponents.
Yes, there are things like that now, and much better in fact. Even the most basic of popular compilers will optimise code like that pretty well. But, the better way is much more suited to parallel code.
;)
Code is object oriented, so all you need to do is say something like PLAYER1 can do his own thing at his own speed, as long as he's not killing another PLAYER. If he is, then he has to slow down and let the other player sync up with him. Semaphores and similar locking techniques allow threads to wait on other threads when necessary, but to continue at their own pace when possible. So, if there are a thousand computer-controlled players, and none of them need to be synced with something, they're free to hit the parallel processing SPEs as fast as they can. Only when they interact with something that isn't able to take the information right away will they need to slow down. The Cell isn't quite THAT parallel though, since the SPEs aren't completely independent (no one would expect them to be, probably), so it's not going to get the performance you might expect from eight processors zipping through a thousand characters or a thousand effects or whatever, but when the timing and interdependence issues are taken into consideration, the results should be pretty impressive.
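As a small illustration of "run free unless you interact", here is a Python sketch using one lock per player (the `Player` class and the `hit`/`attacker` helpers are made up for this example; a real engine would be far more involved). Threads only wait on each other when they touch the same player.

```python
import threading


class Player:
    def __init__(self, name, hp):
        self.name = name
        self.hp = hp
        self.lock = threading.Lock()  # guards this player's state only


def hit(target, damage):
    """Interaction point: serialise access to the target while it is being changed."""
    with target.lock:
        current = target.hp
        target.hp = current - damage


def attacker(target, times):
    for _ in range(times):
        hit(target, 1)


victim = Player("victim", 10000)
threads = [threading.Thread(target=attacker, args=(victim, 500)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight attackers landing 500 hits each leave the victim at exactly 6000 hit points, because the lock makes each read-modify-write step atomic. Players that never interact never contend for the same lock.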
Parallel-aware compilers are aware of threads, and have specific constructs to say things like this thread must wait, or it can continue as long as X hasn't happened, or whatever. There are nice high-level APIs for this in C++, python, etc. What IBM's compiler seems to do is take a little more of this on board without specifically being told to. Personally, I think that's a bit hyped, and that developers will still have to mark their thread synchronisation points. But, that's really not such a big deal. Debugging is
The PS2 had a MIPS R5900 as its main processor, and two FPUs and two vector units as coprocessors. Isn't the Cell just a continuation of that 'application specific' way of thinking? I think that with Blue Gene IBM finally nailed the network part. That's why the Cell is called 'the Cell Broadband Engine'. Classical scaling killed off Moore's law 3-4 years ago, and this and/or more cache on processors is the response.
;)
- borked msg alert, swede behind the controls
So, all Intel processors are dead then are they?
Holy crap, that means half the processors in the world are a DOA product, whatever the hell that means. Maybe we should all use compilers that produce slower code and ignore special optimised compilers! Yeah that'll help the chip manufacturers ship units!
Please think before you textually masturbate rubbish.
If your "element" "implements Runnable", then "new Thread(element).start()" in the foreach (finally available since Java 1.5) should work just as expected (calling "element.run()" directly would simply run it on the current thread), but be careful: you can't be sure when those threads finish unless you put the main thread on hold until they have. It should still speed your code up nicely, provided the VM used supports multiple processor cores (and this is the crux, but no one is keeping anyone from building one for Cell :-)
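The "put the main thread on hold" part is the step people forget. Here is the same pattern as a short Python sketch (a Python analogue, not Java; `run_all` is a name made up for the example): start one thread per task, then join every thread before reading the results.

```python
import threading


def run_all(tasks):
    """Run each zero-argument callable in its own thread and wait for all of them."""
    results = [None] * len(tasks)

    def worker(index, task):
        results[index] = task()

    threads = [
        threading.Thread(target=worker, args=(i, task))
        for i, task in enumerate(tasks)
    ]
    for t in threads:
        t.start()  # begins concurrent execution
    for t in threads:
        t.join()   # block until this thread has finished
    return results
```

Without the `join` loop the caller could read `results` while some slots are still `None`. Whether the threads actually run on separate cores depends on the runtime, which is the same caveat as with the VM above.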
Ever heard of profile-guided optimizing compilers? A program generated by such a compiler is run many times in "profiling" mode, which produces metrics about how the code is actually executed. Next, the program is recompiled using this information, producing a much more optimized program.
Many C/C++ compilers today support this feature, including the latest GCC, Intel's C++ compiler, MSVC++ 8.0, and a bunch of others.
I don't know if IBM's compiler for their Cell architecture supports profile-guided optimizations, but if they ever want to take full advantage of the architecture, I foresee that they will build it into their compiler.
the simple fact is that C/C++ is the standard for writing native code.
I'd drop another $30-40 for a cartridge that would process BOINC work units, especially if it were so well optimized it really contributed to my total...
every day http://en.wikipedia.org/wiki/Special:Random
The problems IBM programmers are having are emblematic of the problems that the PC industry is going to be facing in a few years. Multi-core is the future of PC performance. Increasing GHz and IPC of single processors has pretty much hit a wall. Creating Dual and multi-core CPUs is the best approach we have left for increasing performance with future increases in transistor count/density.
The problem is that single-threaded programs will run just as slowly on your quad-core 'Core-Quattro' in 2008 as they did on your old Pentium 4, c. 2005. Great, yeah, I know, server loads parallelize very nicely (witness the miracle of Niagara), but consumer-grade CPUs are where the volume is at, and people are going to have to notice a real difference in performance in order to stay on the hardware upgrade treadmill. This necessitates that Intel/AMD/IBM come up with new programming models that make it easy to parallelize existing code. Parallelized libraries and frameworks are all well and good, but it will be 20 years before everyone gets around to recoding the existing codebase to the new platform, and most of them are probably not going to generate optimal code.
No, what we need are compilers that take programs written in a serial fashion, and emit code that scales well on multiple processors. The problems with the PS3 are only the beginning.
Very interesting! As I am not a developer, I did not know this feature existed. I should have guessed it, though.
Also, the division into "expert programmer" and "regular programmer" is silly.
No it is not. There are average programmers out there. They can hang pretty good. They can generally grasp what is going on. But they do not do the cutting edge work. They crank average code out for average things.
Now your 'expert' is the sort of guy out there that lives and breathes it. They like to know that instruction xyz takes 3.9 cycles on average to finish. They live on this sort of thing. They are truly spooky to watch. They just zone and whatever they do comes out awesome. These people are extremely rare. They are usually semi-difficult to work with too, as they do not think like everyone else.
That's not because people are too stupid to do this sort of thing
Now there I really disagree. There are truly stupid people out there that program. They should not be near a keyboard at all. They are merely in it for the money, or prestige. They usually do not take the time to learn a system. They whip things out and just let the cards fall where they may. If they bother to comment code you will see things like 'not sure why this work but it does' or 'just incase not sure why' or 'someone else told me to do this'. Then you as an average programmer come by and go, what the hell were they thinking?!
Also this compiler may just be an enabler to get *AT* the features of a CPU. You do need a compiler capable of exploiting things. Will it have some sort of whizzy thing to auto-thread things? Maybe, maybe not... If I didn't have a compiler that could let me at the other 'cells' in the Cell processor I would be rather mad at the people who make the compiler. Wouldn't you?
But as with *MOST* programs, the best thinking is not done by the compiler but by the grey stuff holding your ears apart. The compiler is merely the tool to get things done. It is not the thing that does the work for you. 99% of things out there need 0 optimization. It is that 1% where you will need it the most. The most optimal code in existence is the code that never runs. It uses 0 cycles. Real optimization comes in where you time the code, then decide what needs to change to make something go faster. If you just 'hope' everything will be optimized by the compiler you will always have slow code. You will also never understand why. You are just 'guessing' the compiler is crap. And like most guesses when dealing with computers, you are wrong.
I highly recommend the book Debugging. It shows many of the common errors people make when trying to build a system and diagnose one. And it is a fairly easy read to boot.
Hey! Linux is thwell becawth itth pwetty.
I understand that a program can be compiled with optimization flags specific to one hardware platform or another. What I'm confused by is the implicit claim made by IBM that the Octopiler does something more than this. I had always assumed that the Cell could interpret a flat program and divvy up processing on the fly. That is what one of the cores is for, no? Apparently not. That's what tripped me up. IBM is proud that it has a proprietary compiler preconfigured for development on its Cell chip. Nothing more.
Only the DoD can afford Real Programmers. Game Companies have to settle for Quiche Eaters. On the bright side, Octopiler does a great job with Pascal code.
RISC is over.
RISC has some good ideas, but a fair number of drawbacks too.
For small systems, uniform-sized instructions don't use memory effectively enough. Because of this ARM is abandoning RISC in favor of THUMB 2.
For families of chips, the idea of exposing the hardware to the compiler turns out not to work because you cannot maintain many assumptions across individual incarnations in a family of processors. For example, look at MIPS. They eschewed interlocked pipeline stages, but had to put them in in their second processor in order to maintain binary compatibility with the first processor.
I don't get your last comment though. We're still getting tons of optimizations at compile time. That part hasn't changed. The article is all about compiler optimizations!
Anyway, yeah, strict RISC is dead. But many of the things we learned from RISC are still being employed.
http://lkml.org/lkml/2005/8/20/95
It's a good thing Sony had the good sense to make 8 cores instead of five, otherwise IBM would have given us the PENTOPILER, and I would have boycotted PS3 just for the name.
I thank IBM, on behalf of the illiterate developer community of the world, for using naming conventions that suit the layman. Or the complete dumbfreak.
Octopiler...
The GP was contending that if a fancy compiler is required to achieve good baseline CPU performance (i.e., using all the SPEs on the Cell concurrently), the architecture in question won't be as successful as an architecture that can get good baseline CPU performance without special optimizations.
In modern CPUs, the out-of-order instruction window is what allows independent instructions to execute when their operands are ready, regardless of the schedule the compiler lays down in the binary. Sure, if you put a load and a use of that load right next to each other, the use is going to have to wait. But meanwhile, other instructions from earlier/later in the stream can execute. Dependencies are resolved on the fly via register renaming and memory disambiguation hardware.
On the other hand, Cell needs a compiler to figure out where the dependencies are and aren't so it can schedule code to execute independently on different SPEs. Today's compilers could produce code that would execute well on one SPE, but all the rest of them would sit unused. This sort of "optimization" (I wouldn't even call it that, I would call it program transformation) is difficult to do.
I realize the days of turning high-level languages into a fixed instruction sequence are long gone, but today's CPUs would get within, oh, say 80-85% of their optimal performance if a compiler did do that. The Cell, on the other hand, would see a slowdown by a factor of 4 or 5 (vs. using all the SPEs) without using a parallelizing compiler or writing code in a completely different programming paradigm.
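One rough way to put numbers on the "factor of 4 or 5" is Amdahl's law. That choice is an assumption of this sketch (the comment does not name it), and the parallel fractions tried are illustrative rather than measured on any Cell workload.

```python
def amdahl_speedup(parallel_fraction, units):
    """Amdahl's law: speedup over one unit when only part of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Illustrative fractions of the program that can be spread over 8 units.
for fraction in (0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel on 8 units: {amdahl_speedup(fraction, 8):.2f}x")
```

With about 90% of the work parallelizable over 8 units, the model gives a bit under 5x, so code stuck on a single unit would be leaving roughly that factor on the table.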
But it's going to take m cycles to load the data, no matter what, so RISC chips can't decrease the maximum time, but only increase the minimum time. How does that help us make the program faster?
Ewige Blumenkraft.
I would love to find a replacement for C++, and if there is something that will meet my requirements, please let me know.
First of all, let's ignore the actual existence of the language tools; if there is even anything on the horizon, I'd like to know. In order to replace C++, I need something that will be as efficient and predictable. I need some way to exploit all features of the machine. For example: if the machine has a reciprocal-square root instruction, I can make a function inline float rsqrt(float) in C++, and use inline asm to emit the instruction. The compiler will then slot that instruction right into the calling code and optimise it extremely well (no branches, no extra loads, etc). I'm not saying that exact method is ideal, but the bottom line is if the machine has a certain reciprocal-square root performance, I'd need to be able to at least approach it. The only reason this works in C++ is because you usually have an assembler that can generate any instruction the target machine can execute, and the C++ compiler gives you a window to it.
Memory management is the other big issue. I need to have more control over memory than just a global garbage collected heap and a stack. The option of using a garbage collector on a block of memory is not unwelcome, but I need to have more control for some things. In C++ I can specify the address of an object, so for example, if the machine has scratchpad memory, I use that memory and still have reasonably maintainable code.
C++ is not perfect by any means, but right now you can develop a game for all of the major platforms using it, and usually with it alone (with inline asm, and for the main processor at least). I await its successor in this field, but I don't think it's out there just yet.
PS. I would just be happy if we had more modern multipass compilers for C++ that would unify declaration and definition.
I know that was the idea, but it wasn't true for any popular RISC architecture.
In the early days of RISC, integer multiplies typically took 3-5 cycles and divides took 33. Loads and stores of course have variable latencies too.
AMD's 29K architecture turned an integer divide into a 33-instruction sequence to get around this. That also made it impossible to optimize this on later chips in the family, when 17-cycle divides became commonplace (first popularized on Pentium).
With any modern architecture, RISC or CISC, the instruction scheduling restrictions are bestial. Which is another reason why it baffles me that people continue to use gcc as their compiler. It generates awful code.
http://lkml.org/lkml/2005/8/20/95
This is an educated guess at best, but would not programmers in tier IV tend to write code that stands to gain little or nothing from Cell's parallel architecture? I mean, engine programmers would be in tier I - III, the menial tasks would be tier IV. I do not see controller polling and other boring game logic benefiting greatly by using more than one SPE. Matrix and vector math, on the other hand, have potential, but a good optimizing compiler can optimize that stuff without a lot of hints.
If this compiler is truly intended to make unoptimized tier IV code Cell-friendly, it had better do an extremely good job simplifying branches, since the SPEs are highly pipelined and branching can stall for up to 18 cycles. A lot of high-level programmers never think about the consequences of branch misprediction even though they write some of the most branch-heavy code. Compiler optimization feedback is nice, but I do not think tier IV programmers look for or even know what to do with it.
Actually I think you got it backwards. Software people need to realize that if we want to keep our jobs then it is in our best interests to become good programmers on the latest hardware. Our current computing paradigm (hate that word) is coming to an end, people. Multi-core processing is here and the CELL is just looking at it from a different perspective. We will ALL be writing parallel code or we will not be writing code. I for one am quite excited about the return of optimized programming.
what?
Similarly, a good Linux port will share processes over the 8 cores optimally - is Linux for Cell available yet? Benchmarks? I'm keen to see the Cell blade servers coming soon!
Note that Sun Studio compilers were freely available before their new T1-powered servers were launched.
Without the right toolsets, hot tech is not so cool. Let's hope Cell and T1 are not buried in the Alpha/Itanium graveyard!
Zen tips: Pay attention. Don't take it personally. Believe nothing.
Sounds like a hentai anime monster!
Nice idea you have there! How about some more readable higher-level language whose compiler front-ends produce Fortran structures for the existing Fortran optimizers? Such a language could easily have the familiar C look&feel, etc...
IBM Mainframes have had this forever!
P2P Anonymous Distributed Web Search: http://www.yacy.net/
Interesting point on the intermediate code front. I think the hardware front ends in the chips that convert instructions into internal representations work very well and they are so small compared to the rest of the chip (even next to the cache) that saving that many transistors just doesn't amount to much.
Additionally, again on the embedded front, if you translate intermediate code, you may not have a place to put it. For example, Xbox 360 or PS3 can have tens or hundreds of megabytes of code on a ROM but no equivalent amount of R/W storage to put the translated code in.
That's a special case of course.
http://lkml.org/lkml/2005/8/20/95
Pretty much all modern CPUs need special compilers to give good performance.
If pretty much all modern CPUs need a particular feature, it's not "special" anymore. What's "special" about Cell is the features that make it different from mainstream CPUs.
To prevent pipeline stalls, you need to make sure that you have a minimum of m instructions (and ideally m) between your load and your first operation that depends on that data.
So, in addition to having to wait for the completion of the variable time operation, which the data-dependent operation has to do anyway, the equivalent RISC instruction sequence is going to fill up the pipeline and the cache with junk. But, hey, at least the pipeline didn't stall: the CPU designers have successfully pushed the problem off their plate. Thanks for illustrating my point.
In any case, I'm not defending complex instructions at any cost, I'm saying that CPU designers have gone too far in pushing problems off onto compilers. We may not need a string edit instruction, but we do need better support for various forms of parallelism than Cell or Itanium. I expect the evolution of MMX, hyperthreading, and multicore chips is going to be much more important than architectures like Cell or Itanium.
About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.
They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".
The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.
As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.
This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube and the BBN Butterfly. There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.
The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.
But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.
You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.
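The chunk-size arithmetic above can be sketched as follows. The 256K local memory figure comes from the comment; the 56 KiB code footprint and the function names are assumptions picked only to make the numbers concrete, and the DMA overlap itself is not modeled.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-processor local memory figure quoted above


def max_chunk_bytes(code_bytes, buffers, local_store=LOCAL_STORE_BYTES):
    """Largest chunk that fits when code and `buffers` equal-sized buffers share local memory."""
    if code_bytes >= local_store:
        raise ValueError("code alone does not fit in local memory")
    return (local_store - code_bytes) // buffers


def chunked(data, chunk_bytes):
    """Yield successive slices of `data`, each at most `chunk_bytes` long."""
    for start in range(0, len(data), chunk_bytes):
        yield data[start:start + chunk_bytes]


def process_stream(data, chunk_bytes, kernel):
    """Apply `kernel` chunk by chunk and concatenate the results.

    On real hardware the transfer of the next chunk would overlap with
    computation on the current one; this sketch only shows the data layout.
    """
    return b"".join(kernel(chunk) for chunk in chunked(data, chunk_bytes))


# Hypothetical 56 KiB of code: two buffers (double-buffered input only)
# versus four buffers (double-buffered input and output).
two_buffers = max_chunk_bytes(56 * 1024, 2)
four_buffers = max_chunk_bytes(56 * 1024, 4)
```

Under those assumptions the two cases come out at 100 KiB and 50 KiB per chunk, which lines up with the 50-100K range mentioned above. A bigger code footprint shrinks the chunks further, which is why the whole program ends up designed around the limit.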
In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.
Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.
It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.
and virtually all useful programming languages have global side-effects.
Haskell being the exception that could break the logjam, right?
If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose.
What's so good about imperative code? What's so bad about purely functional languages such as Haskell?
Which is another reason why it baffles me that people continue to use gcc as their compiler. It generates awful code.
What C compiler generates better code and can be distributed with a Free operating system?
In C++,
I'm not sure what you mean about "extreme verbosity"... Sounds like you're a Perl programmer (;->
Also static introspection is possible in C++ using overloading and/or templates... [But it would have been more convenient if it was an explicitly-built-in feature] Also, templates can be used in very powerful ways (template metaprogramming), for example to automatically choose the best sort routine (at compile-time) for a particular data-type. It is even possible to use templates to create highly portable code (e.g. when char,short,int,long vary across architectures and a minimum level of precision is required).
In terms of writing complex, high-performance software, I don't see anything replacing C++ (not even Java). But for applications where performance is not an issue, I find the strong typing features of C++ to be an advantage...
Think of how many web CGI scripts have security flaws because they are passing un-sanitized data from the GET/POST data to SQL queries or the command-line? Well, these flaws could have been prevented at COMPILE TIME, with a strongly typed language, such as C++. (Strings in different domains could have different types, forcing the programmer to run specialized functions for sanitizing one string before using it in a different domain)
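The "different types for different string domains" idea can be sketched like this. It is in Python rather than C++, so the check happens at run time here (a static checker such as mypy would be expected to flag the bad call earlier). `RawInput`, `SqlLiteral`, `sanitize_for_sql` and `build_query` are hypothetical names, and the quote-doubling is a toy escape, not a recommendation over parameterized queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawInput:
    """Untrusted text straight from GET/POST data."""
    text: str


@dataclass(frozen=True)
class SqlLiteral:
    """Text that has been through the SQL sanitizer."""
    text: str


def sanitize_for_sql(raw: RawInput) -> SqlLiteral:
    # Toy escaping for illustration only: double any single quotes, then quote.
    return SqlLiteral("'" + raw.text.replace("'", "''") + "'")


def build_query(name: SqlLiteral) -> str:
    # Refuse anything that has not been converted into the SQL domain.
    if not isinstance(name, SqlLiteral):
        raise TypeError("build_query needs a SqlLiteral, got " + type(name).__name__)
    return "SELECT * FROM users WHERE name = " + name.text


query = build_query(sanitize_for_sql(RawInput("O'Brien")))
```

The only way to obtain a `SqlLiteral` in this sketch is through the sanitizer, so forgetting to sanitize shows up as a type error instead of an injectable query.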
My little knowledge of functional-based languages is that they tend to copy data unnecessarily. This doesn't matter in many applications, but it becomes a show-stopper in terms of performance for some.
The number of threads it is optimal to create is roughly the number of processing units at your disposal - something you don't actually know until runtime
O rly? I thought that by definition of the Cell processor, you got one PowerPC and 7 DSPs on every PS3.
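For the general case, where the core count is not fixed by the platform, sizing the pool at run time is a one-liner. A minimal sketch; `worker_count` and its `reserved` parameter are hypothetical names, with `reserved` covering units set aside for other duties.

```python
import os


def worker_count(reserved=0):
    """Workers to create: detected logical CPUs minus any reserved, but at least 1."""
    detected = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, detected - reserved)


print(f"creating {worker_count()} worker threads")
```

On a fixed console target the same number could simply be a constant, which is the point the reply is making.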
But if Sony comes up with the right dev tools and market share, it won't matter. Sega's problem was at least as much poor market share, which made it less profitable to write for their platform, as it was being tough to program for.
And in Sony's defense, it isn't like 360 is easy to write for either, you have to write multi-threaded games, which isn't quite the norm.
It's a calculated risk Sony is taking. They had to take some kind of chance, you're not going to deliver the kind of performance it takes to match up to 360 by sticking with conventional design.
http://lkml.org/lkml/2005/8/20/95
Apple uses gcc. Their OS isn't free.
Sony uses gcc for PS2 and currently uses gcc for PS3, and nothing about their platform is free. And their dev kit is FAR from free.
I do agree, gcc is a good value for the money. But if you have a choice, you can get a lot better on performance than gcc. In many systems, it's worth the cost.
http://lkml.org/lkml/2005/8/20/95
In RAM means you spend extra time recompiling every time. That makes no sense. That's not the same as using an intermediate format as a distribution format. You're essentially talking about using a dynamic recompiler. And they just wouldn't match up on performance to a native compiler due to having to recompile it each time you load the code.
As to your "Nope", a game can have 10MB of code without much difficulty. Yes, almost all the data on the disc is assets, but given that the disc can have 9.0GB, having 0.01GB worth of code would still mean almost all the data on the disc is assets, yet it still has 10MB of code. Even at 100MB of code it'd still be 99% assets. One hundred MB might be a stretch, but it is quite possible, given the ROM size and your system can't break down when someone uses the system in a way you didn't expect.
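The percentages are easy to check. A quick sketch, assuming decimal units (1 GB = 1000 MB) and a completely full 9.0 GB disc; `asset_share` is a hypothetical helper name.

```python
DISC_MB = 9.0 * 1000  # 9.0 GB disc from the comment, in decimal megabytes


def asset_share(code_mb, disc_mb=DISC_MB):
    """Fraction of a full disc left for assets after `code_mb` of code."""
    return 1.0 - code_mb / disc_mb


for code_mb in (10, 100):
    print(f"{code_mb} MB of code leaves {asset_share(code_mb):.1%} for assets")
```

That works out to about 99.9% assets with 10 MB of code and about 98.9% with 100 MB, so "99%" is a fair round number for the larger case.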
In addition, I said "Xbox 360 or PS3", and you responded with a comment about the PS2. Weak.
http://lkml.org/lkml/2005/8/20/95
Transmeta was contracted to write some of the tools for programming on the Cell. So it wouldn't surprise me if the compiler mainly came from Transmeta. I believe that part of the reason they got the nod for this was their previous experience developing their proprietary "Code Morphing Software" layer. So perhaps this compiler isn't as horrid as everyone seems to believe.
Septapiler for the PS3?
0- Eamonman Proud member of DNRC
Yes, Itanium has failed to grab anything like the market share it was meant to. But that has nothing to do with its architecture. There's an Ars Technica review from last year (I think) which talked about the Itanium architecture, and they were very upbeat and complimentary about it. The summary of that article was that as fabrication tech improves and die shrinks follow, and it becomes possible to cram more cores and larger and larger caches onto a chip, the Itanium architecture has more scope to grow and perform than any of its current competition. EPIC loves large caches.
There is only one real reason why Itanium has been such a flop so far, and that's x86-64. Intel had no intention of bolting 64-bit tech onto the x86 architecture. If you wanted 64-bit computing you were meant to go Itanium. End of story. That was the way Itanium was going to get its market share, and large volumes were going to drive the costs down. Intel either didn't see AMD coming, or didn't see what they were doing as a threat until it was too late. The x86-64 bombshell, when it hit, threw Intel into complete disarray. Not only was x86-64 way cheaper than Itanium, but it outperformed it and it offered seamless backward compatibility. The Itanium volume market plan was doomed from that moment on. As a consequence Intel had to scrap their x86 road map and redraw it with their own 64-bit implementation, i.e. EM64T. They've been playing catch-up ever since.
A side effect of Intel's change in direction and focus has been a change in where they've put their resources. Itanium got starved of the resources it was originally planned to have and as a consequence Montecito is way late and isn't quite the kick-ass design it was meant to be. Intel's partners like HP have suffered as a consequence.
Nevertheless, Itanium is not going away, and even though Montecito is late, the current crop of Itanium chips are no slouches. When Montecito arrives it's going to give a much needed boost to HP Itanium sales. That's what they hope for anyway.
If I were to program games on Cell, I'd rather not use a dumbed down all-in-one compiler. This is the kind of an easy solution for a complex problem that's never going to work well. And programming a heterogeneous, asynchronous, memory-asymmetric architecture is complex.
Let there be an SPE compiler that produces "tasklets": bits of SPE code plus some positioning information, such as location of DMA areas. The compiler may be for some specialized vector-friendly language to match the units' instruction set well. Then, make a library for the main CPU to facilitate deployment of SPE tasklets, handle synchronization, DMA area management, dynamic unit allocation and so on. You'll be amazed how many programmers turn out ready to work in this model.
If someone just wants to port existing code, well, there is a whole POWER core there, AltiVec and all! Is it too weak?
My exception safety is -fno-exceptions.
Although there are 8 cells on each PS3 processor, one is redundant (to improve yields) and one is assigned to the OS. So you 'only' have to handle six different cells.
Based on the quote below ... unless I am missing something, a 22% performance improvement hardly seems worth waiting for ... am I misreading something:
"We first evaluate the optimized SPE code generation techniques presented in the section "Optimized SPE code generation." Figure 11 presents the reduction in program-execution time for each optimization relative to the performance of the original compiler. We achieved a reduction which ranged from 11 to 51 percent, averaging at 22 percent."
This may be true for the desktop market, but for embedded the special compiler IS needed, and also not everyone can write a program that will run OK on a mobile phone. The programmers must take into account the specifics of the architecture so that the code executes acceptably in the constrained environment.
And the general trend is to take into account more and more variables (like power consumption and different execution units that can perform different tasks).
The future of Cell depends on IBM, on the developers, and on the users. Needing a complex compiler is not, by itself, a reason to fail.
There are plenty of languages that do much better with parallelism. Erlang, for example. Maybe you want static typing for speed, but the point is, if you use message-passing or dataflow concurrency, instead of the usual resource locking, you make parallel processing a lot easier on programmers, and don't have to build a heroic compiler that somehow figures out parallelism from sequential code.
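The message-passing style can be sketched without Erlang. This is a minimal Python version using thread-safe queues as mailboxes; `worker` and `square_all` are hypothetical names, and squaring stands in for real work. Workers share no state and take no locks of their own: they only receive and send messages.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers from `inbox`, send their squares to `outbox`, stop on None."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work for this worker
            break
        outbox.put(message * message)


def square_all(values, workers=4):
    """Fan the values out to `workers` threads and collect the replies."""
    values = list(values)
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Replies arrive in no particular order, so sort for a stable result.
    return sorted(outbox.get() for _ in values)


squares = square_all(range(10))
```

The programmer states the parallelism explicitly by choosing what gets sent where, which is the trade the comment describes: a little more structure up front instead of a compiler that has to discover it.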
Tim Sweeney presented a paper recently on the topic of game engine design for multi-core systems. Basically it amounts to changing from C/C++ to a functional language like Haskell for engine development so the language takes care of the task divisions needed. Code complexity is already a problem for engine development; managing threads is just gonna make that worse. When engine development time is as long as it currently is, taking a performance hit in code execution can be offset by a faster time to market.
All this clever design is going into getting the compiler working well, but surely the cleverest parallel code could be ruined by having to share the processor(s) with other tasks/processes/threads.
I wonder what the cost of being pre-empted on one or more processors is. A really clever design might allow the programmer to place hints such as "don't preempt this chunk, it's optimised really nicely" or "dedicate a processor to this code until I tell you otherwise".
Note to ACs: I won't mod you up, even if you are being funny or insightful. So take a chance! It's not real life!
There's a major problem with your argument.
"Extra few percent"
I hear that all the time.
"Instead of compiling with optimizations, I should be able to distribute a debug build and just make them buy better hardware".
We're talking at least a 25% increase in speed in many CPU-bound applications, and often a several-fold increase with specialized compilers.
People who do video and audio encoding are not the target here either -- in those cases, optimized builds often make little difference because somebody went to the effort of hand-writing the main loops in optimized assembly already.
Check out Intel's C compiler versus Microsoft's (or GCC) for simple non-CPU-bound application performance differences.
- Michael T. Babcock (Yes, I blog)
And perhaps you're even on the inside, since you know Sony tried to go without the NVidia chip.
I was around for the last range of machines, HyperCubes and such that you speak of, and you're dead right as to why they were dumped. Most tasks couldn't be divided up well enough to use the hardware effectively.
I would add a little bit. First, using 8 (7) processors will be a lot easier than using the hundreds in those older machines. Second, given current technology limitations, it isn't likely someone is going to match the potential Cell performance with a shared-memory design at the price Sony pays for Cell.
Additionally, if you read the article, IBM has a proposed compiler-based solution to the necessity of using 256KB pages. I have to say I'm more than a bit skeptical about this.
http://lkml.org/lkml/2005/8/20/95
Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
With Xbox 1, you could port games to and from it with ease, using conventional programming.
360 requires multiple threads to use it well. Additionally, you have to do GPU programming (shader programming) to use it well. Those are huge increases in complexity from Xbox 1, which was quite straightforward to program for.
The 360 still has the unified memory architecture at least.
http://lkml.org/lkml/2005/8/20/95
The Cell fundamentally requires program transformations to be performed by a compiler to make use of most of the chip. The only other CPU that comes somewhat close to that is Itanium.
Now, we can debate just how much performance loss is seen with unoptimized code on dynamically-scheduled out-of-order superscalars, and you have a point there: it can be significant. But not as significant as only using 1/8th (or 1/nth, where n is the number of processing elements) of a chip.
It's a well known fact that the Sega Saturn was a powerful machine due to its many processors. If I remember correctly, it had 2 processors used for calculations, a sound processor, and maybe three more used in various graphical functions (they assigned specific functions to each). Their reasoning was that by using a much more parallel architecture they could create a much more powerful machine at a lower cost.
It backfired when they didn't make it easy to code for such a beast and also didn't provide good support documentation to developers (if I remember what I read). The machine ultimately died due to this, which caused a lack of good games to be released.
Can you say OpenMP? (www.openmp.org)
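For anyone who hasn't seen it, here is a minimal sketch of what OpenMP-style annotation looks like in C++. The function names `parallel_sum` and `check_parallel_sum` are made up for illustration, and this assumes an ordinary shared-memory machine; it says nothing about how well the approach maps onto Cell's SPEs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a vector using an OpenMP reduction. If the compiler is run without
// OpenMP support (e.g. without -fopenmp on GCC), the pragma is ignored and
// the loop simply runs serially with the same result.
long long parallel_sum(const std::vector<int>& data) {
    long long total = 0;
    const long long n = static_cast<long long>(data.size());
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += data[static_cast<std::size_t>(i)];
    }
    return total;
}

// Hypothetical usage check: integer addition gives the same answer
// regardless of how the iterations are split across threads.
void check_parallel_sum() {
    assert(parallel_sum(std::vector<int>{1, 2, 3, 4}) == 10);
    assert(parallel_sum(std::vector<int>{}) == 0);
}
```

The point is that the loop body stays plain C++ and the parallelism is a hint to the compiler, which is roughly the kind of help people are hoping a Cell compiler can provide.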
The Emotion Engine had 3 parts to it... Programming for it was in no way normal but it looks like the console devs pulled it off.
Hmmm... Pie...
I for one welcome our Octopiler overlords
Mystika
In RAM means you spend extra time recompiling every time. That makes no sense.
What else is the CPU doing while the optical drive is loading things from disc?
And they just wouldn't match up on performance to a native compiler due to having to recompile it each time you load the code.
A lot of programs for video game consoles only load code once, namely at the beginning. Couldn't they do this recompilation while displaying the allegedly legally required unskippable copyright notices and unskippable logos for the publisher, developer, and licensor of the asset franchise (e.g. a movie studio or a sport league) and while loading the asset data for the title screen and main menu?
One hundred MB [of code that must be recompiled] might be a stretch
How many source lines of code compiles to 100,000,000 bytes of object code, how long would it take for human beings to develop and test that much code, and how would such an enormous project be funded? Does Microsoft Windows even contain that much code?
In addition, I said "Xbox 360 or PS3", and you responded with a comment about the PS2. Weak.
The generalization from PS2 to the next generation was left as an exercise for the reader. Xbox 360 media (DVD-9) is not larger than PS2 media (also DVD-9), and just as a lot of PS2 games came on CD-ROM, I expect a lot of PS3 games to come on DVD-9 media due to the higher initial replication cost of BD-ROM.
Like in 1's and 0's and XOR's? You mean complimentary. Most people garble the other usage, e.g. "A good compiler is an essential complement to advanced hardware".
Also "upbeat" is one word.
Without your e-mail address, Ms. Edna Krabappel can only correct in public.
=S
I'm afraid that's not true. C++ does provide crutches for C, and can solve a few problems indeed, but in my experience, in the long run, it just fails.
* Macros largely unnecessary with the use of templates
True, in theory. Macros remain widely used in practice.
* C++ is strongly typed (much more than C)
I'm afraid you're confusing "static typing" and "strong typing". C++ is a mix of static weak typing, tiny bits of dynamic typing and big chunks of no typing at all. In other words,
* Character strings unnecessary (object-oriented replacements more generic, more powerful)
:)
Indeed. However, Mozilla has 12 different classes of character strings, Apache Xerces 2 or 3, wxWidgets 4 or 5, and none of them are compatible across projects or with the STL. Which was exactly my point.
* GC? Well, it can be done a bit more nicely with some object-oriented techniques and programmer discipline
True. Unfortunately, that's quite hard to do, and it grows nigh-impossible when you're trying to mix two different libraries. And, well, come on, unless I'm writing a critical or semi-critical app, I don't want to spend half of my time, if not more, managing memory. If I'm writing a critical or semi-critical app, I won't use C++ in the first place. I'll probably go Ada or Esterel, depending on the task.
* Could be extremely portable, but most people seem to not bother to make their programs that way.
In my experience, it's doomed at low-level, starting with the low-level libraries you will probably need to write your program. Making it portable actually requires fighting against the conventions of these libraries.
I'm not sure what you mean about "extreme verbosity"... Sounds like you're a Perl programmer (;->
I'm a functional programmer. Walking through a tree to collect information should be something I can write in 5 short lines, including type information. Not in 100 lines (assuming a good library), to obtain unsafe, MT-unsafe and harder to read results.
Also static introspection is possible in C++ using overloading and/or templates... [...] :) For expert systems, I'll use Prolog. For databases, well, either SQL or some embedded SQL (what's the name ? link ?). Etc. In any case, I will choose a language with actual strong typing rather than a false sense of security.
True. But comparing this to Lisp-style or MetaOCaml-style static introspection makes me want to weep.
In terms of writing complex, high-performance software, I don't see anything replacing C++ (not even Java). But for applications where performance is not an issue, I find the strong typing features of C++ to be an advantage...
I believe it depends on the domain. Given complete choice, for most applications which do not require direct access to the hardware, I would probably use OCaml (if performance matters most) or Haskell (if readability matters most) and now maybe F# (for portability). For distributed applications, I would probably use Mozart. Or my own upcoming language
Think of how many web CGI scripts have security flaws because they are passing un-sanitized data from the GET/POST data to SQL queries or the command-line? Well, these flaws could have been prevented at COMPILE TIME, with a strongly typed language, such as C++. (Strings in different domains could have different types, forcing the programmer to run specialized functions for sanitizing one string before using it in a different domain)
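A rough sketch of the "different string types per domain" idea, for illustration only. The names `RawInput`, `SqlLiteral`, `quote_for_sql` and `build_query` are hypothetical, and the escaping shown is deliberately simplistic; real code should prefer the database's parameterized queries.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Untrusted text straight from GET/POST data.
class RawInput {
public:
    explicit RawInput(std::string s) : value_(std::move(s)) {}
    const std::string& str() const { return value_; }
private:
    std::string value_;
};

// Text that has been quoted for SQL. The constructor is private, so the
// only way to obtain one is through quote_for_sql().
class SqlLiteral {
public:
    const std::string& str() const { return value_; }
private:
    explicit SqlLiteral(std::string s) : value_(std::move(s)) {}
    std::string value_;
    friend SqlLiteral quote_for_sql(const RawInput&);
};

// Illustrative escaping only: doubles single quotes and wraps the result.
SqlLiteral quote_for_sql(const RawInput& in) {
    std::string out = "'";
    for (char c : in.str()) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return SqlLiteral(out);
}

// Accepts only SqlLiteral: passing a RawInput or a plain std::string
// is a compile-time error.
std::string build_query(const SqlLiteral& name) {
    return "SELECT * FROM users WHERE name = " + name.str();
}

// Hypothetical usage check.
void check_build_query() {
    const RawInput user("O'Brien");
    assert(build_query(quote_for_sql(user)) ==
           "SELECT * FROM users WHERE name = 'O''Brien'");
}
```

With this shape, `build_query(user)` does not compile, which is the compile-time prevention the parent is describing. It only helps if every query goes through such a function, of course.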
I agree
This troll is over. You can now resume a normal activity.
What else is the CPU doing while loading things up from disc? Hopefully running the game. Let's not get trapped in the "please wait, loading" metaphor here. Jak 2 showed years ago that you can load while the game is running. So my assumption is that while the DVD is loading stuff, the game is still running and the CPU is not idle.
About your comments about only loading once at the beginning, well, I don't really believe that. Yeah, I can see it with some games, but not a lot of them. They'll load some specialized code level-by-level. I do hear you that you could do some more work in there, but personally, if the developer is going to take extra effort to figure out how to get some work done behind those time-wasting screens, I'd rather it be real work that will save time later instead of make-work that just gets us back to no time deficit.
I do agree 100MB is a stretch. Windows does perhaps have that much code, but it isn't all loaded at once. If you count all the apps and all the different drivers, it might come out to that much code. But again, you don't load every driver and every app at once. I don't expect games to have that much code.
"A lot" of PS2 games didn't come on CD-ROM. All of them did for a year, but after that, it died out quickly. And given that the number of games for the platform in the first year was about the same as the number of games it gets in a month now, I wouldn't say "a lot" of PS2 games came on CD-ROM.
I do agree on PS3, I think all games will come on DVD-ROM for at least a year. I think that games will stay on DVD-ROM for PS3 longer than they stayed on CD-ROM for PS2 simply because many PS3 games will be 360 games also, and 360 is DVD-only. PS2's only contemporary at launch was Dreamcast, and it already had more storage than a CD-ROM, so developers could "break free" a little. And most of them ignored Dreamcast anyway, giving another reason they could break free.
Anyway, I still find your arguments uncompelling. Consoles have found little reason to change processors or such in the middle of a product cycle. Wouldn't they rather keep with the older one, which certainly costs less anyway, at least until the new generation of machines comes out? Perhaps it would be good for long-term backward compatibility. But then again perhaps MIPS code is as good an intermediate distribution format as an arbitrary bytecode anyway.
http://lkml.org/lkml/2005/8/20/95
A very nice pdf article which shows why high precision is needed - for precision AND repeatability: Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications.
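To make the repeatability point concrete, here is a tiny sketch (not taken from the linked article; the function names are made up) showing that floating-point addition is not associative. A parallel reduction that groups the same terms differently from run to run can therefore give different answers. It assumes IEEE 754 doubles and no value-changing optimizations such as -ffast-math.

```cpp
#include <cassert>

// Adds three doubles grouped as (a + b) + c.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// Adds the same three doubles grouped as a + (b + c).
double sum_right(double a, double b, double c) { return a + (b + c); }

// Hypothetical usage check: identical inputs, different grouping.
void check_grouping() {
    // (1e30 + -1e30) + 1.0: the big terms cancel first, then 1.0 is added.
    assert(sum_left(1e30, -1e30, 1.0) == 1.0);
    // 1e30 + (-1e30 + 1.0): the 1.0 is lost to rounding before the cancel.
    assert(sum_right(1e30, -1e30, 1.0) == 0.0);
}
```

Higher-precision accumulation narrows the window in which this happens, which is one reason wider hardware floats matter for reproducibility as well as accuracy.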