Larrabee ISA Revealed
David Greene writes "Intel has released information on Larrabee's ISA. Far more than an instruction set for graphics, Larrabee's ISA provides x86 users with a vector architecture reminiscent of the top supercomputers of the late 1990s and early 2000s. '... Intel has also been applying additional transistors in a different way — by adding more cores. This approach has the great advantage that, given software that can parallelize across many such cores, performance can scale nearly linearly as more and more cores get packed onto chips in the future. Larrabee takes this approach to its logical conclusion, with lots of power-efficient in-order cores clocked at the power/performance sweet spot. Furthermore, these cores are optimized for running not single-threaded scalar code, but rather multiple threads of streaming vector code, with both the threads and the vector units further extending the benefits of parallelization.' Things are going to get interesting."
Bet they've got some serious CONTROL structures to keep things from getting too KAOTIC....
"Would you believe a GOTO statement and a couple of flags?"
The story title conjured up images of the boxes of ISA cards I've still got sitting around. Ah, the joys of setting IRQs... good times.
Bet they've got some serious CONTROL structures to keep things from getting too KAOTIC.... "Would you believe a GOTO statement and a couple of flags?"
How about a while loop and a continue statement?
http://michaelsmith.id.au
It appears that this could well improve the speed of lots of different operations. A definite boon for graphics-like operations, but a lot of DSP (audio/maths) stuff can also benefit from these enhancements. It would also appear that general code could easily be sped up; however, compiler writers need to get their collective arses into gear for this to happen.
However, give the average developer more speed, and all that gets produced is more bloat with less speed. If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (Smart code that actually executes quickly is generally too difficult for the dumb-arsed upper level (management) programmers to understand, and is thus removed. Believe me, I've seen this happen many times!)
That's what libraries, toolsets and custom compilers are for. If the problem was just silicon we'd have Larrabee by now. What's holding up the train is the software toolchain and software licensing issues.
Don't worry, though. On launch day the tools will be mature enough to use, and game vendors will have new ray tracing games that look fabulous on nothing but this.
I'm hoping the tools will be open but that's a long bet. If they are, Microsoft is done as the game platform for the serious gamer and Intel will make billions as they take the entire graphics market. Intel will make hundreds of millions regardless and a bird in the hand is worth two in the bush, so they might partner in a way that limits their upside to limit their downside risk. That would be the safe play. We'll see if they still have the appetite for risk that used to be their signature. I'm hoping they still dare enough to reach for the brass ring.
Help stamp out iliturcy.
This 300-watt monster, an 8086/386/586/x86-64/MMX+SSE+SSE2+SSE3+whatever-SSE compatible mess, represents (or should represent) the end of an era. Few people are asking for that kind of product; price and size are more important. It's just Intel trying to hold the market captive forever.
As a structural engineer in training who is starting to cut his teeth on writing structural analysis software, I find these truly interesting times in the personal computer world. Technologies like CUDA, OpenCL and maybe also Larrabee are making it possible to simply place on any engineer's desk a system capable of analysing complex structures practically instantaneously. Moreover, they will push the boundaries of that sort of software, making it possible, for example, to model composite materials such as reinforced concrete through the plastic limit, a task that involves simulating random cracks through a structure in order to get the value of the lowest supported load and that, with today's personal computers, takes hours just to run the test on a simple, simply supported, single-span beam.
So, to put this in perspective, this sort of technology will end up making it possible for construction projects to be cheaper, safer and quicker to finish, all in exchange for a couple hundred dollars of hardware that a while back was intended for playing games. Good times.
Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
Your post can be summarized as: Intel Giveth; Microsoft taketh away. That's been the formula for far too long.
And that period is almost over.
If Intel are smart they will release a chip containing one core (or 2 cores) from some kind of lower-power Core design and a pile of Larrabee cores on the one die, along with a memory controller and some circuits to produce the actual video output to feed to the LCD controller, DVI/HDMI encoder, TV encoder or whatever. Then do a second chip containing a WiFi chip, audio, SATA and USB (and whatever else one needs in a chipset). It would make the PERFECT 2-chip solution for netbooks if combined with a good OpenGL stack running on the Larrabee cores (which Intel are talking about already).
Such a 2-chip solution would also work for things like media set top boxes and PVRs (if combined with a Larrabee solution for encoding and decoding MPEG video). PVRs would just need 1 or 2 of whatever is being used in the current crop of digital set top boxes to decode the video.
As for the comment that people will need to understand how to best program Larrabee to get the most out of it, most of the time they will just be using a stack provided by Intel (e.g. an OpenGL stack or an MPEG decoding stack). Plus, it's highly likely that compilers will start supporting Larrabee (Intel's own compiler for one, if nothing else).
I was thinking that. Larrabee's vector unit looks like it could just replace SSE entirely.
Which does raise a question - will Intel keep SSE if it adds in the Larrabee vector unit as yet another legacy feature? I'm guessing it will (sigh).
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
Intel has a lot of smart people in their compilers group, and they've done stuff like this at various times in the past. I wouldn't be at all surprised if they released compiler extensions to allow quick loading of data into the processing vectors.
I think the point is that in the long run you will have one Larrabee-like chip in a desktop that does both the CPU and GPU functions. And in a server that same chip could manage a huge thread pool, which is the best way to do server applications IMO.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
I don't think we will see this in notebooks for a while. We need to wait and see what the real product looks like (Intel hasn't released any specs), but Google for Larrabee and 300W and you will see the scuttlebutt is that this chip will draw very large amounts of power.
Isn't this exactly what Gallium3D + the LLVM GLSL compiler is giving you? Heck, even with the simple shader ISAs you probably want an optimizing compiler anyway in order to get good GLSL performance, no?
Wouldn't this actually be a good thing; instead of spending all the time developing new drivers for each generation of hw (changing every 6 months, poorly if at all documented), you could just keep on developing the architecture and improve the x86 backend.
Will LLVM help with this? AFAIR, Gallium3D already supports rendering using Cell PPUs and Larrabee is going to look like them.
PS: thanks for your work on Gallium3D!
Yeah, most x86_64 ABIs use SSE for scalar floating point, so it's too late to remove it. But hey, at least SSE is an improvement over x87.
I'm skeptical about the future of SIMD, and even instruction-level parallelism in general, for massively parallel processors. The problem is that to get maximum utilisation of all the ALUs in the processor, you have to fill the entire vector with data on which you can perform the SAME operation. This means it's up to the programmer or compiler to write highly vectorizable code. If you can't fill these huge 512-bit vectors, arithmetic units are going to be idle. Nvidia realised this years ago, and so since the G80 their architectures have been scalar. Without vectors you can run a lot more scalar threads while keeping ALL the units busy all the time. Win-win. I'll need some serious convincing if I'm to believe Intel is a real threat to Nvidia in this space, especially for GPGPU.
There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used. A clean break from 80x86 is needed. Legacy 80x86 code can run perfectly in emulation (and need not be slow, using JIT techniques).
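As a rough illustration of the idle-lane argument above, here is a minimal C sketch that computes what fraction of vector lanes do useful work when a loop of `n_elems` elements is processed in full-width vectors. The function name and the model are assumptions made for illustration only: it accounts for the partially filled last vector and nothing else (no branch divergence, no memory stalls). Sixteen lanes corresponds to a 512-bit vector of 32-bit values.

```c
/* Fraction of vector lanes doing useful work when n_elems elements are
 * processed in vectors of `lanes` elements each. Only the underfilled
 * final vector is modelled; this is an illustrative simplification. */
double lane_utilisation(unsigned n_elems, unsigned lanes)
{
    if (n_elems == 0 || lanes == 0)
        return 0.0;

    /* Number of vector operations needed, rounding up. */
    unsigned vectors = (n_elems + lanes - 1) / lanes;

    return (double)n_elems / ((double)vectors * (double)lanes);
}
```

Under this model a loop over 4 elements on 16 lanes keeps a quarter of the lanes busy, while a loop over 17 elements needs two vector operations and keeps just over half of them busy.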
All the legacy junk takes up a pretty small fraction of the area. IIRC on a modern x86 CPU like Core2 or AMD Opteron, it's somewhere around 5%. Most of the core is functional units, register files, and OoO logic. For a simple in-order core like Larrabee the x86 penalty might be somewhat bigger, but OTOH Larrabee has a monster vector unit taking up space as well.
What I like most about Larrabee is the scatter-gather operations. One major problem in vectorized architectures is how to load the vectors with data coming from multiple sources. The Larrabee ISA solves this neatly by allowing vectors to be loaded from different sources in hardware and in parallel, thus making loading/storing vectors a very fast operation.
Yes, I agree. Scatter/gather is one of the main reasons why vector supercomputers do very well on some applications. E.g. scatter/gather allows sparse matrix operations to be vectorized, and allows the CPU to keep a massive number of memory operations in flight at the same time, whereas sparse matrix ops tend to spend their time waiting on memory latency when you have just the usual scalar memory ops.
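To make the sparse-matrix point concrete, here is a minimal C sketch of a sparse matrix-vector product in compressed sparse row (CSR) form. The indexed load `x[col[k]]` is the gather access pattern the comment refers to. The function name `spmv_csr` and the CSR layout are illustrative assumptions, not anything taken from Intel's documentation; the code is plain scalar C, and whether a compiler maps the inner loop onto hardware gather instructions depends on the target.

```c
#include <stddef.h>

/* Sparse matrix-vector product y = A*x with A stored in CSR form.
 * row_ptr has n_rows + 1 entries; the nonzeros of row i are
 * val[row_ptr[i]] .. val[row_ptr[i + 1] - 1], and col[k] is the
 * column index of val[k]. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,
              const size_t *col,
              const double *val,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            /* x[col[k]] is an indexed (gather) load: the addresses are
             * data-dependent, so a contiguous vector load cannot fetch
             * them in one go. */
            sum += val[k] * x[col[k]];
        }
        y[i] = sum;
    }
}
```

With only scalar loads, each `x[col[k]]` is a separate memory operation; a gather instruction would let one vector operation issue many of those loads at once.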
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
There is the "restrict" keyword in C99 precisely for this reason. It's not in C++ but most compilers support it in one way or another (__restrict, #pragma noalias or whatever). That being said, I'd imagine something like OpenCL would be a more suitable language for programming Larrabee than either C, C++ or Fortran. Functional languages are promising for this as you say, of course, but it remains to be seen if they manage to break out of their academic ivory towers this time around.
The claim that this is the first time you can get "GPU class rendering in software"... with nothing more than a pixel sampler to help is somewhat dubious. Modern GPUs are, after all, a bunch of stream processors with a pixel sampler. So, really, modern GPU graphics is all in software except the sampling.
Oh, hey, and does anyone here remember the Voodoo? That was a big (for the time) sampler driven by an x86 CPU. Sound familiar?
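For readers who have not met the keyword, a minimal C99 sketch of what "restrict" changes follows. The two function names are made up for illustration. Both functions compute the same result; the difference is only in what the compiler is allowed to assume, and whether it actually vectorizes either loop depends on the compiler and flags.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume dst may overlap a or b,
 * so a store to dst[i] could change a value that a later iteration
 * reads. That limits reordering and vectorization of the loop. */
void add_may_alias(size_t n, float *dst, const float *a, const float *b)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* With restrict (C99), the caller promises that the elements written
 * through dst are not accessed through a or b during the call. Breaking
 * that promise is undefined behaviour; keeping it gives the compiler
 * the no-overlap guarantee it needs to treat iterations as independent. */
void add_no_alias(size_t n, float *restrict dst,
                  const float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Note that "restrict" is a promise made by the programmer, not something the compiler checks: calling `add_no_alias(n, buf, buf, other)` compiles fine but has undefined behaviour.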
Sarcasm aside, I want one. The peak performance is high, and the programming model is well known. Also, Linux support is likely to be excellent.
SJW n. One who posts facts.
Perhaps. As it stands, though, I don't think Larrabee can run all standard x86 code, since it doesn't support legacy instructions. Plus, even if it did, the performance would suck. For desktop use, it probably makes more sense to have some real x86 cores and a bunch of simpler graphics cores that don't have to be x86. To get full benefit from Larrabee, the code has to be threaded anyhow, so there's not so much point being able to run it on the same core as the standard x86 code.
Fast code that doesn't work is not all that useful.
Except in search engines.
The programming languages that will benefit from Larrabee though will not be C/C++.
Awwwww :-(
It will be Fortran and the purely functional programming languages. Unless C/C++ has some extensions to deal with the pointer aliasing issue, that is.
Oh. You mean like restrict which has been in the C standard for 10 years?
GCC supports it for C++ too. I'd be surprised if ICC and VS didn't support it for C++ too.
The article states that there's hardware support for transcendental functions, but the list of instructions doesn't include any. Does anyone know what is and isn't supported on this front?
It appears that this could well improve the speed of lots of different operations. A definite boon for graphics-like operations, but a lot of DSP (audio/maths) stuff can also benefit from these enhancements. It would also appear that general code could easily be sped up; however, compiler writers need to get their collective arses into gear for this to happen.
Yeah, and while they are at it, I hope they finally get around to fixing that damn segfault bug. It's been around for YEARS.
"Would you believe a GOTO statement and a couple of flags?"
How about a while loop and a continue statement?
In C, a continue or break affects only the innermost enclosing while or for loop. If you're in a triply nested loop, for example, you can't specify "break break continue" to break out of two nested loops and go to the next iteration of the outer loop. Without goto, you have to break your loop up into multiple functions and eat a possible performance hit from calling a function in a loop. So if your profiler tells you the occasional goto is faster than a function call in a loop, there's still a place for a well-documented goto.
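A minimal sketch of the "break break continue" case described above, using a hypothetical function that counts the z-planes of a 3-D volume containing no negative value. The function name and the data layout are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Count the z-planes of an nz * ny * nx volume (row-major, x fastest)
 * that contain no negative value. On the first negative element we jump
 * straight to the next iteration of the outermost loop. */
size_t count_nonnegative_planes(size_t nz, size_t ny, size_t nx, const int *v)
{
    size_t count = 0;

    for (size_t z = 0; z < nz; ++z) {
        for (size_t y = 0; y < ny; ++y) {
            for (size_t x = 0; x < nx; ++x) {
                if (v[(z * ny + y) * nx + x] < 0)
                    goto next_plane;   /* leave both inner loops at once */
            }
        }
        ++count;                       /* reached only if no negative was found */
next_plane:
        ;                              /* a label must be followed by a statement */
    }
    return count;
}
```

The alternatives without goto are a flag tested in each inner loop condition, or moving the two inner loops into a helper function that returns early.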
C++ code can use exceptions to break out of a loop. But statically linking libsupc++'s exception support bloats your binary by roughly 64 KiB (tested on MinGW for x86 ISA and devkitARM for Thumb ISA). This can be a pain if your executable must load entirely into a tiny RAM dedicated to a core, as seen in the proverbial elevator controller, in multiplayer clients on the Game Boy Advance system (which run without a Game Pak present so they must fit into the 256 KiB RAM), or even in the Cell architecture (which gives 256 KiB to each DSP core).
When performing limit analysis, the lowest supported load calculated through the plastic limit (see limit analysis' upper bound theorem) is the lowest possible load that causes the structure to collapse.
I think Anonymous Coward was trying to say that the layman's term for this load amount is the "highest supported load" that doesn't cause collapse.
Seriously, can someone tell me why my post was a troll? The GP was referring to the lack of a feature that C has had for 10 years. The C99 standard came out in 1999 and had the restrict keyword in it. This allows for optimizations on a par with FORTRAN since it provides the same guarantees.
I know it's fashionable to hate C++ and C over here these days. Perhaps that's the problem.
There are lots of instructions and other cruft inside 80x86 processors that occupy silicon that is never used.
Rarely used instructions do not need to be optimized, so they take very few transistors to implement. Only heavily used instructions need to be optimized.
The programming languages that will benefit from Larrabee though will not be C/C++. It will be Fortran and the purely functional programming languages.
I hope you do understand that Fortran was fast because programs written in it were also simple. Modern programs combine lots and lots of math, memory and I/O operations. You can't easily parallelize that. Even now it can already be parallelized perfectly well in C/C++, yet the resulting software is quite complicated to manage and maintain.
It sounds stupid, but Sun has actually already optimized its Java JIT for the SPARC T1/T2, which are highly multithreaded CPUs.
All hope abandon ye who enter here.
I think you're preaching to the choir here on killing x86. The x86 ops get translated to RISC ops anyway. What I wonder is why they haven't attempted to release two versions: an x86 version, and a stripped down RISC version without the x86 decoder. Obviously this would be a monumental task at all levels of the design, but it would seem they could get similar performance on the RISC version without as much effort as needed for the x86 version, since that overhead is removed. I would guess (and hope) that most of their design effort goes into optimizing the design in the RISC world after the instructions are translated anyway. This will never happen though, because Windows == x86 only. Being able to compile most of the needed applications from source gives hardware designers the freedom to shed legacy interfaces every 5 years instead of every 30. It would be a glorious future if hardware producers started realizing that open source software == greater hardware design flexibility == better performance/cost. Hopefully this is already happening with the shift from x86 to ARM on netbooks.
If developers are too stupid to code for it, it won't go anywhere. This is sounding a lot like the PS3 architecture in complexity.
There are several problems with PS3 programming that don't apply to Larrabee:
* Non-uniform core architectures. Cell processors have two different instruction architectures depending on which core your code is intended to run on. This causes quite a bit of confusion and makes the tools for development a lot more complex.
* Non-uniform memory access. Most Cell processor cores have local memory, and global memory accesses must be transferred to/from this local memory via DMA. Larrabee cores have direct access to main memory via a shared L2 cache.
* Memory size constraints. Most Cell processor cores only have direct access to 256K of memory, so programs running on them have to be very tightly coded and don't have much spare space for scratch usage.
Any application that's reasonably parallelisable is going to be pretty easy to optimize for Larrabee. Most graphics algorithms fit into this category.
Seriously, most of the Mesa shader assemblers deal with very limited, simple, straightforward shader ISAs. This is icky. We're gonna need a full-on compiler for this.
If you don't need the extra complexity of an x86 core, you can ignore it. Compilers for this system will be just as simple as compilers for current nvidia/ati designs.
I'm not convinced. It seems like Intel have just bolted a decent ISA for graphics work onto the side of x86. In typical graphics applications, I'm guessing the new graphics instructions will do most of the work - particularly for shaders. They seem fairly decent and sane, quite similar to other modern designs (though probably more flexible).
Now if you want real fun, try getting good performance out of r600 and up Radeon cards. Nasty VLIW architecture with all sorts of strange and interesting restrictions.
According to TFA, the scatter/gather instructions are actually pseudo-instructions handled by the assembler on the current version of Larrabee.
This isn't really x86, in my opinion; it's x86 with a separate set of very obviously graphics-oriented instructions bolted on top. Since getting decent performance will require using the new instructions and a new programming model almost exclusively, what's the point of the x86 bit?
The point is that there's stuff those graphics-oriented instructions are really not very good at, like indirect memory referencing and branching logic, both of which x86 excels at handling. Now, that kind of workload isn't common on GPUs _at the moment_, but both of those are common operations, for example, in ray tracing, so you may see them become more important over the next few years. What Intel are doing here is defining the GPU architecture for the next decade, and it's one that allows more complex algorithms to be implemented than can easily be done using the specialized stream processing systems we have at the moment.
The other point behind the x86 bit is that not only did Intel already have core designs that implemented it (Larrabee simply has the new registers & instructions bolted on to an existing low-power Pentium-class core), thus enabling faster time to market than if they'd developed entirely new hardware, they also have a massive amount of software support for the architecture, including one of the best optimizing C++ compilers there is. A new ISA would have required a new compiler, thus further complicating the project. As it is, only extensions to their existing compiler have been necessary.
Oddly enough, your post ranks quite highly in that search. Drilling through the forums that show up reveals speculation that a 32-core Larrabee design will use a 300W TDP, or roughly 10W per core. There doesn't seem to be any justification for that number, although Larrabee looks like Atom + stonking huge vector array. The Atom only uses 2W, so it seems hard to believe that the 16-way vector array would use as much power for each FLOP as the entire Atom power budget to deliver that FLOP. Or perhaps it will; it's all just speculation at this point.
So that 32-core processor would deliver 16x32 = 512 FLOP/clock peak. I would guess that they could deliver a low-power part clocked at 1 GHz, judging by the efficiency of Intel's floating point units across the whole range (from Atom up to i7). That part would hit 512 GFLOP/s peak. Then it's just a guessing game of what clock speed they could ramp it up to within that 300W TDP. 2 GHz? 3?
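The back-of-the-envelope numbers above can be written out as a small C sketch. The function names are hypothetical, and the inputs (32 cores, 16 lanes, 300W) are the speculative figures from this thread, not published specifications. Like the estimate above, it counts one FLOP per lane per clock.

```c
/* Peak throughput in GFLOP/s for `cores` cores, each completing `lanes`
 * floating point operations per clock, running at `clock_ghz` GHz.
 * Assumes one FLOP per lane per clock, as in the estimate above. */
double peak_gflops(unsigned cores, unsigned lanes, double clock_ghz)
{
    return (double)cores * (double)lanes * clock_ghz;
}

/* Power available per core if the whole TDP were split evenly. */
double watts_per_core(double tdp_watts, unsigned cores)
{
    return tdp_watts / (double)cores;
}
```

With the thread's figures, `peak_gflops(32, 16, 1.0)` gives 512 and doubling the clock to 2 GHz doubles the peak to 1024, while `watts_per_core(300.0, 32)` gives 9.375, which is the "roughly 10W per core" quoted above.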
The real killer could be how much sustained throughput can be achieved on an x86 derivative. The Core-2 sustained throughputs were mental, but it used every OoO trick that Intel could throw at it. Without that advantage the peak:sustained ratio will be closer to AMD/Nvidia's current offerings.
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
wtf does the international school of amsterdam have to do with this?
What I wonder is why they haven't attempted to release two versions: an x86 version, and a stripped down RISC version without the x86 decoder.
If you looked at what Intel has been doing recently, the RISC code that x86 is translated to has been slowly evolving. For example, sequences of compare + conditional branch become a single micro op. Instructions manipulating the stack are often combined or not executed at all. So what is the perfect RISC instruction set today isn't the perfect RISC instruction set tomorrow. And Intel's RISC instruction set would likely be quite different from AMD's.
I'm gonna go ahead and agree with management that maintainability is more important than any other factor. Having had to maintain a few ancient codebases in my day, I've seen way too many "clever" coders that do ridiculous tricks to save time or space. Well designed (read: maintainable) code does not imply any significant performance hit.
Like,... if I like the musty smell of men, does that mean I'm gay?
Either that, or you're French.
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.
Unless you are interested in a pretty small class of problems, the inherent parallelism of most applications continues to be somewhere in the range 2.1 to 2.5 (i.e., you can speed them up by a little over 2x with the addition of more processors). Thus, in most real-world applications, most of those cores, or vector units, or any other "supercomputer" features will go unused.
Anyone here who observes a quad-core chip running any particular load anywhere close to 4x the speed of a single core should write a paper about it, because this has been the holy grail of parallel computing for going on 40 years now.
That Intel thinks this is a solution is sadly typical -- the problem is a software one, not a hardware problem, and they do not know how to solve it.
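One way to put numbers on the claim above is Amdahl's law. The comment does not name it, so reading the "2.1 to 2.5" figure this way is an assumption of this sketch: if a fraction p of a program's run time parallelizes perfectly and the rest is serial, the speedup on n processors is 1 / ((1 - p) + p / n).

```c
/* Amdahl's law: speedup on n processors when a fraction p (0..1) of the
 * single-processor run time parallelizes perfectly and the remaining
 * (1 - p) stays serial. n must be at least 1. */
double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

Under this model, a program that is 60% parallelizable can never exceed 1 / 0.4 = 2.5x no matter how many cores are added, and on 4 cores it reaches only about 1.8x. A workload would have to be fully parallel (p = 1) to hit 4x on a quad-core.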
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor! (smart code that actually executes quickly is generally too difficult for the dumb-arsed upper (management)
No, we (management) understand your shifty obfuscated code just fine. It's just that you are too stupid to grasp basic economics.
One week of a developer's time, fully allocated, costs the same as a decent app server. So optimizing code for maintenance is far more cost effective than optimizing for performance.
As a former Cray employee I find it interesting to see that Intel's previously unannounced deal with Cray is finally starting to deliver the goods. Intel should just get it over with and buy Cray. They've wanted back into the supercomputer business for a while now anyway.
Larrabee's support for Direct3D and OpenGL will determine its course of life. The reason is twofold. First, whether Larrabee-only games and applications (those that bring out the full functionality of Larrabee) get written will depend on its popularity. Second, for Larrabee to be adopted, it needs to support existing libraries like Direct3D and OpenGL. 1) The first part should be fairly obvious - it takes time to make a game or an application, so developers need an incentive to make one for Larrabee, which will not come to them until Larrabee has some share of the graphics-hardware market. 2) The second part is true because Larrabee-only games and applications do not exist yet. Consumers will not spend 200-300 dollars on a piece of hardware only because of its specifications. They will need to see its applications to buy it. And since no application with Larrabee in mind has been written yet, it will need to support existing applications, and they all use existing libraries such as Direct3D and OpenGL. Thus, Larrabee's success depends on its support for Direct3D and OpenGL, and Intel developers developing drivers for Larrabee will be responsible for it.
compiler writers need to get their collective arses into gear for this to happen
There's a limit to how much general purpose C/C++ code can be sped up automatically; C/C++ semantics just don't allow a lot of optimizations.
Awww how sweet. Henk has registered a name troll just for me. Poor guy, that's a lot of issues for such a sweet child to carry around.
Oh. You mean like restrict which has been in the C standard for 10 years?
Sorry, but "restrict" is not sufficient. Fortran has built-in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.
You don't have to use separate functions in C. C does have GOTO's, thank you very much :)
The subset of C enforced by many employers' coding standards lacks the goto keyword.
I'd be happy if someone could tell me a better way
If you can't get your boss to amend the coding standards to allow use of goto to handle exceptions in C, the better way involves leaving your employer. But that isn't practical in this recession.
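For anyone unfamiliar with the idiom being argued over, here is a minimal sketch of goto-based error handling in C. The function is a made-up example (it sums the squares of 1..n using two scratch buffers purely so there is something to clean up); the point is only the single cleanup label that every error path jumps to.

```c
#include <stdlib.h>

/* Hypothetical example: sum the squares of 1..n using two scratch
 * buffers. Returns 0 on success and -1 on failure. Every error path
 * jumps to one cleanup label, so each buffer is freed exactly once. */
int sum_of_squares(size_t n, long *out)
{
    int rc = -1;               /* assume failure until the work is done */
    long *values = NULL;
    long *squares = NULL;

    if (n == 0 || out == NULL)
        goto cleanup;

    values = malloc(n * sizeof *values);
    if (values == NULL)
        goto cleanup;

    squares = malloc(n * sizeof *squares);
    if (squares == NULL)
        goto cleanup;

    *out = 0;
    for (size_t i = 0; i < n; ++i) {
        values[i] = (long)(i + 1);
        squares[i] = values[i] * values[i];
        *out += squares[i];
    }
    rc = 0;

cleanup:
    free(squares);             /* free(NULL) is a no-op */
    free(values);
    return rc;
}
```

Without goto, the same function needs either nested if blocks that grow one level per resource, or duplicated free calls on each early return.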
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I don't see why it should be one or the other - maintainability is important, as is using optimal algorithms. Fast algorithms can still be written in a clear and understandable manner.
Up to a point, then you've got to make a choice. Keep the high level OOP constructs, or flatten it out to make the compiler's job easier.
THEN you have the next level of optimization, keep the readable code or do it the "clever" way that nets a 40% boost. And as any experienced coder will tell you, clever code is the antithesis of maintainable.
I have often wondered if a more orthogonal superset of the existing instruction set with a clean instruction encoding would be better. The JIT compiler would be dead simple since it would only have to translate. AMD apparently had that option when they designed x86-64 but decided full compatibility was more important.
Sorry, but "restrict" is not sufficient.
For the most part, it is. There are some other minor things, but restrict closes the majority of the performance gap between C and Fortran. Oh yes, and C99 has some truly excessive pedantry wrt. complex arithmetic, so you might need to use some compiler option to get around that.
Fortran has built in support for vectorization, parallelization, and efficient dynamic multi-dimensional arrays.
If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++. Unfortunately the semantics of FORALL and array expressions makes them not much simpler to analyze for dependencies than normal loops, and the penalty for failing is higher.
However, the arrays in F90+ are very nice to program with; compared to those, programming with arrays in C is like poking your eyes out with a fork. But that's mostly a convenience feature, as such it doesn't improve performance (as long as you don't do your multidimensional C arrays in the popular but suboptimal array-of-arrays style).
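To illustrate the layout point in the last sentence, here is a minimal C sketch of a matrix stored as one contiguous row-major block, as opposed to the array-of-arrays style (an array of separately allocated row pointers). The `Matrix` type and its helper functions are hypothetical names made up for this example.

```c
#include <stdlib.h>

/* A rows x cols matrix stored as one contiguous row-major block.
 * Element (r, c) lives at data[r * cols + c], so walking along a row
 * touches adjacent memory and needs no extra pointer dereference. */
typedef struct {
    size_t rows;
    size_t cols;
    double *data;
} Matrix;

/* Allocate a zero-initialised matrix; data is NULL if allocation fails. */
Matrix matrix_new(size_t rows, size_t cols)
{
    Matrix m = { rows, cols, NULL };
    m.data = calloc(rows * cols, sizeof *m.data);
    return m;
}

/* Pointer to element (r, c). No bounds checking in this sketch. */
double *matrix_at(Matrix *m, size_t r, size_t c)
{
    return &m->data[r * m->cols + c];
}

/* Release the storage and mark the matrix as empty. */
void matrix_free(Matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

In the array-of-arrays style each row is a separate allocation, so rows can end up scattered in memory and every access goes through a second pointer; the flat layout avoids both.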
If you have an irrational hatred for Fortran, at least do yourself the favor of using C++, where you can encapsulate multidimensional arrays in a class with overloaded operators. There's a bunch of such high-performance implementations around, such as Eigen, Blitz++, boost::multi_array and so forth, so no need to reinvent the wheel.
I read all this multicore sales pitch as just whining about not being able to deliver faster cores in today's CPUs. Having a couple of cores at hand is nice on a desktop. Having four on a server is nice. But most workloads aren't easily run on multiple cores. Virtualization won't get that much help from a 16-core chip, since in most cases the I/O subsystem in a normal server will be overloaded long before you have stressed an 8-way CPU to the max.
What we need is faster CPU cores, not more of them. Since neither Intel nor AMD can deliver that, they are trying their best to make people believe that what they really need is more cores and that the software people are the ones who have hit the wall. It's just an intricate blame game where the real issue is that Moore's law has slammed into a concrete wall at 200 mph. The upgrade treadmill is on the verge of slowing down, and we can't have that, can we?
The "restrict" keyword gets you most of the Fortran advantages in C, though a lot of people misunderstand what it means.
The C++ folks are working on a memory model specification intended to open up parallelism. I haven't kept up on the details of that, though.
OpenCL is a big ugly hack meant to provide a standard API for legacy GPUs. It's totally inappropriate for something like Larrabee, which is much more general purpose. A good vectorizing compiler will be able to make use of most of Larrabee's features directly. You'll be able to write code in standard languages, with an eye toward writing in a way that makes vectorization possible. While this does require that programmers get trained to understand things like loop dependence, it doesn't require learning a whole new language and API.
APL is the original matrix computing language, since morphed into J and K. Why handle just one number/character at a time? tOM
Epitaph: At last! Root access!
Your comment is funny, except that it isn't. Coders who understand really very little about the history and how hard really smart people have tried and failed think they are smarter than everyone else, including their managers, who are interested in stupid things like maintainability--even at the expense of the egos of cosmically all-knowing coders.
You could put a hypervisor on a lower level for this functionality, but that brings its own can of worms, including figuring out how to pass communication from hardware devices to the operating systems. For example, which OS should set the proper settings on a USB toaster?
If you watch large teams of programmers, the management actually force the developers to write slow code, claiming that maintainability is more important than any other factor!
I've worked in a couple of companies like that. Usually the programmers were limited to working on technology that the management (ex-programmers) were familiar with. Management also didn't want the programmers learning "high-demand skills" (i.e. hardware programming) that would boost the chances of their staff leaving for a better-paid environment. Or there was the politics of favoritism, where the directors wanted to give their best mate a leg up the seniority ladder. Everyone else who was qualified "didn't have the skills or was busy on another project", while of course their mate "had applied at just the right time with the right skills". Another problem was that if management gave only one programmer a new hardware system (e.g. for a CPU porting project), everyone else would get so cheesed off about falling behind that they would leave. Alternatively, there are also quota-based systems, which would piss one nationality off at another.
Invariably these companies gain a bad reputation and implode after a slow death spiral, where they are forced to lay off staff and sell off equipment to cover debts. With fewer staff, they can't take on new projects, and the cycle continues until the last project is cancelled.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
For the most part, it is.
No, for the most part it isn't.
If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++.
Yes, the reason is that FORALL and parallel matrix operations vectorized code, while OpenMP is for multicore code.
There's a bunch of such high-performance implementations around, such as Eigen, blitz, boost::multiarray and so forth, so no need to reinvent the wheel.
All those libraries are excessively complex, hard to use, and have limited functionality compared to Fortran arrays. They often also don't perform well. And since C++ does not support "restrict", none of them can even communicate pointer restrictions to the compiler.
So, with the "restrict" keyword and a lot of effort, you can get some parallelization out of a C compiler; for C++, all you get is no restrict keyword and complicated libraries. Altogether, neither C nor C++ are good choices for numerical programming.
First off, there is no such language as "C/C++".
Secondly, there is one clear advantage of C++ over C here: Vector operations can be exploited by library writers in a seamless way. Someone could, for example, rewrite std::valarray to exploit the new instructions fairly easily, and existing programs which use it will Just Work(tm).
Having said that, it's more likely that the first libraries to use them will be those based on BLAS.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
"First off, there is no such language as "C/C++"."
Now, see, that just makes you sound like an idiot, trying to be clever. If you were aiming for humour, swing/miss. Next you gonna tell us there's no such word as 'swing/miss'?
Jeez.
The revolution will not be televised... but it will have a page on Wikipedia
Until the hardware shows up at independent review sites and lives up to the rather over-the-top claims, this is all hype and FUD. As long as your current GPU provides 60fps on the games you want to play at your monitor's resolution, everything else means nothing.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
>If Intel are smart they will release a chip containing one core (or 2 cores) from some kind of lower-power Core design and a pile of Larrabee cores on the one die along with a memory controller
*Ahem*, what about memory bandwidth??
One strong point of GPUs is that they have big memory bandwidth at a cheap cost, since they use a fixed memory setup. If you put both the CPU and the GPU under the same memory controller with replaceable memory, then it's quite likely that the GPU will suffer from the lack of memory bandwidth.
I heard that Intel acquired some PowerVR IP, probably because tile-based rendering is meant to use less memory bandwidth than normal rendering. That said, PowerVR's video cards were also less powerful than their competition.
Well, it's correct that 'restrict' is part of the C99 standard. But:
a) it is not supported by all C compilers.
b) it is not in the C++0x standard.
c) people don't use it because it is cumbersome to do so (having to type 'restrict' at each and every turn...)
On the other hand, pure functional languages are in 'restrict' mode by default, and so is FORTRAN.
No, for the most part it isn't.
Yes, it is. Look, C and C++ weren't designed with parallelism in mind, but neither was Fortran. The only parallel dialect of Fortran that saw some use was HPF, and that was a failure (see below). Back in the days of yore when vector supercomputers and dinosaurs roamed the earth, Fortran had a big advantage over C. This was largely due to the semantics of C, which didn't have restrict at the time, as well as the relative immaturity of C compilers at that point. But all that code that the legendary Cray vectorizing Fortran compiler compiled was just standard F77, which had neither FORALL nor array expressions. Vectorization was done on normal DO loops. A modern C or C++ compiler can vectorize for-loops in just the same way, possibly with the aid of restrict (or __restrict, #pragma noalias, or whatever the extension is called in your C++ compiler).
"If you're thinking of FORALL and the other stuff imported from HPF, well, there's a reason HPF died and parallel Fortran applications use MPI and/or OpenMP just like C/C++." Yes, the reason is that FORALL and parallel matrix operations vectorized code, while OpenMP is for multicore code.
HPF is a failure in many respects. Fundamentally it died because on distributed memory machines its performance was poor compared to MPI. The parts of HPF that were included into F95, most notably FORALL, are widely considered to be mistakes. As I mentioned in my previous point, the semantics of FORALL make analyzing it no simpler than analyzing the equivalent DO loop, and the performance penalty for failing is higher. Hence it's always better to just use a normal DO loop. In fact F2008 is adding a "DO CONCURRENT" loop so that the programmer can explicitly tell the compiler there are no loop-carried dependencies; this is a direct response to the difficulty of properly optimizing FORALL.
As for F90 array expressions, they have optimization issues similar to those of FORALL, so if one is really concerned with maximum performance it's safer to use DO loops. The real value of the array expressions is that they make the code simpler and shorter, not that they provide better performance than normal loops.
Until you need to make a minor change to the code and find out you have to redevelop a module from scratch because the people responsible for it optimized it until nobody but the original developer (who has since left the company) has a chance of understanding the code in a reasonable timeframe.
Of course this can be avoided through really thorough documentation, but few people are enthusiastic about documenting every step and then documenting the optimizations applied to it.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
Had MichaelSmith's post actually been modded Funny in those two and a half hours, I might have kept my post to 140 characters or less. But in some cases, goto increases speed (due to fewer variables spilled to the stack) without diminishing maintainability, and coding standards that exclude all uses of goto based on a misinterpretation of a 1968 article by Edsger Dijkstra are one of my pet peeves.
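As a small illustration of the kind of goto I mean (a sketch only, with a made-up function name and signature), here is the classic case of leaving a nested loop. The usual goto-free version needs an extra "found" flag that is tested in both loop conditions; the goto version needs no flag at all. Whether it is measurably faster depends on the compiler and the target, so treat the speed claim as something to measure rather than assume.

```c
#include <assert.h>
#include <stddef.h>

/* Searches a rows x cols grid stored in row-major order.
 * Returns 1 and stores the position of the first match in
 * *out_r and *out_c, or returns 0 if target is not present. */
int find_in_grid(size_t rows, size_t cols, const int *grid, int target,
                 size_t *out_r, size_t *out_c)
{
    size_t r, c;

    assert(grid != NULL && out_r != NULL && out_c != NULL);
    for (r = 0; r < rows; r++)
        for (c = 0; c < cols; c++)
            if (grid[r * cols + c] == target)
                goto found;   /* leave both loops at once, no flag */
    return 0;

found:
    *out_r = r;
    *out_c = c;
    return 1;
}
```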
Come on, stop putting up straw men. Fact is, C has "restrict" but no usable multidimensional arrays, and C++ doesn't have "restrict" at all. Fortran has restricted pointers and full multidimensional arrays. What more is there to say?
As for the other points you raise, your views are rather simplistic and narrow, but there's no point in debating that with you further.
GCC supports it for C++ too. I'd be surprised if ICC and VS didn't support it for C++ too.
If by "VS" you mean Visual C++, then it doesn't support C99 at all - and, consequently, doesn't have "restrict" in either C or C++ mode.
It does have __restrict and __declspec(restrict) though, in either mode, but with slightly different semantics (core optimizations enabled by it are still the same, of course).
No, I know both languages fairly well, I know they're almost incomparable and am constantly annoyed by people who assert that C++ is just C with a few bits added.
It's a little-known fact, for example, that C++ has rules which say that the compiler can assume that two pointers aren't aliased under certain circumstances, where C compilers can't make that assumption because of C's weaker type system. Pointer aliasing is one of those issues that can prevent the compiler from generating SIMD or vector instructions.
Retired? Really? Not a week ago. He may have given the CEO reins over to our favorite chair tosser, but he's still Chairman of Microsoft. No doubt his stock option package is quite good.
That's good for Microsoft, too. Three nines of companies don't long survive the loss of their founders. As Damon Runyon said, "The race may not always be to the swift, nor the battle to the strong, but that's the way to bet".
The fall may have even begun before he retired as CEO. When SCO's backstop with Baystar dried up, Microsoft lost all of its credibility in the smoke filled rooms where the real money makes deals. Who knows how much this cost RBC and the other partners? Gates will spend the rest of his life trying to make amends, but those who suffered will never forget. You can't swing a billion dollars without somebody dies, and the dead stay dead no matter how many soup kitchens you volunteer in afterward.
Eventually, pigeons come home to roost. The devil will have his due.
Help stamp out iliturcy.
"am constantly annoyed by people who assert that C++ is just C with a few bits added"
No one was asserting such a thing here. The poster was talking about "general purpose C/C++ code" which is as perfectly valid as, for example, someone talking about website scripting in perl/php, also two completely different languages, but with overlap of what they can be used for, and it's that overlap that's referenced as the subject. However different they can be, C/C++ have undeniable massive overlap. Grouping and generalising are perfectly valid legal communicating constructs completely necessary in the conveying of ideas. Complaining that differences between elements within a generalisation are omitted, when the differences fall outside the context of what is being discussed, is basically just waving a big "missed the point" flag, which happens... but it doesn't have to be accompanied by abrasion.
95% of code doesn't need to be top notch in speed, but if your code is otherwise utterly god-awful slow, i.e. one operation takes 12 minutes where another app takes 2 seconds, then you have some serious shit coders.
Also, if your optimization/gee-whiz app takes 29 hours to process 24 hours of data, then it's obviously not going to get any customers.
Point is, make it fast where it matters, not everywhere. Just look at iTunes: is that written in JavaScript or QuickTime scripting?
Document, and document well, if you're doing tricky speed optimizations, or keep both the slow and the fast code together with a flag to run X or Y.
Liberty freedom are no1, not dicks in suits.
You always have to pick your priorities. Choosing maintainability is good for bug hunting and repair. Beyond that, you can choose speed, disk space, memory footprint, or any number of other factors. Where I work, maintainability is very important, so we use Perl. A whole lot of what I do would be an order of magnitude faster if it was done in C, but not everybody here knows C (and we're not a software shop, so that's not a job requirement). Heck, even in Perl, there are dozens of ways to do things, and there are even different ways to write the same exact algorithm with the same readability but one is faster than the other due to how it's interpreted.
There are situations where maintainability is less important than other priorities, but it's clear that slow, easy to read (and debug and fix) has its place in the market.