...but how is 15921.0 MB/s (Opteron 4-way) more bandwidth than 76097.8 MB/s (F25K)? Your numbers seem wrong to me.
Comparing a 144 CPU box to a 4 CPU one is tenuous at best, since you obviously don't buy them for the same tasks, but the point was that in terms of bandwidth per CPU the SPARC isn't really impressive at all (about 500 MB/s per CPU for the F25K vs. 4000 MB/s for Opteron and 2700 MB/s for POWER5).
For a more valid comparison, why don't you compare the SunFire V40z 4-way Opteron servers with whatever IBM and HP's 4-way Opteron offerings are?
Since the Opteron has an on-die memory controller, I think you won't see any difference between these due to different chipsets etc. From what I've heard the Sun has good diagnostics, is relatively cheap and has good support, so certainly it looks like a good buy.
Re:DVORAK for real world, SysAdmin/Programming use (on Advocating Dvorak · Score: 1)
... and that, dear /.ers, is but one reason why God (TM) created languages such as Fortran and Python and not C.
There was quite a bit of discussion about this over at Ace's Hardware. The consensus seems to be that it isn't about reduced latency (shave a few dozen ns off I/O, who cares), but about cost at the low end (no need for an HT-to-PCIe bridge in the chipset for single-socket systems) and, at the higher end, freeing the third HT link for coherency traffic on multisocket systems.
Jim Gettys (the X Window System guy) also got the boot from HP. It is mentioned here.
Ubuntu does provide a build of wpasupplicant (latest version is 0.3.8, I believe), which provides WPA support.
Also, if you use cards based on the rt2[45]00 chipset, there is an open source driver with WPA built in. While the driver works nicely, it is not included in Hoary, but maybe it will be in the next version.
We have an Opteron compute cluster, based on HP 1U 2-CPU nodes. Runs just fine with 64-bit Linux. We use the Rocks cluster distribution, which is based on RHEL 3 (the newest Rocks release is based on CentOS 4, which in turn is based on RHEL 4, due to trademark issues).
Nice. I have a pretty similar setup at home, although I use dirvish and not backup buddy. An old PPro-200 with a couple of extra disks.
But IIRC it is somewhat CPU limited, or at least the load on the PPro box is pretty high when it is backing up; rsync and ssh use a lot of CPU.
All in all, I'm very happy with it. Beats tapes and CD/DVD-Rs, and it's cheap since you can get old computers really cheaply.
Well, after you take the "Compilers" course maybe your love for IA-64 will have, uh, diminished a bit.
The VLIW architecture is beautiful in many ways, but writing a compiler that generates fast code for an in-order VLIW processor is a seriously difficult undertaking.
ESR is such a troll.
While I don't consider myself an ESR fan, reading TFA, I got the impression that the one doing the trolling was in fact the interviewer. IMHO his questions are constantly trying to sucker ESR into saying something stupid (more page views => more advertising revenue?), but this time ESR manages to keep a cool head and answers pretty rationally.
If what you said were true why didn't Microsoft take say FreeBSD and do the same thing? It's BSD Licensed, UNIX based and a pretty solid system.
AFAIK MS did in fact use the BSD network stack for some version of NT.
Show me proof.
Well, since you apparently are too lazy to use google, I guess I'll have to do it for you then. Here you go, champ. Or specifically, see the table here for the actual results.
There you'll see that a 24 CPU F25K scores 11631.1 MB/s in the TRIAD benchmark (you can choose COPY, ADD or SCALE instead for that matter, the results are similar), a 144 CPU F25K 76097.8 MB/s, a 4 CPU Opteron (called AMD_Opteron_848) 15921.0 MB/s, and finally a 64 CPU IBM p5-595 173564.2 MB/s. And just like I said in my previous post, those numbers show that a 4 CPU Opteron beats a 24 CPU F25K, and a 64 CPU IBM beats the highest end 144 CPU Sun pretty badly.
I haven't seen any of these memory bandwidth benchmarks, but I'm pretty sure you're talking out your ass because the MINIMUM configuration for an F25K is 36 CPUs.
Well, nobody said that you have to use all the available processors when running the benchmark. Actually, looking at the email where the Sun engineer submitted the benchmark results, it looks like all the benchmarks were run on the same machine. Running with different numbers of CPUs is actually a pretty interesting benchmark, since it shows how well the bandwidth scales with increasing CPU count. In this case, one can see that the total bandwidth scales almost linearly with the number of CPUs for the F25K, which is a good result for a big shared memory box. Unfortunately that doesn't really change the fact that the bandwidth per processor actually isn't that impressive.
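For what it's worth, the per-CPU figures are easy to check from the TRIAD numbers quoted above. A quick Python sketch; the dictionary layout, labels and function name are just mine for illustration, the numbers are the ones from the table:

```python
# STREAM TRIAD results quoted above: system -> (number of CPUs, MB/s).
# The labels and layout are only for illustration.
TRIAD = {
    "F25K, 24 cpu": (24, 11631.1),
    "F25K, 144 cpu": (144, 76097.8),
    "Opteron 848, 4 cpu": (4, 15921.0),
    "p5-595, 64 cpu": (64, 173564.2),
}


def per_cpu(ncpus, mb_per_s):
    """Aggregate bandwidth divided by the number of CPUs, in MB/s."""
    return mb_per_s / ncpus


for name, (ncpus, bw) in TRIAD.items():
    print(f"{name:20s} {bw:10.1f} MB/s total, {per_cpu(ncpus, bw):7.1f} MB/s per cpu")

# How the F25K scales from 24 to 144 CPUs: 1.0 would be perfectly linear.
small, large = TRIAD["F25K, 24 cpu"], TRIAD["F25K, 144 cpu"]
scaling = per_cpu(*large) / per_cpu(*small)
print(f"F25K per-cpu bandwidth at 144 cpus relative to 24 cpus: {scaling:.2f}")
```

That works out to roughly 485 and 528 MB/s per CPU for the two F25K runs, about 3980 MB/s for the Opteron and about 2710 MB/s for the p5-595.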
Oh, right, you were just trolling and spouting the typical anti-Sun pro-IBM Slashbot FUD.
Oh right, you were just spouting the same old "Sun computers are better since they have much more memory bandwidth than PC-class computers" that was true in the early 1990s, without actually checking whether it still holds (Hint: it doesn't).
First of all, you're right about the price. Sun servers, especially the UltraSparc line of servers tend to be much more pricey than your average x86 server vendor. They also tend to be relatively slow in CPU-speed, but make up for this in spades with I/O throughput and memory bandwidth.
Yeah? That must be why SPARCs get pummeled so completely in the STREAM benchmark, which measures memory bandwidth. For example, their current high-end system (F25K) in a 24 CPU configuration is beaten by a 4 CPU Opteron. Or the IBM p5-595 at 64 CPUs has more than twice the memory bandwidth of a 144 CPU F25K. Go figure...
So who are we gonna believe? A peer-reviewed article written by someone who has spent months, if not years, researching the issue, or a random /. comment based on some vague feeling that very little energy is required?
Well gee, let me think...
I'm not saying that the article is flawless or even correct (e.g. it has been criticized for using obsolete data); I just think you ought to expend a little more effort before you say whether it's correct or not.
That energy was collected from sunlight millions of years ago.
If you mean losses from refining and transportation etc., IIRC about 10-15% of the energy content of crude oil is lost this way. That of course does not mean that 90% of the crude oil becomes gasoline; there's also LPG, diesel, kerosene, fuel oils, lubricants, asphalt etc.
Actually, the ethanol industry in Brazil is also heavily subsidized by their government (the programme is called "Pro-Alcool" if you want to google...).
Your boss might be less than happy when you start printing it all, though. ;-)
Umm yes, two flops per clock cycle. At 2 GHz, that is 2e9 clock cycles per second, so you get 4 Gflop/s peak.
s/forth/fourth/
Wrong. The forth computer you linked to is the earth simulator, which uses NEC SX-6 vector chips (which have nothing to do with the alpha). That's why its processors are so fast compared to scalar processors (and insanely expensive too, for that matter, which is why most supercomputers use commodity processors).
Well, look at #10, with 5000 2.0 GHz Opterons: Rmax=15250, Rpeak=20000 (in Gflop/s). Per CPU that makes Rmax=3.05, Rpeak=4.
You can calculate Rpeak from published info about the CPU as follows: the Opteron can compute two floating point operations per clock cycle, so running at 2 GHz that makes 4 Gflop/s.
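The same arithmetic as a small Python sketch, in case anyone wants to plug in another machine. The function names are made up for illustration, and the figure of two floating point operations per clock cycle is the Opteron value behind the 4 Gflop/s per CPU above:

```python
def rpeak_gflops(ncpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflop/s: CPUs x clock (GHz) x flops per clock cycle."""
    return ncpus * clock_ghz * flops_per_cycle


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually reached in the benchmark run."""
    return rmax / rpeak


# Numbers for the #10 system quoted above.
ncpus, clock_ghz, rmax = 5000, 2.0, 15250.0
rpeak = rpeak_gflops(ncpus, clock_ghz, flops_per_cycle=2)

print(f"Rpeak total:   {rpeak:.0f} Gflop/s")
print(f"Rpeak per cpu: {rpeak / ncpus:.2f} Gflop/s")
print(f"Rmax per cpu:  {rmax / ncpus:.2f} Gflop/s")
print(f"Rmax/Rpeak:    {efficiency(rmax, rpeak):.1%}")
```

So that machine reaches about 76% of its theoretical peak in the benchmark.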
I read the article and the company's website, but I couldn't find out how you're supposed to access all this data. It's hardly practical that every node exports its own NFS share, is it? Is it supposed to use some kind of cluster file system such as (Open)GFS?
Or is the user expected to do some kind of in-house thingy, like Google or (presumably) the Internet Archive?
Yup, another thing that sounds like it's straight from MMM is the emphasis on components instead of huge monolithic applications. One of Brooks's central arguments was that the cost of producing a piece of software increases faster than linearly with size due to communication overhead, with the optimal team size being around 5-10. So instead of creating one huge piece of software, make many smaller components and finally just put them together to create the final app.
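The usual back-of-the-envelope version of that argument, as I remember it from MMM: if everyone on a team of n people has to coordinate with everyone else, there are n(n-1)/2 pairwise communication paths, which grows quadratically. A rough Python sketch (the function name is mine, and the team sizes are arbitrary examples, not figures from the book):

```python
def communication_paths(n):
    """Pairwise communication channels in a team of n people: n(n-1)/2."""
    return n * (n - 1) // 2


# One big team versus small component teams.
for team_size in (5, 10, 50):
    print(f"{team_size:3d} people -> {communication_paths(team_size):4d} paths")

# 50 people as ten teams of 5, ignoring coordination between teams.
split = 10 * communication_paths(5)
print(f"ten teams of 5 -> {split} paths inside teams")
```

This ignores the coordination needed between the component teams, which is of course not free either, but it shows why small teams with narrow interfaces are attractive.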
Well, the Cray T3E they used to have at the supercomputer center where I submit my stuff was dismantled and the pieces thrown into a big trash bin in the yard. *sniff*
The life of a supercomputer is AFAIK really closer to 5 years than 10. It's not that they aren't impressive machines even 5 years old, it's just that they use _lots_ of power and floor space. Looking at how much computing per $ you can do, it's just cheaper to replace them with something new than to keep them running.
In case you don't remember, the point of RISC was to put optimization on the compiler so it wouldn't require massive on-the-fly speculative bibbledy-bop with millions of extra transistors and hideous pipelines like we have nowadays. This was done by providing, essentially, a compiler-accessible cache in the form of lots of registers, and by having an instruction set that was amenable to automated optimization.
Yes, at least in the beginning, in its purest form. Most high-performance RISC architectures eventually adopted all those OoO, pipelining etc. tricks anyway.
In theory, you don't need any GP registers at all, you could just have memory-memory ops and rely on the cache.
Such "register-less" architectures have been researched, yes. Their primary downfall was that as the compiler has no way of knowing which memory currently happens to reside in cache (as you probably know, cache loading/eviction is decided at runtime based on the memory access pattern), the memory access time is non-deterministic. So there was no way the compilers could schedule the instructions in an intelligent way, and thus such an architecture would have to rely on some really fancy OoO scheme with a huge lookahead (=lots and lots of transistors) to get anywhere near decent performance.
The real problem seems to be that compilers have just not been able to keep up with the last 20 years of theory.
What theory? Optimizing code generation is a very hard problem, and if theory had provided some easy answer to it, the compiler vendors would have implemented it really quickly.
Witness the Itanium--in theory it should have been the ultimate, but they didn't seem to be able to get things optimized for it (other problems, too). Then what happens are curmudgeons complain about the extra work of optimization and insist on setting us back to early 80s architecture rather than writing a decent compiler.
Well, it seems that we have to agree to disagree then. My opinion is that the godlike compiler you seem to think is just around the corner, if only those curmudgeon compiler writers would get off their fat sorry asses, hasn't arrived because despite all the compiler research we still haven't got much of a clue about how to make it.
Moral of the story: write a decent compiler and stop trying to glorify crappy ISAs that suit your antiquated and inefficient coding habits.
My moral: Write your code in a high-level portable language that isn't tied to some specific ISA. Don't get emotionally attached to ISAs, whether positively or negatively. Judge the goodness of an architecture on how well the compiler + hardware executes the code, not on theoretical figures unlikely to be reached in practice.
Example of the above moral: Despite the supposed crappiness of the x86 ISA, it still manages pretty good performance (and in most cases unbeatable price/performance), even with a performance-wise mediocre compiler like gcc.
Probably they'll go with a proprietary BIOS, and I guess they're only interested in the x86-64 mode with SSE2. So in principle Intel could make a Mac-specific CPU without all the legacy crap (16-bit and 32-bit modes, x87 etc.), but considering that the Mac is a relatively small market, the few mm^2 of chip area they'd save would almost certainly not be worth all the engineering/validation etc. costs.