This. Completely. Or just quit and start your own company. Life is way too short to spend it all working for someone. I'm checking out at age 53 next year, and people look at me like I'm from another planet for not wanting to work into my 70s. Fuck that.
There are a zillion thread classes for C++; I've written three myself. Just pick one and go with it, or just use pthreads directly, if you want/need to use C++. You don't have to wait for the next standard. Java's threads suck for a lot of things, particularly if you need lightweight threads. If you're using Mac OS X, the Cocoa NSThread class rocks; it's fast and easy. I don't really get what the big deal is here. People have been writing multithreaded code for a couple of decades.
Mathematica is quite good at linear algebra, actually. Not such a great group theory tool, but I have used it for a lot of number theory projects. I paid $75 for a Mathematica license (as a grad student) and it's definitely worth that much to me. It is a very nice tool for a lot of things, but not everything. When I need to do a group theory calculation, I use GAP. When I need to do some complicated commutative algebra calculation, I use Macaulay2. I think SAGE is cool and all, but they are all just tools to do research, and I'll use whatever I can get my hands on to get the job done as fast as possible. Also, Mathematica as a programming language is really more of a pure functional language that is every bit as flexible as your "real" programming languages. If I need to use a "real" programming language, it is for speed, in which case C++ is much faster than Python and all those "real" programming languages.
One objective benchmark to consider when comparing clusters vs. a server or mainframe is the TPC-C online transaction processing benchmark. http://tpc.org/tpcc/results/tpcc_perf_results.asp?resulttype=all Clusters get beat here in both absolute performance and performance/price, with the first cluster in the top ten being a cluster of 4-way HP Itanium servers. Granted this benchmark may or may not be relevant to what you care about, but it is definitely a very clear objective test for online transaction processing.
Gray is a hurricane meteorologist, not a climate scientist. He is definitely a contrarian when it comes to anthropogenic global warming, but he's also way out of his area of expertise. His methods for "debunking" the current state of climate science have more to do with his opinions than sound scientific reasoning and methods.
I ran a couple of HP servers (IA64) for a couple of years, one with Windows Server 2003, one with RHEL Linux. They both worked well, but the Windows machine required fewer reboots and was more stable. Remember that Windows Server 2003 is NOT XP, it was designed for uptime and can add patches on the fly without downtime. There are a lot of big IA64 servers out there running Windows Server 2003, especially at large telecoms, where uptime is paramount, and they do the job.
The transforms they are using for comparison here have lengths on the order of 1 million points. This is huge for an FFT, and truncation error will definitely come into play here using only 32-bit precision. It all depends on what you are doing whether this will be adequate or not. Also, it's not at all clear what they did on the other platforms. There are some tricks to doing very long sequences; essentially using a 2D transform to perform a long 1D transform. It's not trivial, and requires some extra work, but it is generally a lot more efficient than shoving a 4 million element sequence through a plain 1D transform. The inner loops of a 1D transform will eventually thrash the cache for such a large transform, so using a blocked 2D transform avoids this, with some overhead of course. It's hard to tell what they are doing from the performance curves, since they report seconds, and it needs to be scaled by n log n to really see what's going on. It's cool they tried this, though. I was looking at using a GPU to do FFTs and linear algebra kernels a couple of years ago, but decided not to go there as I didn't think it would pay off; mostly because of the 32-bit precision.
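For anyone who hasn't seen the 2D trick, here is a rough sketch of the idea in pure Python (the classic "four-step" decomposition). It is only an illustration under stated assumptions: the names `fft` and `fft_via_2d` are made up for this example, both factors `n1` and `n2` must be powers of two, and nothing here says this is what the authors or the other platforms actually ran. Python won't show the cache benefit either; the point is just that n1 short transforms, a twiddle multiply, and n2 short transforms give the same answer as one long transform.

```python
import cmath


def fft(x):
    """Plain recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_via_2d(x, n1, n2):
    """Length n1*n2 FFT computed as an n1 x n2 two-dimensional problem.

    Hypothetical helper for illustration; n1 and n2 must be powers of two.
    Input index is n = r + n1*c, output index is k = n2*k1 + k2.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 short FFTs of length n2 over the strided sub-sequences.
    rows = [fft(x[r::n1]) for r in range(n1)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * r*k2 / n).
    for r in range(n1):
        for k2 in range(n2):
            rows[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)
    # Step 3: n2 short FFTs of length n1 down the columns, then scatter
    # the results back into natural output order.
    out = [0j] * n
    for k2 in range(n2):
        col = fft([rows[r][k2] for r in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out


# Small demo: both routes should agree to rounding error.
signal = [complex(i % 7, (3 * i) % 5) for i in range(64)]
direct = fft(signal)
split = fft_via_2d(signal, 8, 8)
print(max(abs(a - b) for a, b in zip(direct, split)))
```

Calling `fft_via_2d(x, 1024, 1024)` on a 2^20-point sequence would run 2048 transforms of length 1024 instead of one of length 1048576; in a compiled implementation each of those short transforms can stay in cache, which is where the win comes from.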
The authors discuss hand tuning and assembler coding for Cell, but not necessarily for the other processors. Their 2D FFT results, for example, are a factor of 10 slower than others I have seen. Also, for the IA64 and Opteron, the performance of many of these numerical kernels is highly dependent on the compiler used. The IA64 especially is very sensitive to compiler optimization to keep the 6 pipeline slots busy and also generate memory prefetch instructions at the right time to prevent stalling. As often seems to occur in these sorts of HPC comparisons, they spend a lot of time hand optimizing for a particular platform, and compare it to other platforms that have not necessarily received the equivalent effort. As has been noted above, how much time you have to spend developing, debugging, and tuning a code matters a lot. This is particularly true for research codes. Finally, who uses single precision for scientific computing anymore? Any field that I am aware of that would use large FFTs, large linear algebra solvers, etc. requires at least double precision to get anything meaningful.
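To put a rough number on the single-precision worry, here is a toy comparison in plain Python. It rounds every intermediate result to IEEE-754 single precision with `struct` and adds 0.1 to itself 100,000 times. This is naive accumulation, not an FFT (whose rounding error grows far more slowly with length), and the helper names `to_f32` and `accumulate` are invented for the example, so treat it as an illustration of how quickly a 24-bit mantissa runs out rather than a statement about the paper's results.

```python
import struct


def to_f32(v):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    With single=True, the step and every partial sum are rounded to
    single precision, which mimics doing the whole loop in 32-bit floats.
    """
    total = 0.0
    if single:
        step = to_f32(step)
    for _ in range(count):
        total += step
        if single:
            total = to_f32(total)
    return total


count = 100_000
target = 10000.0  # what 0.1 * 100000 should be
err32 = abs(accumulate(0.1, count, single=True) - target)
err64 = abs(accumulate(0.1, count, single=False) - target)
print("single precision error:", err32)
print("double precision error:", err64)
```

Whether an error of that size matters depends entirely on the application, which is the same caveat as above.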
I have a Motorola bluetooth phone (V330, which they stopped making last year) and it syncs awesomely with my Powerbook via Bluetooth. It works quite well for voice as well, amazingly enough. The one thing I have not tried is using it as a modem. There is a USB cable available for $20, and the T-Mobile internet service is cheap ($6/month) and unlimited...but I have no idea if it works as a modem. Anybody know?
No, it's not just a matter of posture and ergonomics. I was a programmer for 13 years, before I essentially wore out the tendons in my forearms. Now, I can only code for an hour or two tops per day - I'm not programming for a living anymore. When I could no longer sleep well because my arms hurt so much, I finally talked to the worker's comp person at my company. The doctors confirmed I had severe tendonitis, and recommended rest and physical therapy. It helped, but after several months, I was only marginally better. The doctor and PT both told me I would need to get cortisone injections to continue coding full time. I refused to do that, and quit shortly thereafter. I had made a very good living for many years, which afforded me the ability to do something else. After 3 months or so, my arms finally stopped hurting all the time, and I felt back to normal. A few times I have tried to go back and code up some ideas I had, and after just a few hours it comes back, confirming what the doctor told me: I wore out my arms. It's really a blessing in disguise, I'm much happier not sitting in front of a screen all day. Looking back at it all, I think coding for more than 6 hours a day is just unhealthy. Had I never pushed myself so hard for so many years, I think I could have kept going, but the damage was done.
This argument that the dual core is overkill, if pushed all the way through, would advocate using thin clients instead of a PC. If this is true, then we would see widespread use of thin clients throughout (which we don't). Just wait, Vista will be a boon for the dual core processor (with lots of RAM).
I agree. It's just another "computer stunt" paper. The supercomputer centers love these things, as they (sort of) justify their existence. I think if they tore apart these big parallel machines and gave a small piece to small research groups around the country, a lot more science would get done. It takes a _lot_ of simulations to really learn anything, not just one moon shot hero run. The idea of these grand challenge computational problems soaking up all of the resources is so 80's. You can load up a 2-way dual core (for 4 cores) system (say, Opteron or G5) and load up 16GB of RAM and get a lot of science done. This was even true more than 10 years ago: it was faster to run on a RISC workstation than to submit a job to a remote CRAY somewhere, where it sits in a queue for a few hours of cpu time. And debugging was a lot easier, too. I do think there are a few problems that really do require a large, dedicated system - but not very many.
Re: Could you at least TRY to get the story right? (on "No EFI Support for Vista", Score: 3, Informative)
> If Apple actually comes out with a 64-bit machine (like most modern PCs), I'm sure 64-bit Vista will boot on it just fine.
Apple does have a 64-bit machine, the G5. It seems to me that the Core-Duo Intel Macs are just a stopgap until the next Intel Core processors are released in the second half of this year, which are 64-bit. If anything, this is Intel's fault for starting the Core architecture as a 32-bit platform, then moving to 64-bit for the second rev.
There are two issues here: using a single or mixed language approach for a particular application, and using a single language (or not) across an entire organization for all projects.
I worked for a long time at a big national lab that was mostly a FORTRAN shop. They wanted to use FORTRAN for everything, and it was technically a bad choice for everything, but culturally it was the only solution that would fit without causing a jihad among the old timers. I much prefer C++ for these sorts of things (big complex simulations that must run fast), but had little success in converting the masses, even though it was always faster, more portable, much easier to maintain and handle complexity, and also you can actually hire good C++ programmers.
We were able to do some mixed language solutions (C++, FORTRAN, C, perl, etc.) and they were a nightmare to maintain. In hindsight, I think it would be better to keep the apps all in one language rather than mixing. The biggest problem here is portability. These applications have incredibly long lifecycles, and the platforms change several times underneath you, which seems to affect the inter-language interfaces the most.
Anyway, it depends a lot on the type of application, lifecycle, target platform(s), etc. but I think in general it is best to pick a single language if at all possible for a particular application that is the best single tool for the job. But, if a different application would be better suited for a different language, go with the different language. Mandating a single language policy across an organization for all projects is counterproductive: use the right tool for the job.
Intel is shipping 65nm parts NOW. Why is it a big deal that IBM is going to do it next year? Intel already announced that they have produced 45nm parts that will be out in 2007. Also, IBM is not producing a 5+ GHz dual core processor that will fit in a laptop without melting it...Apple made the right choice. Everything with IBM is always at least next year.
There is definitely a place for high-end server processors, and for the last few years IBM and Intel have been leapfrogging each other with the POWER4/5 and IA64. Unfortunately for Intel, when the POWER5 appeared, IA64 was left behind. The TPC-C benchmark http://tpc.org/tpcc/results/tpcc_perf_results.asp is a big deal, and HP lost the lead when the POWER5 servers appeared. My guess is that HP+Intel (along with NEC, Unisys, Bull, and others who also make >= 32-way IA64+Windows Server machines) want the IA64 to be competitive with the POWER5, hence the $10B investment. I have no doubt they can pull it off. The x86-64 processors are great for some applications, but the large transaction processing server is not one of them. If support for the IA64 is withdrawn, then IBM will be alone at the top - and they certainly are NOT abandoning big iron any time soon. There is too much money to be made in TPC.
Whatever, man. I have G5 and Itanium2 machines at my desk. The HP Itanium2 runs Linux and WinXP 64-bit edition (which came out last June). The Itanium2 (McKinley) is an old slow one that crushes the G5 easily on everything (using Intel's compiler) by factors of 2-3x. The new Madison Itaniums are substantially faster (look at the SPEC CPU benchmarks). The Itanium is far superior to anything else out there, it just doesn't run x86 code all that fast, and the GNU compiler sucks on the Itanium because the optimizer cannot get the VLIW right. The Itanium is just ahead of its time. And most people are too stuck in the x86 mindset to even see it. CPU buyers lose as a result.
> Sure, but it doesn't really do it significantly better than some of the more common RISC architectures (Sparc, Power, Alpha), and it's a lot more expensive.
This is bullshit. The Itanium2 slaughters the Ultrasparc 3 by factors of 3-4, and is 25% or so better than POWER4, and is a LOT cheaper than either one of these. You call the Alpha cost effective? It can cost nearly $100K for a loaded dual processor Alpha box, and the dual Itanium2 is still faster for close to an order of magnitude less money. I have 6 Itanium2 machines and more than 30 p690 POWER4 machines at my disposal and develop computationally intensive code on all of the above and more every day. You can buy a 900MHz Itanium2 workstation for under $5K. The Pentium4 is a better deal at under $2K, but it is about half the performance of the cheapest Itanium2, and you're stuck at 32 bits. Anyone that would buy a high-end workstation will recompile, so emulation mode performance is absolutely meaningless.
Also, go to: http://www.emsl.pnl.gov:2080/capabs/mscf/?/capabs/mscf/hardware/results_hpcs2.html for benchmark results for some real codes and further synthetic benchmarks.
Have you considered Itanium-2 under Linux for your "number crunching" platform? The McKinley (Itanium-2) is faster than the Power4, and also cheaper (although you'll need to buy the Intel compilers for a few hundred if you want great performance).
We need less discussion of politics and politicians and dogma, and more science: http://climatesci.colorado.edu/
Only by the time AMD releases their 65nm stuff, Intel will be at 45nm http://today.reuters.com/news/articleinvesting.aspx?view=CN&storyID=2006-11-27T225533Z_01_N27466640_RTRIDST_0_INTEL-MANUFACTURING.XML&rpc=66&type=qcna
IBM is screwed. Cringely http://www.pbs.org/cringely/pulpit/pulpit20060518.html has an interesting perspective; he may be right.
IBM has always performed computer stunts to rally their base of believers, but they rely far too heavily on spreading FUD rather than executing.
I second this. NPR does not do EVERYTHING that happens on public radio.
Why not just use one of these instead? http://www.smithandhawken.com/jhtml/site/catalog/Product.jhtml?PRODID=118&CATID=72&index=1
Zero emissions, mulching, no batteries or electronics to break, and you get exercise while mowing the lawn!
Here are the SPECfp benchmarks from: http://www.spec.org/osg/cpu2000/results/cfp2000.html
IBM Corporation IBM eServer pSeries 690 Turbo (1300 MHz) 1 1202
Hewlett-Packard Comp hp workstation zx6000 (1000 MHz, Itanium 2) 1 1356