Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps
An anonymous reader writes "Tom's Hardware has published a lengthy article and a set of benchmarks on the new "Haswell" CPUs from Intel. It's just a performance preview, but it isn't just more of the same. While it delivers the expected 10-15% speedup at the same clock speed for integer applications, floating-point applications are almost twice as fast, which might be important for digital imaging applications and scientific computing."
The serious performance increase has a few caveats: you have to use either AVX2 or FMA3, and then only in code that takes advantage of vectorization. Floating point operations using AVX or plain old SSE3 see more modest increases in performance (in line with integer performance increases).
Would that improve hashing speeds in, say, Bitcoin?
" Next-Gen Intel Chip Brings Big Gains For Floating-Point Apps "
How much of a gain? More or less than 0.00013572067699?
That does not sound like something that would benefit from faster floating point operations.
Certain kinds of apps will get a nice performance boost if they're running in house, or on a vendor managed server. If the customer installs the software, then no.
I hope there's really a new Mac Pro coming and that it has these chips in it! I do a heck of a lot of PDE solving, statistics and simulations, and would love to have a screamin' machine again.
For problems that need floating point AND aren't multithread-friendly AND need a lot of computing power AND are specially coded, this will be of great use. However, most massive computing problems like this are multithread-friendly, and this will still be roughly an order of magnitude slower than what you can get by using a GPU.
Slightly, but haven't you been keeping up on the latest hardware? My pair of Sapphire 5830 graphics cards would top out at about 435 MH/s at a total system wattage of around 520 W. The new Jalapeno chips from Butterfly Labs will do 4500 MH/s using 2 watts of total system power. For comparison, my i5-2400 managed 14 MH/s at 95 W or so. So the Jalapeno is about 321x faster and draws about 47x less power, so combined, I believe that's 15,267.864x more efficient.
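The "combined" figure is the hash-rate ratio multiplied by the power ratio, which is the same as comparing hashes per joule. Here is a minimal Python sketch that redoes the arithmetic. It takes the quoted figures at face value, and the variable and function names are illustrative only:

```python
# Hash rates (MH/s) and total system power (W) as quoted in the comment above.
JALAPENO_MHS, JALAPENO_W = 4500.0, 2.0
I5_MHS, I5_W = 14.0, 95.0

def mh_per_joule(mhs: float, watts: float) -> float:
    """Energy efficiency: (MH/s) / W = megahashes per joule."""
    return mhs / watts

speedup = JALAPENO_MHS / I5_MHS      # raw hash-rate ratio
power_ratio = I5_W / JALAPENO_W      # how many times less power the Jalapeno draws
efficiency_ratio = mh_per_joule(JALAPENO_MHS, JALAPENO_W) / mh_per_joule(I5_MHS, I5_W)

print(f"speedup:          {speedup:.1f}x")
print(f"power ratio:      {power_ratio:.1f}x")
print(f"hashes per joule: {efficiency_ratio:.1f}x")
```

With these inputs, the hashes-per-joule ratio comes out at roughly 15,268x.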
I'm gonna buy some i5 "watchacally" chips soon and I'll wait for the price to come down.
With tech, unless you need it NOW, wait because the price will always come down.
And I win again!
The thing that interests me most about this generation is the progress towards a single-chip solution. Ultrabooks and tablets can get a multi-chip package with the PCH (the last remnant of the old chipset) soldered alongside the CPU/GPU die. It shouldn't take long until everything is fabbed onto one piece of silicon, reducing power requirements and gadget size.
While it delivers the expected 10-15% speedup at the same clock speed for integer applications, floating-point applications are almost twice as fast. HTH
Integer and floating point are separately implemented in the hardware, so an improvement to one often doesn't apply to the other. You can add integers by counting on your fingers. To do that with floating point, you have to cut your fingers into fractions of fingers - a very different process.
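To make the "fractions of fingers" analogy concrete: a 64-bit double keeps only 53 bits of significand. Adding numbers of very different magnitude therefore forces the smaller one to be aligned and rounded, a step integer addition never needs. This is a minimal Python sketch; the two function names are hypothetical:

```python
def int_add(a: int, b: int) -> int:
    # Integer addition is exact as long as the result fits the integer type
    # (Python ints are arbitrary precision, so they never overflow here).
    return a + b

def float_add(a: float, b: float) -> float:
    # IEEE 754 double addition: align the exponents, add, then round to 53 bits.
    return a + b

big = 2 ** 53
print(int_add(big, 1))              # 9007199254740993
print(float_add(float(big), 1.0))   # the +1 is rounded away: 9007199254740992.0
print(float_add(0.1, 0.2) == 0.3)   # False: neither 0.1 nor 0.2 is exact in binary
```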
See: http://en.wikipedia.org/wiki/FMA3
It's common to have an accumulator like this:
X = X + (Y * Z)
To compute that in floating points, the processor normally does:
A = ROUND(Y*Z)
X = ROUND(X + A)
Each ROUND() is necessary because a 64-bit double has only 53 bits of significand in which to store what would otherwise be endless digits after the binary point. FMA fuses the multiply and the add, getting rid of one rounding step and the intermediate variable:
X= ROUND( X + (Y*Z) )
That makes it faster (one instruction instead of two) and slightly more accurate. Since integers don't get rounded to the available precision, the optimization doesn't apply to integers. The above processor would do Y*Z, then +X, then round, then X=. A CPU designer can make that faster by including an "add and multiply" circuit, an "add and round" circuit, or a "round and assign" circuit. Any set of operations can be done in two clock cycles, if the maker decides to include a hardware circuit for it.
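You can reproduce the difference between the two-rounding sequence and the fused version in plain Python. Python before 3.13 has no FMA function (3.13 added math.fma). This sketch therefore emulates the single rounding: it computes X + Y*Z exactly with fractions.Fraction and converts to float once at the end. The inputs are chosen so that the exact product Y*Z carries a tiny tail (2**-60) that the separate rounding throws away. The function names are hypothetical:

```python
from fractions import Fraction

def mul_add_separate(x: float, y: float, z: float) -> float:
    """X = X + (Y*Z) with two roundings: A = ROUND(Y*Z), then ROUND(X + A)."""
    a = y * z      # first rounding to double
    return x + a   # second rounding

def mul_add_fused(x: float, y: float, z: float) -> float:
    """Emulated FMA: form X + Y*Z exactly, then round to double only once."""
    exact = Fraction(x) + Fraction(y) * Fraction(z)  # Fraction(float) is exact
    return float(exact)                              # the single rounding step

y = z = 1.0 + 2.0 ** -30      # exact product is 1 + 2**-29 + 2**-60
x = -(1.0 + 2.0 ** -29)
print(mul_add_separate(x, y, z))  # 0.0: the 2**-60 tail was lost when Y*Z was rounded
print(mul_add_fused(x, y, z))     # 2**-60, about 8.67e-19
```

In compiled code the corresponding operation is C99's fma() from <math.h>, which a compiler targeting FMA3 can turn into a single instruction.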
Could someone tell me how many separate instruction sets, pipelines and register files I get in a mainline CPU these days? I turned away for a second and completely lost track.
What happens with the 10 that you aren't using? Are they just sitting there reducing the yield?
Would that improve hashing speeds in, say, Bitcoin?
Bitcoin is based on SHA256 hashing, which has zero floating point operations. So no, this will not impact Bitcoin mining at all.
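To back up the point that SHA-256 is integer-only, here is a compact, unoptimized pure-Python SHA-256 following the FIPS 180-4 description. Every step is a 32-bit add, rotate, shift or bitwise operation. Even the round constants, defined as fractional bits of cube and square roots of primes, are derived here with integer arithmetic rather than copied in. Bitcoin hashes block headers with SHA-256 applied twice; sha256d is an illustrative name for that. This is a teaching sketch checked against hashlib, not something to mine with:

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # every value is kept to 32 bits

def _first_primes(n: int) -> list:
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes

def _icbrt(n: int) -> int:
    """Floor of the cube root of n, using integer Newton iteration."""
    x = 1 << ((n.bit_length() + 2) // 3)  # starting guess >= cbrt(n)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

_PRIMES = _first_primes(64)
# Constants: first 32 fractional bits of cube roots / square roots of primes.
K = [_icbrt(p << 96) & MASK for p in _PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]

def _rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message: bytes) -> bytes:
    h = list(H0)
    bit_len = len(message) * 8
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit big-endian length.
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack(">Q", bit_len)
    for off in range(0, len(message), 64):
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 64):
            s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + big_s1 + ch + K[t] + w[t]) & MASK
            big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            hh, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return struct.pack(">8I", *h)

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as Bitcoin applies it."""
    return sha256(sha256(data))

print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())
```

No float appears anywhere, so a faster FPU has nothing to speed up here.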
"Tom's Hardware has published a lengthy article and a set of benchmarks on the new "Haswell" CPUs from Intel.
Yes, but will it blend?
While speed for single and double floats is all well and good, I wonder: when will there finally be hardware support for 128-bit (quadruple precision) floats?
Link translated from the original Italian.
"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x."
The RED bar is integer not floating point.
The serious [floating point] performance increase has a few caveats: you have to use either AVX2 or FMA3,
Isn't AVX2 just the integer version of AVX? Like SSE2 added integer versions of the SSE floating point instructions? If so, that sentence doesn't make sense.
"As you see in the red bar, the task is finished much faster on Haswell. It’s close, but not quite 2x." Sorry to ruin it for everyone but the RED bar is integer not floating point.
Can the Jalapeno chips do anything else when the Bitcoin market crashes? At least with the video cards I cant still drive video cards with them.
Calculate password hashes? Or collisions?
Pah. AMD has had FMA4 since 2011.
Their new Thunder and durgango APUs are rumored to finally get close to the i7s!
This will crush them, as AMD's former strength was floating-point calculation, and today its strength is multithreading, or rather, getting close to Intel's multithreaded performance.
The way to get Crysis 3 at 30 fps is here!
"hey kids, our CPU is twice as fast as the next guys!"*
*(you must rewrite your code to do twice as much stuff at once)
**(which has been true for like, 15 years ever since SSE + friends made it into the PC market)
***(which means developers have to spend time writing non-portable optimization code)
Intel's C/C++ and FORTRAN compilers are exceedingly efficient at vectorization, and are of course updated to use their new instructions. Does take a bit for software to be compiled using it, but you can see some real gains in a lot of things without special work.
I also think people who do GPGPU get a little overfocused on it and think it is the solution to all problems. You find that some things, like graphics rendering, are extremely fast on the stream processors that make up a modern GPU. However, other things are not, and can even be slower. Intel CPUs are very good at mixed tasks, and the better vector units only make that more true.
AMD lost the CPU race a long time ago, but still beats Intel at integrated graphics. Now it looks like Haswell could win that battle too.
The article shows GT2 to be 15-50% faster than the old HD 4000. That's still a bit slower than Trinity, but GT3 has twice as many execution units as GT2, potentially blowing away anything AMD could offer.
No overclocking? Ermahgerd, that's a showstopper for those wanting to do HPC!!!!!!!!!!!!!!!!!!
And will you still be using your outdated video cards when that time comes? Perhaps, perhaps not. Sure, it could theoretically still drive video, but if it's not being used anymore, what's the difference?
Bitcoins still hold no value to me. No one I deal with accepts them as currency, hence they hold no value.
I can't pay my taxes with bitcoins, I can't buy food, I can't repay my mortgage, I can't buy petrol. What can I do with a bitcoin?
I understand that a big question is whether the Butterfly Labs chips actually exist, let alone work.
Imagine employing someone as stupid as 'mozumder' in any mission critical situation. He is like the embodiment of the phrase "no-one ever gets fired for buying IBM". A brain-dead clod who is the joy of every shark selling any over-priced brand.
Correctness in calculation is a computer science and maths discipline. It is NEVER achieved EVER, EVER by relying on the accuracy of any given piece of hardware. Indeed, any company doing real critical work would immediately FIRE any cretin like 'mozumder' who stated "we can trust this hardware".
"I don't have to know how to do my job properly- that's why I bought a Mac Pro."
For those that wonder, it is impossible to build a perfectly reliable CPU, and one shouldn't even try. Instead, you build 'good enough' hardware, and use correctly composed software systems to compensate for statistically rare anomalies. ECC memory is largely a marketing gimmick. There are, sadly, hundreds of thousands of places where 'data' can become corrupt in a CPU. Most of this possible type of error cannot be feasibly detected by inbuilt hardware solutions. ECC is used simply because it is trivial to add to memory blocks- blocks that represent only the tiniest fragment of all possible logic errors.
The greatest vulnerability in a modern system is in serial data transports, where the transmission line is driven as fast as possible. However, error correction is always used on these interconnects to enable such high speeds. Ordinary logic is clocked at vastly less troubling speeds, so that the likelihood of failure is statistically very low indeed. Any hardware errors that do then happen can be considered as unavoidable- to be countered by proper software procedures.
Mission-critical calculations MUST be subject to sanity tests. This may involve running the same calculation more than once, running different algorithms that should give the same result, using multiple computers, or calculating reasonable bounds for the expected results.
The idea that someone could say "Duh, I don't have to bother- we use a Mac Pro with ECC" is so terrifying, people expressing such opinions should probably be identified to ensure they aren't working somewhere where their idiocy and complete lack of maths skills may get someone killed.
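Here is one way to apply the "different algorithms that should give the same result" and "reasonable bounds" advice: compute the same quantity two independent ways and refuse to continue if the results disagree or fall out of bounds. This minimal Python sketch uses population variance as a stand-in calculation, with a two-pass algorithm and Welford's one-pass algorithm as the two methods. The tolerance values and the helper names are assumptions chosen for illustration:

```python
import math
from typing import Sequence

def variance_two_pass(xs: Sequence[float]) -> float:
    """Population variance: compute the mean first, then the mean squared deviation."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / n

def variance_welford(xs: Sequence[float]) -> float:
    """Population variance in a single pass (Welford's running update)."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)

def cross_checked_variance(xs: Sequence[float], rel_tol: float = 1e-9) -> float:
    """Hypothetical helper: run both algorithms and fail loudly if they disagree."""
    if not xs:
        raise ValueError("need at least one value")
    a = variance_two_pass(xs)
    b = variance_welford(xs)
    # Bounds check: a variance can never be negative.
    if a < 0 or b < 0:
        raise ArithmeticError(f"negative variance: {a!r}, {b!r}")
    # Agreement check between the two independent methods.
    if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
        raise ArithmeticError(f"methods disagree: {a!r} vs {b!r}")
    return a

print(cross_checked_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```

The point is not these particular algorithms. A single silent hardware or software fault is unlikely to corrupt two differently structured computations in exactly the same way.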
Can I buy gold with bitcoins?
OMG yes!
When AVX came out, it was supposed to be a major speedup.
Guess what: lots of things are still faster in SSE2/3.
Many of the new registers appear to speed things up, but what isn't readily apparent is that there haven't always been improvements in memory ports.
The major speedups are going to come from cleaning up the way instructions are handled and the memory lanes in the chip, not just throwing more registers at us.
This guy (Agner Fog) is the best reference on the net for what's going on in these chips:
http://www.agner.org/optimize/blog/read.php?i=142
In the early 2000s we had some servers without ECC, and every week one of them would crash. All the other servers had ECC and never crashed. Hardly a marketing gimmick.
You can sell them on the exchange quickly and easily for USD (or 5 other major currencies)
Reading your diatribe would lead the naive reader to believe Intel's processors' benchmarks are substantially inferior to AMD's. Now that's comedy.
They had officially classified it as a coffee warmer
So I can sell them for less than the cost of power to mine them? There's also the loss associated with amortising and depreciating the hardware required to mine them as well.
So if these ASICs are as good as claimed why sell them to the general population when they could simply mine their way to a tidy profit?
If you can do that, you'll revolutionize computing. No, doubling the clock to send two ticks to the gates doesn't count - the real clock is defined by the gate speed.
I can't pay my taxes with bitcoins, I can't buy food, I can't repay my mortgage, I can't buy petrol. What can I do with a bitcoin?
You can send them to me...
Who are you to define what counts and what doesn't?
Will gcc use AVX or FMA3 if I write normal code in C++? How about Java and Python / numpy, could it be that python actually gets faster than C++ if gcc doesn't take advantage of these technologies?
It is interesting how every mention of Bitcoin attracts people saying how they're worthless, useless, or a scam that's about to collapse any second now. It's interesting, because people don't usually spend this much time hating something that wouldn't affect them in any way even if they were right. It's almost starting to seem like a FUD campaign, which leads to a question: who is behind it, the banks, the government, Visa or PayPal?
At least with the video cards I cant still drive video cards with them.
You can't drive video cards with video cards? Dawg, I heard you like video cards...
So if these ASICs are as good as claimed why sell them to the general population when they could simply mine their way to a tidy profit?
So if those pickaxes are as good as claimed, why sell them to the general population when they could simply mine their way to a tidy profit?
Sounds like someone is....butthurt
The purple unicorns are behind it.
combined, I believe that's 15,267.864x more efficient.
I'm guessing you're not an engineer. Or a mathematician. Something in marketing, perhaps?
OP is making a trivial observation about pipeline length versus clock speed which any EE would understand. There's no revolution brewing, except perhaps in your understanding if you think about this for a while.
When bitcoins hit $3.60 ea and the difficulty was about 1/3 what it is now, I was spending $42 on electricity to get around $45 in BTC. Now the price is $47/BTC and it takes 1/250th the power to generate them 10x as fast but at 3x harder difficulty. Still a hell of a net gain.
Half the bitcoins that will ever be mined have already been mined. If this was to ever be widespread, how would more than 21 million people be able to take part? That's only 0.3% of the world. Less than 10% of USA. Less than one coin per Australian (there's 22 million of those buggers)
As soon as you start using fractions of coins, you're introducing traditional banks in to the picture. Single points of failure to what used to be a distributed system.
Scams and fraud shouldn't be too hard either. If you hijack the local Wi-Fi hotspot while you're trading a coin with someone, and you control all the peers accessible from that spot, who's to say the coin really changed hands? Remove knowledge of the transaction from the peers, and no one will know about it until the other guy gets another connection. You could trade the same coin many times.
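You can check the population comparison from the first paragraph of that comment directly. This sketch counts whole coins against the 21 million cap, as the comment does. The world and USA population figures are rounded 2013 values assumed here, not numbers from the thread; only the 22 million Australians comes from the comment:

```python
MAX_BTC = 21_000_000  # cap on the total number of whole bitcoins

# Assumed, rounded 2013 populations (except Australia, quoted in the comment).
POPULATIONS = {"world": 7_100_000_000, "USA": 316_000_000, "Australia": 22_000_000}

def coins_per_person(population: int) -> float:
    """Whole coins available per person if the cap were split evenly."""
    return MAX_BTC / population

for region, pop in POPULATIONS.items():
    share = 100 * coins_per_person(pop)
    print(f"{region:>9}: {coins_per_person(pop):.4f} BTC each, "
          f"enough whole coins for {share:.1f}% of people")
```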