Slashdot Mirror


Smarter Thread Scheduling Improves AMD Bulldozer Performance

crookedvulture writes "The initial reviews of the first Bulldozer-based FX processors have revealed the chips to be notably slower than their Intel counterparts. Part of the reason is the module-based nature of AMD's new architecture, which requires more intelligent thread scheduling to extract optimum performance. This article takes a closer look at how tweaking Windows 7's thread scheduling can improve Bulldozer's performance by 10-20%. As with Intel's Hyper-Threading tech, Bulldozer performs better when resource sharing is kept to a minimum and workloads are spread across multiple modules rather than the multiple cores within them."
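The summary's advice to spread work across modules rather than across the cores inside a module comes down to choosing which logical CPUs a process may run on. Below is a minimal Python sketch that builds such a processor-affinity bitmask. It assumes a hypothetical numbering in which logical CPUs 2m and 2m+1 share module m, so check your own system's topology first. The function names are illustrative and are not part of any Windows or AMD API.

```python
def one_core_per_module_mask(num_modules: int) -> int:
    """Return an affinity bitmask with exactly one bit set per module.

    ASSUMPTION (hypothetical layout): logical CPUs 2*m and 2*m + 1 share
    module m, so setting only the even-numbered bits keeps each thread on
    its own module, where it does not share the front end, FPU or L2.
    """
    mask = 0
    for module in range(num_modules):
        mask |= 1 << (2 * module)  # first core of this module
    return mask


def mask_to_cpus(mask: int) -> list[int]:
    """List the logical CPU ids whose bits are set in `mask`."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    mask = one_core_per_module_mask(4)  # e.g. a 4-module "8-core" FX-8150
    print(hex(mask))           # 0x55
    print(mask_to_cpus(mask))  # [0, 2, 4, 6]
```

A mask like 0x55 could then be passed to an affinity tool. For example, the Windows `start` command's `/affinity` option takes a hexadecimal mask, and on Linux the CPU list could go to `os.sched_setaffinity`. Whether this helps depends on how many threads the workload actually runs.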

196 comments

  1. So... by Anonymous Coward · · Score: 0

    Bulldozer sucks at multitasking, but it's great if programmers utilize parallel programming techniques (which they don't use right now anyway--multicore processors are pretty much explicitly for improving multitasking performance due to this).

    1. Re:So... by laffer1 · · Score: 1

      You mean integer based instructions. Floating point is still not as good with the AMD chips (unless using the new instructions)

    2. Re:So... by EdZ · · Score: 1

      Worse, what this shows is that AMD's idea that you only need one FPU for every two integer units (how Bulldozer is laid out) results in a 20% performance drop.

    3. Re:So... by beelsebob · · Score: 1

      The idiocy here is that they've not succeeded in making Bulldozer faster; they've succeeded in making one very specific benchmark run faster with very specific scheduler settings for that exact benchmark. Give it some different code to run and this'll degrade performance.

    4. Re:So... by makomk · · Score: 2

      In theory it actually has the equivalent of a 128-bit-wide FPU for every integer unit. Though I hear rumours that they may not have put as much effort into making the classic x87 FPU instructions run fast, and that this harmed them in some of the non-SSE benchmarks that a lot of the reviews used.

    5. Re:So... by Daniel+Phillips · · Score: 0

      Wow, Intel fanbois are out in force.

      --
      Have you got your LWN subscription yet?
    6. Re:So... by Calos · · Score: 1

      I'm sure there will be plenty of fanbois in this discussion, but this is the person you chose to call out?

      He seems to be more or less right. This kind of scheduling might help many processor-intensive tasks, but this kind of scheduling isn't available to the majority of software, and it's obvious that Windows isn't smart enough to do it either. Unless AMD gets Windows support, or BIOS trickery as mentioned at the end of the article... these chips will be under-utilized.

      But no, you come here to say, in essence, "people with contrary opinions suck," no matter how reasonable they may be. It is you who reveals yourself to be a fanboy, if that's all you have to add.

      --
      I vote based on politicians' actions, unless contrary to my preconceptions. Often wrong, never uncertain. #iamthe99%
    7. Re:So... by Daniel+Phillips · · Score: 1

      I'm sure there will be plenty of fanbois in this discussion, but this is the person you chose to call out?

      Half truths are the most insidious kind. So somebody discovers that Bulldozer likes a certain kind of scheduling and runs faster with it. That is not "one benchmark", that is an interesting optimization technique. Completely fair, and nobody can claim that Intel fails to benefit from optimizations directed at their exact architecture as well.

    8. Re:So... by beelsebob · · Score: 1

      The problem is that what you just said is a half-truth: what they discovered is not that Bulldozer likes a certain kind of scheduling and runs faster with it. Instead, what they discovered is that when running exactly one benchmark, Bulldozer likes a certain kind of scheduling and runs faster with it. This says nothing about the effect on other benchmarks, and, Windows devs not being stupid (though many would argue they are), I'm quite sure that in speeding one thing up, they'll have slowed several others down.

  2. no one got fired buying intel by alen · · Score: 1

    That's the truth. Unless I can buy an AMD server for a lot cheaper, I'm not going to take on the risk of performance issues.

    1. Re:no one got fired buying intel by h4rr4r · · Score: 1

      Depends on what you mean by a lot cheaper. If you need lots of cores but don't need them fast, like for a VM host then AMD servers can be quite a bit cheaper once we are talking about getting 128GB+ of RAM.

      Risk of performance issues makes no sense if you don't know what app you want to run.

    2. Re:no one got fired buying intel by Antisyzygy · · Score: 3, Informative

      AMD servers are way cheaper, and there are no performance issues most admins can't handle. What do you mean by performance? If you mean slower, then yes, but if you mean reliability, then they are about the same. Why else do Universities almost exclusively use AMD processors in their clusters for cutting edge research? I can see your point if you are only buying 1-3 servers, but you start saving shitloads of money when it's a server farm.

      --
      That brings me to an interesting point, / . is just "the ramblings of socially-inept, technology-literate news-mongers".
    3. Re:no one got fired buying intel by Anonymous Coward · · Score: 1

      With the issues we have encountered in the past with Intel's microcode updates, I really would not mind switching over to AMD... For most web and database servers the CPU performance really does not matter much unless you have an abundance of SSL connections to it, and even then the difference between the two manufacturers is marginal, hardly worth mentioning. You just have to make sure everything is tuned to the underlying system; if you don't know how to do that, you're in the wrong business.

    4. Re:no one got fired buying intel by QuantumRiff · · Score: 3, Informative

      A Dell R815 with 2 twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with 2 10-core Intel processors when we ordered a few weeks ago. That difference in price is enough to buy a nice Fusion-IO drive, which will make much, much more of a performance impact than a small percentage higher CPU speed.

      --
      What are we going to do tonight, Brain?
    5. Re:no one got fired buying intel by KhazadDum · · Score: 2

      Agreed. To further expound upon the parent's point: if you really know your performance needs and requirements, and the initial extra cost of Intel chips is lower than the revenue gained from that extra couple percent of performance, then go Intel. Otherwise, it's usually a cost-versus-preference piss fest. And last I checked, in a down economy, cost is king.
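The rule of thumb above is a break-even comparison. Here is a minimal Python sketch of it; the linear revenue model and the sample figures are assumptions for illustration, and only the $8k price gap comes from the thread.

```python
def premium_pays_off(extra_cost: float, perf_gain: float,
                     perf_bound_revenue: float) -> bool:
    """Rough check of whether a CPU price premium earns itself back.

    ASSUMPTION (hypothetical model): the revenue that depends on CPU speed
    over the server's life, `perf_bound_revenue`, scales linearly with
    performance, so a 3% speedup (perf_gain=0.03) adds 3% of it.
    """
    extra_revenue = perf_gain * perf_bound_revenue
    return extra_revenue > extra_cost


def break_even_revenue(extra_cost: float, perf_gain: float) -> float:
    """Performance-bound revenue at which the premium exactly pays off."""
    return extra_cost / perf_gain


if __name__ == "__main__":
    # Hypothetical numbers: an $8,000 premium buying a 3% speedup.
    print(premium_pays_off(8_000, 0.03, 100_000))  # False
    print(premium_pays_off(8_000, 0.03, 400_000))  # True
    print(round(break_even_revenue(8_000, 0.03)))  # 266667
```

With a small speedup the break-even point is large, which is why the premium only makes sense when performance translates directly into revenue.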

    6. Re:no one got fired buying intel by 0123456 · · Score: 1

      Clearly AMD should be charging $4k more for their CPUs if they're leaving that big a gap between their price and Intel's.

    7. Re:no one got fired buying intel by Surt · · Score: 1

      They're fighting reputation. If it was $4k more, they would probably lose too many sales to make up the price difference.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    8. Re:no one got fired buying intel by Anonymous Coward · · Score: 0

      They would but they have to deal with morons spouting this "no one ever got fired buying Intel" bullshit. If you aren't bright enough to evaluate your requirements and determine appropriate price/performance you can always go with the status quo and say "well everybody else is doing it so it can't be that bad a decision".

    9. Re:no one got fired buying intel by nabsltd · · Score: 1

      A Dell R815 with 2 twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with 2 10-core Intel processors when we ordered a few weeks ago.

      The Westmere-EX CPUs in the Dell R810 were recently released, and as such are very pricey. They are also much, much faster than any other Intel or AMD chip on a per-clock basis. Because the E7-88xx Xeons have more cache (30MB "smart" vs. 24MB total L2 plus L3), are hyper-threaded, and run faster clock-for-clock, a heavily parallel task will likely finish faster on a single-CPU Westmere-EX than on a dual-CPU Magny-Cours.

      Because of this, the R810 is a much, much more powerful system than the R815, so it only makes sense that it's more expensive, although part of it is paying for the bleeding edge of Intel. In the more normal realm, you can get a pair of 2.4GHz 6-core E5645s for less than the price of a single 2.2GHz Opteron 6174. That's 12 cores and 24 threads vs. 12 cores, and overall more performance.

    10. Re:no one got fired buying intel by Kjella · · Score: 4, Interesting

      Well, it doesn't seem to apply when you get up to supercomputing levels, at least. I checked the TOP500 list and it's 76% Intel, 13% AMD. As for Bulldozer, it has serious performance/watt issues, even though the performance/price ratio isn't all that bad for a server. On the desktop, Intel hasn't even bothered to respond except to quietly add a 2700K to their pricing table, with the 2600K left untouched. On the business side (where, after all, margins fund future R&D), Sandy Bridge's 216 mm² die is much smaller than Bulldozer's 315 mm². Intel can produce almost 50% more dies in the same wafer area, and in practice the yields probably favor Intel even more, because the risk of critical defects goes up with die size. Honestly, I don't think Intel has felt less challenged since the AMD K5 days...
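The die-size argument can be put into numbers. This Python sketch computes the raw dies-per-area advantage from the two quoted die areas, then applies a simple Poisson yield model, Y = exp(-A * D0), to show why larger dies lose more to defects. The defect density D0 is a made-up illustrative value, both chips are assumed to see the same D0, and the sketch ignores partial dies lost at the wafer edge.

```python
import math

SB_DIE_MM2 = 216.0  # Sandy Bridge die area, as quoted above
BD_DIE_MM2 = 315.0  # Bulldozer die area, as quoted above


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def good_dies_per_mm2(area_mm2: float, defects_per_cm2: float) -> float:
    """Good dies per mm^2 of wafer, ignoring edge loss and scribe lines."""
    return poisson_yield(area_mm2, defects_per_cm2) / area_mm2


# Dies per unit area if there were no defects at all.
raw_ratio = BD_DIE_MM2 / SB_DIE_MM2

# HYPOTHETICAL defect density (defects/cm^2), assumed equal for both fabs.
D0 = 0.2
good_ratio = good_dies_per_mm2(SB_DIE_MM2, D0) / good_dies_per_mm2(BD_DIE_MM2, D0)

print(f"raw dies ratio:  {raw_ratio:.2f}")  # 1.46
print(f"good dies ratio: {good_ratio:.2f}")
```

For any positive defect density the good-die ratio comes out above the raw area ratio, which is the "yields probably favor Intel even more" point.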

      --
      Live today, because you never know what tomorrow brings
    11. Re:no one got fired buying intel by Zorpheus · · Score: 1

      Maybe their reputation would be better if their processors cost the same.
      Some people just think that something must be worse when it is cheaper.

    12. Re:no one got fired buying intel by billcopc · · Score: 1

      Those kinds of people are very vulnerable to an optimistic young techie destroying their rep as a purchaser, or so my last two years of sales would suggest. I displaced someone who would only buy "the best", which in his view meant something 5x more expensive, and where every tech dispatch was accompanied by a sales guy, to work the purchaser while techie was busy installing the goods.

      If AMD can deliver better performance per $ and per watt in the server room, I'll consider them, and so will my clients if it improves their bottom line.

      --
      -Billco, Fnarg.com
    13. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      When you can save 8000 per server and then invest it in something else, it becomes a different issue. I am not trying to say AMD processors are superior; I am just saying that, factoring in all costs including power and the lifespan of the unit, AMD wins a lot of the time. Every computer cluster at every university I have ever had access to used AMD processors (with the exception of some NVidia units), and this was for their CS departments. I suspect part of the issue is it's easier to justify power budgets and not as easy to justify 8000 more per server to upper admins. Figuring you could buy 1.5 AMD servers for the price of 1 Intel server, you end up with a more cost-effective computer as far as total CPU performance and RAM capacity go. Power consumption is not one of AMD's strong suits, and I remember one of our server admins once told me the power bill for the main cluster; it was sickening. I vaguely remember it being in the hundreds of thousands per year. It saddens me that AMD is in this situation, but I seem to remember a time when Intel was pulling some pretty anti-competitive moves, though AMD should have capitalized on its past successes. At least for the desktop, I seem to remember the Athlon XPs had better gaming performance. I suppose that's a small market, but even that was an opportunity that could have been exploited better.

    14. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      By cost-effective computer I meant cluster! Also, I would like to add that I had high hopes for Bulldozer, so it was disappointing that it was all marketing hype.

    15. Re:no one got fired buying intel by bill_mcgonigle · · Score: 1

      Fast memory bus, nothing special needed to use ECC RAM, good work/watt, and low prices all help win AMD for most clusters.

      If you're aiming for a Top-500 slot and you have server money but not real estate money, then Intel is the logical choice.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    16. Re:no one got fired buying intel by yuhong · · Score: 1

      Yes, but I have wondered for a while what will happen to the quad-socket market if AMD sticks to the same pricing policy with Interlagos. Remember that Intel is one generation behind with Westmere-EX, and Sandy Bridge-EP is not even released yet right now.

    17. Re:no one got fired buying intel by afidel · · Score: 1

      Apples to apples, the cost difference between an R810 and an R815 should be on the order of $200, not $8,000.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    18. Re:no one got fired buying intel by yuhong · · Score: 1

      And remember that Interlagos will be drop-in replacement for Magny-Cours.

    19. Re:no one got fired buying intel by the+linux+geek · · Score: 1

      A 10-core Westmere-EX vs a 12-core Magny-Cours is much more than a "small percentage higher" - probably 30 or 40%, potentially higher depending on workload.

    20. Re:no one got fired buying intel by blair1q · · Score: 1

      >Why else do Universities almost exclusively use AMD processors in their clusters

      Because when your budget is fixed, N is the number of nodes you can afford, M is the performance per node, and N1*M1 > N2*M2, you buy P1 over P2 even if M2 > M1. In this case that happens because vendor 1 has a lot of trouble selling its units to individuals and turns to massively discounting its products when sold in bulk to HPC OEMs.
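The fixed-budget argument is easy to check numerically. The Python sketch below computes N * M for each platform and picks the larger total. The prices and per-node scores are invented, chosen so that the per-node winner (M2 > M1) still loses on total throughput, with a 1.5x price ratio similar to the "1.5 AMD servers for the price of 1 Intel server" figure mentioned elsewhere in the thread.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    price_per_node: float  # dollars per node (hypothetical)
    perf_per_node: float   # M: benchmark units per node (hypothetical)


def cluster_throughput(budget: float, p: Platform) -> float:
    """N * M, where N is how many whole nodes the budget buys."""
    nodes = int(budget // p.price_per_node)  # N
    return nodes * p.perf_per_node


def best_platform(budget: float, platforms: list[Platform]) -> Platform:
    """Pick the platform with the highest total cluster throughput."""
    return max(platforms, key=lambda p: cluster_throughput(budget, p))


if __name__ == "__main__":
    budget = 1_000_000
    p1 = Platform("P1", price_per_node=10_000, perf_per_node=1.0)
    p2 = Platform("P2", price_per_node=15_000, perf_per_node=1.3)  # M2 > M1
    for p in (p1, p2):
        print(p.name, round(cluster_throughput(budget, p), 1))
    print("buy", best_platform(budget, [p1, p2]).name)  # buy P1
```

Power, cooling and floor space are deliberately left out here, which is exactly the gap the later replies argue about.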

    21. Re:no one got fired buying intel by blair1q · · Score: 1

      I highly doubt the price difference was because of the processors. More likely it was because Dell is having trouble moving those boxes because they're slower.

    22. Re:no one got fired buying intel by Idbar · · Score: 1

      Why else do Universities almost exclusively use AMD processors in their clusters for cutting edge research

      [citation needed]

      Not that I question your argument, but I want to see you backing up your claims. Last time I checked, that was not the case.

    23. Re:no one got fired buying intel by yuhong · · Score: 2

      And I think the slowness will be solved with Interlagos, and Intel will have only Westmere-EX (Xeon E7) to compete, since Sandy Bridge-EP is not even released yet. Now compare the already-released pricing of Opteron 6200 CPUs with Intel's current Xeon 7500/E7 pricing, and guess what will happen.

    24. Re:no one got fired buying intel by yuhong · · Score: 1

      And AMD has also been touting the death of the "4P tax" on their blogs in the Opteron 6100 era, and there is no indication they are going to change that with Opteron 6200 anyway.

    25. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      Yes, as I said, it's cheaper. I did not mention performance per node, but I did in a later post.

    26. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      That's a tough one to find a citation for. Essentially, at the three universities where I have worked and the two I have collaborated with, between 6/10 and 8/10 of the clusters ran AMD processors. Some had NVidia clusters as well. Some had ones running Intel, but they were older systems or were used by departments other than Math/CS. This evidence may be anecdotal, but two of the universities are large ones with large research budgets. Saying "exclusively" was a bit of an exaggeration in hindsight.

    27. Re:no one got fired buying intel by Idbar · · Score: 1

      As you said, it's probably a market strategy. Every vendor focuses on certain companies/universities to sell its products. The one I worked for (what I guess is a 2nd-tier university) received a lot of discounts depending on the vendor, and used to get lots of Intel-based CPUs. Perhaps AMD targets Tier 1 universities more aggressively, while others cover a wider range. That's why I asked.

      Of course many are interested in seeing their products advertised at top universities, while others are covering a wider spectrum of customers. I'm wondering if anyone else here has more insight.

    28. Re:no one got fired buying intel by blair1q · · Score: 1

      Cray fell for AMD in a big way a few years ago. Then AMD handed them the Barcelona grenade and, well, they fell out.

      Cray now sells Intel-based HPCs and does quite well with them.

      AMD is the budget choice for your desktop or your server farm. People still find reasons to justify spending a little more for the Intel systems, though. I suspect that there are hidden performance and reliability costs to owning AMD in a scaled-up context.

    29. Re:no one got fired buying intel by Kjella · · Score: 1

      I suspect part of the issue is it's easier to justify power budgets and not as easy to justify 8000 more per server to upper admins.

      Well, it's not that much on its own, but it's another thing added to the total cost that lowers the price AMD can charge. A Bulldozer uses about 70W more than a 2600K; running flat out 24/7, that's about 600kWh/year. Say a 5-year lifetime and 10 cents/kWh, and that is $300 more per CPU to operate it, not counting scaling up the power supply or the air conditioning. Compared to $8000 that's little, but as part of the profit margin AMD could have had, it's probably a lot. Of course a 2600K wouldn't be in direct competition on the server anyway, but I'm too lazy to look up the appropriate Xeon.
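The operating-cost estimate above can be reproduced in a few lines of Python. The sketch uses the comment's figures (70W extra, 5 years, 10 cents/kWh) and the same implicit assumption of running 24/7 at full load. The optional `pue` multiplier for cooling and facility overhead is an added, hypothetical parameter; it defaults to 1.0, which leaves cooling out as the comment does.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the box runs 24/7 at full load


def extra_energy_cost(extra_watts: float, years: float,
                      dollars_per_kwh: float, pue: float = 1.0) -> float:
    """Lifetime electricity cost of drawing `extra_watts` more power.

    `pue` (power usage effectiveness) is a hypothetical knob for cooling
    overhead; 1.0 means "CPU power only", as in the comment above.
    """
    kwh_per_year = extra_watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * years * dollars_per_kwh * pue


if __name__ == "__main__":
    print(round(70 * HOURS_PER_YEAR / 1000.0))       # 613 (kWh/year)
    print(f"${extra_energy_cost(70, 5, 0.10):.0f}")  # $307
```

Raising `pue` above 1.0 is one way to fold in the air conditioning the comment deliberately leaves out.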

    30. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      The only reason I suspect it's easier is that it's easier to convince an admin in control of the budget later that "HOLY SHIT! We don't have enough money in our budget to pay for the power our servers need to keep operating, and if we lose the servers we won't attract massive research grants!!! LOL" than "Hey, why don't we spend more and get higher performance and lower power consumption per CPU?" Of course there are other problems, such as the fact that it looks better when you purchase X nodes at Y performance for less than P nodes at Y performance when you leave power consumption out and X > P. The university system is just as rife with politics and bureaucratic nonsense as any other place, probably even worse actually.

      --
      That brings me to an interesting point, / . is just "the ramblings of socially-inept, technology-literate news-mongers".
    31. Re:no one got fired buying intel by petermgreen · · Score: 1

      When you can save 8000 per server

      That sounds like extreme hyperbole to me. Care to show me a case where an Intel server costs that much more than an AMD one with equivalent computing power?

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    32. Re:no one got fired buying intel by petermgreen · · Score: 1

      If he is REALLY spending $8000 more on an intel server than an AMD one then either he is getting MASSIVELY ripped off, comparing servers that are radically different in other ways or comparing servers with more than 2 sockets.

    33. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      Way to make the problem impossible with arbitrary and unrealistic expectations. Approximately 1.5 AMD servers will have the same computing power as one Intel server and cost about the same, except they will have more memory and throughput. Power consumption is a different issue, but I digress.

      --
      That brings me to an interesting point, / . is just "the ramblings of socially-inept, technology-literate news-mongers".
    34. Re:no one got fired buying intel by Antisyzygy · · Score: 1
    35. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      You may be right. Since I don't have concrete evidence here, I will admit that I was simply wrong to say "exclusively". I have been at 2 top-tier universities, and as such my sample is not large enough to say anything about universities in general. I suspect universities buy AMD because it's a higher cost/performance per cluster if you don't factor power consumption (or anything else) into your metric.

    36. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      Sorry, meant higher performance per cluster / cost.

    37. Re:no one got fired buying intel by Antisyzygy · · Score: 1
    38. Re:no one got fired buying intel by Antisyzygy · · Score: 1

      Most likely power consumption, due to the power needed for each CPU and the cooling requirements (which affect the space and infrastructure needed to house it all). I don't think an AMD cluster is less reliable than an Intel one in ideal physical conditions for the AMD processors. What I mean by that is sufficient cooling and power supply for the AMD cluster, since they are known to be more power hungry and to require more cooling.

    39. Re:no one got fired buying intel by hitmark · · Score: 1

      Reminds me of Torvalds's claim that going from an HDD to an SSD may be the biggest bang-for-the-buck upgrade one can do these days.

      --
      comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
    40. Re:no one got fired buying intel by Anonymous Coward · · Score: 0

      And I think the slowness will be solved with Interlagos, and Intel will have only Westmere-EX (Xeon E7) to compete, since Sandy Bridge-EP is not even released yet. Now compare the already-released pricing of Opteron 6200 CPUs with Intel's current Xeon 7500/E7 pricing, and guess what will happen.

      Here's a link to the original report of the pricing info, which has it in a nice table. My eyes tried to escape from their sockets while reading the horrible, illiterate softpedia copycat article.

      http://www.cpu-world.com/news_2011/2011100701_Pre-order_prices_of_AMD_Opteron_3200_4200_and_6200_processors.html

      I'm not sure it's going to be as good as you imply for AMD. The best 16C Interlagos is going to be 2.6 GHz @ 140W TDP. Like Magny-Cours before it, most Interlagos CPUs will ship in 2-socket servers (4S is very low volume, 2S is mainstream for rackmount servers). The low frequency and poor performance to clockspeed ratio of BD are probably going to allow the 3.x GHz 6-core Westmere-EP models to compete directly with Interlagos in the 2S market.

      That might sound farfetched at first, but remember that client bulldozer (FX-8150) needs 4 modules (which AMD is calling 8 cores for marketing purposes) at 3.6 GHz to match/beat 3.4 GHz 4-core SB in a handful of embarrassingly parallel integer benchmarks, while losing badly in others. SB is faster per clock than Westmere, so I'm going to guess that it would take a 3.6 GHz 4-core Westmere to match FX-8150, i.e. 1 Hz of 1 BD module is about equal to 1 Hz of 1 Westmere core.

      With a solid foundation of SWAGs in place, consider that 3.46 GHz * 6 cores = 20.76, and 2.6 GHz * 8 modules = 20.8. While this is admittedly a very back-of-the-envelope estimation technique, IMO it's plausible that a 3.46 GHz 6C Westmere can keep pace with a 2.6 GHz 8M/16C Interlagos in very parallel code. (Any serial code should favor Westmere-EP by a huge margin, of course.)

      IMO, AMD is pricing Interlagos CPUs to compete against Westmere-EP. AMD doesn't really have anything that can directly compete against -EX anyway (nothing AMD makes has RAS features comparable to -EX family CPUs). Also, given the slow rollout of Interlagos, AMD might not have a very wide window before SB-EP is out on the market, and it seems likely that SB-EP will really put the hurt on Interlagos.
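The clock-times-units estimate, and the parenthetical about serial code, can be sketched in Python. The first function reproduces the 20.76 vs. 20.8 arithmetic. The second is an Amdahl's-law-style extension (not part of the original comment) that adds a serial fraction. It keeps the comment's assumption that one Bulldozer module equals one Westmere core per GHz, which, if anything, is generous to Bulldozer on serial code.

```python
def aggregate_ghz(clock_ghz: float, units: int) -> float:
    """Clock speed times execution units, the comment's rough throughput metric."""
    return clock_ghz * units


def relative_runtime(clock_ghz: float, units: int, parallel_fraction: float) -> float:
    """Amdahl-style runtime for one unit of work (lower is faster).

    ASSUMPTION: one Westmere core and one Bulldozer module do equal work
    per GHz (the comment's SWAG), and serial code runs on a single unit.
    """
    serial = (1.0 - parallel_fraction) / clock_ghz
    parallel = parallel_fraction / aggregate_ghz(clock_ghz, units)
    return serial + parallel


if __name__ == "__main__":
    print(f"{aggregate_ghz(3.46, 6):.2f}")  # 20.76 (6-core Westmere-EP)
    print(f"{aggregate_ghz(2.6, 8):.2f}")   # 20.80 (8-module Interlagos)
    for p in (1.0, 0.5):
        wsm = relative_runtime(3.46, 6, p)
        bd = relative_runtime(2.6, 8, p)
        print(p, "Westmere faster" if wsm < bd else "Interlagos faster")
```

With fully parallel work the two are essentially tied, and as soon as a serial fraction appears the higher-clocked part pulls ahead.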

    41. Re:no one got fired buying intel by yuhong · · Score: 1

      I am talking about the quad socket market only. I know that the dual socket market will be a different story because of Sandy Bridge-EP already.

    42. Re:no one got fired buying intel by Anonymous Coward · · Score: 0

      Lol, that's not an apples-to-apples comparison. The 20-core Intel machine will _smoke_, _badly_, the 24-core AMD machine in damn near any load you throw at it.

  3. So basically... by Anonymous Coward · · Score: 1, Insightful

    So basically they suck. I shouldn't need to tweak my OS thread scheduler just so a CPU can suck less. AMD needs to fix their shit instead of lame excuses.

    1. Re:So basically... by ackthpt · · Score: 1

      So basically they suck. I shouldn't need to tweak my OS thread scheduler just so a CPU can suck less. AMD needs to fix their shit instead of lame excuses.

      It's good for a low-end multi-core, but after a lot of research I've decided to go with the proven Phenom II processor.

      --
      A feeling of having made the same mistake before: Deja Foobar
    2. Re:So basically... by h4rr4r · · Score: 2

      So then SSDs suck because you have to tweak the IO scheduler(elevator)?

    3. Re:So basically... by X0563511 · · Score: 1

      Because "Yea! Fuck progress!" - is that what I'm hearing?

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    4. Re:So basically... by Anonymous Coward · · Score: 1

      Yes. I as a user should not have to make esoteric workarounds for the lousy performance of your product. Especially when, even with the tweaks, it is only marginally less crappy and still sucks more than the competition, or even your own competing product line that is cheaper. The Phenom II X6s can blow away the FX-8150 at half the price point.

    5. Re:So basically... by HarrySquatter · · Score: 1

      Slower performance and higher TDP equals progress?

    6. Re:So basically... by Sloppy · · Score: 1

      So basically they suck. I shouldn't need to tweak my OS thread scheduler just so a CPU can suck less.

      You must think the i3 and i7 suck too, then, since they have hyperthreading in addition to their multiple cores, and definitely benefit from schedulers being HT-aware. Actually, you probably think all multicore CPUs and SMP motherboards suck, since before those were widely available, the kernels in use at the time didn't know how to use more than one CPU.
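What "HT-aware" (or module-aware) scheduling means can be illustrated with a toy placement policy. This Python sketch is not how Windows or Linux actually implement it; it only shows the idea of filling one logical CPU per physical core or module before doubling up on siblings, under a hypothetical CPU numbering stated in the docstring.

```python
def spread_first_order(num_units: int, threads_per_unit: int = 2) -> list[int]:
    """Order in which a topology-aware scheduler might hand out logical CPUs.

    Idle physical cores (or Bulldozer modules) are used first; sibling
    hardware threads sharing a core/module are used only after that.
    ASSUMPTION (hypothetical numbering): unit u owns logical CPUs
    u*threads_per_unit .. u*threads_per_unit + threads_per_unit - 1.
    """
    order = []
    for sibling in range(threads_per_unit):
        for unit in range(num_units):
            order.append(unit * threads_per_unit + sibling)
    return order


def place_threads(n_threads: int, num_units: int, threads_per_unit: int = 2) -> list[int]:
    """Logical CPUs assigned to the first `n_threads` runnable threads."""
    order = spread_first_order(num_units, threads_per_unit)
    if n_threads > len(order):
        raise ValueError("more threads than logical CPUs")
    return order[:n_threads]


if __name__ == "__main__":
    print(spread_first_order(4))  # [0, 2, 4, 6, 1, 3, 5, 7]
    print(place_threads(4, 4))    # [0, 2, 4, 6]: one thread per module
```

A scheduler unaware of the topology might instead hand out CPUs 0, 1, 2, 3, packing four threads onto two modules and making them share resources.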

      AMD needs to fix their shit instead of lame excuses.

      Can't argue with that; Bulldozer's performance isn't as much as everyone was hoping it would be.

      I think what's really gone wrong with the design is that in addition to the nifty approach to integer parallelism (which I still think was a great idea and makes the chips better than they would be without it), they also decided to do the longer-pipeline thing. And it would have worked, if they had shipped the new CPUs with an extra gigahertz or two of clockspeed. But they didn't. Probably for the same reason Intel gave up on the same idea after the P4.

      I really hope that mistake doesn't end up killing them. They have got to either get the clockspeed up, or else lower their prices/profits further.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    7. Re:So basically... by HarrySquatter · · Score: 1

      But does the end user have to do esoteric tweaks themselves for an Intel processor with hyperthreading? Nope.

    8. Re:So basically... by h4rr4r · · Score: 1

      Tuning is a normal part of setting up a machine. If you don't want to do any tuning Dell will be happy to do it for you.

      The Phenom 2 is probably what you should then buy.

    9. Re:So basically... by Anonymous Coward · · Score: 0

      So I should make a shittier universal product on the assumption that your shitty software will never get fixed?

    10. Re:So basically... by dpilot · · Score: 1

      No, what it means is that the software hasn't caught up to the hardware, yet. Until compilers and kernels/schedulers have time to react to Booledozer, we won't see what it's truly capable of. Since you're not interested in tracking such stuff, buy something more mainstream.

      The interesting thing here is the lame excuses. Not that long ago, Intel managed to (nearly) simultaneously introduce both NetBurst and Itanium. AMD never would have survived such a debacle; there's a serious question about whether they'll survive Booledozer, which hasn't yet gotten its chance with a proper compiler and scheduler. Yet Intel not only survived that disastrous dual introduction, they used their power and money to deny AMD's K8 the degree of business success it deserved to match its technical success.

      --
      The living have better things to do than to continue hating the dead.
    11. Re:So basically... by fuzzyfuzzyfungus · · Score: 3, Interesting

      So basically they suck. I shouldn't need to tweak my OS thread scheduler just so a CPU can suck less. AMD needs to fix their shit instead of lame excuses.

      I've got some very bad news for you: While I have no particular knowledge of, or interest in, today's architecture pissing match, the days when the OS was allowed to ignore architectural details and expect things to just work optimally are good and over(if they ever existed in the first place).

      Dynamic processor clocks? Why should I have to deal with some performance governor shit when Intel can just make a CPU that either uses almost no power at 3GHz or runs like a bat out of hell at 800MHz? Oh, because they actually can't. Sorry. Multiple cores? WTF? Why do they expect me to program in parallel for 2 3GHz cores instead of just giving me a 6GHz core? Oh, because they actually can't. Sorry. NUMA? Memory access times already blow! Now you want to make them unpredictable? Well, we can either repeal the speed of light and restrict every system to a single memory controller or deal with nonuniform access times and cry into our 128GB of RAM... The list just goes on. Hyperthreading can provide anything from less than zero improvement, if it increases contention for resources that were already being fully used, to fairly substantial improvement, if the CPU was being starved at times under a single thread. Now the Bulldozer cores have implemented something between full multi-core(with 100% duplication of resources per core) and hyperthreading(with virtually zero additional resources for the HT 'core'). Shockingly, performance depends on whether the two semi-independent cores are stepping on one another's shared toes or not...

      Even if, in this specific instance, AMD happens to have fucked up and made the wrong architectural choice, that doesn't change the fact that you can't escape architectural oddities unless you are willing to stay quite far from the forefront of performance, or to deal with some sort of hardware/firmware abstraction layer that ends up being at least as complex as the OS-level hackery would have been, but more likely to be vendor-specific and to have its cost spread across far fewer units. It certainly isn't the case that all architectural deviations are good: some are ghastly hacks best forgotten, and some are perfectly OK ideas dragged down by products that overall aren't much good; but the path of progress has been liberally sprinkled with oddities that have to be accounted for somewhere in the overall stack.

    12. Re:So basically... by h4rr4r · · Score: 1

      The system builder did when they first came out.

      The user buys his machines off the shelf at Best Buy.

    13. Re:So basically... by fuzzyfuzzyfungus · · Score: 2

      So then SSDs suck because you have to tweak the I/O scheduler (elevator)?

      How can you even dream of trusting any drive that isn't good enough for solid, proven, CHS addressing?

    14. Re:So basically... by DarkOx · · Score: 1

      Yeah, it's not like it's the operating system's job to abstract the hardware and coordinate resource sharing.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    15. Re:So basically... by HarrySquatter · · Score: 1

      Tuning the thread scheduler is not normal for 99% of users. This is a lame excuse by AMD for a CPU core that will be a megafail. Ivy Bridge will make it look even more pathetic.

    16. Re:So basically... by Surt · · Score: 1

      Because Intel has the leverage to get those tweaks into Windows.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    17. Re:So basically... by h4rr4r · · Score: 1

      Users don't buy CPUs, the system builder will do this for you.

      This is a pretty bad release out of AMD; let's hope they survive it.

    18. Re:So basically... by Anonymous Coward · · Score: 3, Interesting

      You did when it was initially launched. Windows 2000's scheduler does not cope well with hyperthreading /at all/ by default. You saw similar things when dual-core CPUs were launched. Now hyperthreading and multicore are standard, and OSes are aware of these cases.

      It's already been pointed out that Windows 8's scheduler is Bulldozer-aware and performs much better than Windows 7's. I would not be surprised to see a patch from Microsoft that specifically addresses scheduler performance for Bulldozer CPUs. We've seen similar things in the past.

      By the way, I'm seeing this unusual phrase "Esoteric Tweaking" showing up a lot out of nowhere. It smells of astroturf. Could Intel be afraid?

      Could it be that the Bulldozer architecture, with its uneven FPU-to-integer-core ratio, is the key to significant future scaling beyond what 1:1 can offer?

    19. Re:So basically... by Runaway1956 · · Score: 3, Funny

      "User". That summarizes half of the nonsense being posted here. This is a techie forum, isn't it? Techies tweak when no tweaking is needed. If you're a "user", then you're not even authorized to be in a server room. GTFO and STAY OUT!

      (listens for door slamming as the dweeb runs out)

      I just hate it when children blurt out their juvenile bullshit, interrupting the adults. Happens all the time . . .

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    20. Re:So basically... by Anonymous Coward · · Score: 0

      Nice portmanteau, Booledozer - the two-state 1-bit processor.

    21. Re:So basically... by DeadCatX2 · · Score: 2

      Uh...what? Users don't have to do anything to the scheduler. That's the responsibility of the operating system. A Service Pack will be released and you won't have to do shit, so your argument is moot.

      Besides, if your argument is "We shouldn't have to optimize schedulers", then you're a little late, because schedulers are most definitely optimized for their associated hardware.

      --
      :(){ :|:& };:
    22. Re:So basically... by washu_k · · Score: 3, Informative

      No, it's because AMD is lying to the OS. The "8-core" BD is not really 8 cores. It only has 4 cores with some duplicated integer resources: basically a better version of hyper-threading, but not a proper 8-core design.

      The problem is that the BD says to Windows "I have 8 cores" and thus Windows schedules assuming that is true. If BD said "I have 4 cores with 8 threads" then Windows would schedule it just like it does with Intel CPUs and performance would improve just like in the FA.
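      A minimal sketch of the kind of placement an HT-aware scheduler does once it knows which logical CPUs are siblings. The layout is an assumption for illustration: a hypothetical FX-8150-style chip where logical CPUs 2k and 2k+1 share module k. The function names are made up and are not Windows' actual scheduler internals; the point is only that knowing the pairing is what lets a scheduler put one thread per module before doubling up.

```python
def pick_cpu(busy, modules):
    """Choose an idle logical CPU, preferring one on a completely idle module.

    busy: set of logical CPU ids currently running a thread.
    modules: list of tuples, each listing the logical CPUs that share a
             module (or a physical core, in the Hyper-Threading case).
    Returns a CPU id, or None if every CPU is busy.
    """
    # First pass: a module with no busy siblings gets the thread to itself.
    for cpus in modules:
        if not any(c in busy for c in cpus):
            return cpus[0]
    # Second pass: fall back to any idle CPU, even if it shares a module.
    for cpus in modules:
        for c in cpus:
            if c not in busy:
                return c
    return None


def place(n_threads, modules):
    """Place n_threads one at a time and return the CPUs chosen, in order."""
    busy = set()
    order = []
    for _ in range(n_threads):
        cpu = pick_cpu(busy, modules)
        if cpu is None:
            break  # more threads than logical CPUs
        busy.add(cpu)
        order.append(cpu)
    return order


# Hypothetical layout: 4 modules x 2 cores, siblings numbered 2k and 2k+1.
FX_MODULES = [(0, 1), (2, 3), (4, 5), (6, 7)]
# The naive "8 equal cores" view: every CPU looks like its own module.
FLAT = [(c,) for c in range(8)]

print(place(4, FX_MODULES))  # [0, 2, 4, 6]: one thread per module
print(place(4, FLAT))        # [0, 1, 2, 3]: on the real chip, two modules doubled up
```

      With the flat view, the first four threads land on CPUs 0-3, which on the assumed layout means two modules doing double duty while the other two sit idle.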

      There shouldn't need to be any OS-level tweaks because Windows already knows how to schedule for hyper-threading optimally. If BD reported its true core count properly, then no OS-level changes would be needed.

    23. Re:So basically... by Sable+Drakon · · Score: 1

      Windows has also come with this HT awareness out of the box since Vista. AMD has quite simply screwed themselves with Bulldozer. They promised massive gains, enough to shame Intel. Yet the reality is an abysmal one, where not only does Intel still have the performance edge, but the previous product offers better performance at an even lower cost than the newer hardware. AMD has failed, plain and simple.

      --
      The Amarri pray for god, the Caldari pray for profit. the Gallente pray for peace, but the Minmatar pray their ships hol
    24. Re:So basically... by DeadCatX2 · · Score: 1

      Will the end user have to do esoteric tweaks after the next Service Pack for Windows? Nope.

      --
      :(){ :|:& };:
    25. Re:So basically... by h4rr4r · · Score: 1

      Those CPUs existed before Vista.

    26. Re:So basically... by beelsebob · · Score: 1

      It's not only that: they tweaked the scheduler to make one very specific benchmark perform well. Run a different benchmark and I bet this will degrade performance.

      Not only that, but I bet we could play the same trick on $intel_chip with enough fiddling with settings.

    27. Re:So basically... by Anonymous Coward · · Score: 0

      In a nutshell, yes.

      Think back to when Intel first released the P4 chips. Almost everything out of the box ran worse than on the highest-clocked PIII at the time (1.13 GHz, if memory serves) even though the P4 had a significantly higher clock speed (1.6 GHz on the initial offering, if memory serves). What releasing these to the public did was enable Intel to iron out some of the stepping and fabrication problems, and give the compiler writers time to incorporate the newest architecture improvements (like SSE2).

      Everything AMD is going through sounds exactly like the PIII-to-P4 transition. After 6 months to 1 year, the process will be significantly more mature and the Bulldozer chips will be serious contenders to Intel's offerings.

    28. Re:So basically... by Sable+Drakon · · Score: 1

      I'm aware of that, but XP wasn't HT-aware right out of the box. The Prescott P4s were released after XP's launch and its first service pack, and even with SP2 and SP3 that awareness was never added. Vista was the first consumer version of Windows to incorporate it.

      --
      The Amarri pray for god, the Caldari pray for profit. the Gallente pray for peace, but the Minmatar pray their ships hol
    29. Re:So basically... by Anonymous Coward · · Score: 0

      I may be completely off base, but the impression I've got from this and previous articles is that Microsoft has -ALREADY- tweaked the Windows thread scheduler for Intel's Hyper-Threading tech, and this is now only a matter of detecting Bulldozer and doing similar things for it. And I have to wonder if most of the performance gains will be made by essentially doing the -same things- (such as not putting two high loads on the same core when other cores are idle).

    30. Re:So basically... by 0123456 · · Score: 1

      And I have to wonder if most of the performance gains will be made by essentially doing the -same things- (such as not putting two high loads on the same core when other cores are idle).

      From the article it would appear that in other cases you'll reduce performance because that will disable 'turbo' overclocking. But the whole thing just seems too complex to optimise for because of all the special cases (e.g. don't put two integer threads on different cores, don't put two floating point threads on the same core), so that may be the best compromise.

    31. Re:So basically... by 0123456 · · Score: 1

      The original hyperthreading P4s were pretty much irrelevant because they were single-core; the OS either scheduled one thread or two based on whether hyperthreading was enabled in the BIOS, and nothing more complex was required.

    32. Re:So basically... by turgid · · Score: 3, Insightful

      Unfortunately, the Wintel world has thrived on this philosophy for 20 years.

    33. Re:So basically... by 0123456 · · Score: 2

      After 6 months to 1 year, the process will be significantly more mature and the Bulldozer chips will be serious contenders to Intel offerings.

      AMD just have to survive six months to a year of selling poorly-performing CPUs that have twice as many transistors as the competition.

    34. Re:So basically... by Kjella · · Score: 3, Interesting

      There shouldn't need to be any OS-level tweaks because Windows already knows how to schedule for hyper-threading optimally. If BD reported its true core count properly, then no OS-level changes would be needed.

      Except that hyperthreading quite obviously has one fast thread and one slow thread filling the gaps. In AMD's solution both cores in a module are equal, but they share some resources. To use a car analogy: the Intel solution is a one-lane road with pullouts, where the hyperthread sneaks from one pullout to the next while there's no traffic, whereas the AMD solution is a two-lane road with one-lane chokepoints. Both sorta allow cars to travel simultaneously, but I don't think the optimization would be the same.

      --
      Live today, because you never know what tomorrow brings
    35. Re:So basically... by Chris+Burke · · Score: 1

      Will the end user have to do esoteric tweaks after the next Service Pack for Windows? Nope.

      Maybe they're saying that running Windows Update is an esoteric tweak?

      I guess they should pay the teenager next door to do it for them, and then clear off all the spyware they have from running an unpatched OS.

      --

      The enemies of Democracy are
    36. Re:So basically... by billcopc · · Score: 1

      There is a difference between a CPU upgrade and an SSD, which is not a hard drive at all and thus exhibits completely different performance characteristics. SSDs are a radical departure from the norm. A multi-core CPU is not.

      I don't claim to know how CPU design works, but surely they must have ways to study or simulate real-world performance before the product is finalized and placed on shipping pallets. Windows' scheduler "sucks"? Funny, it works fine with all the other Intel and AMD systems, even chunky ones like my 12-core SMP rig... Maybe AMD should have tweaked the chip to better handle the existing scheduler, instead of revving up the spin department to compensate for the hardware's embarrassing failure.

      At the low end, AMD is still king. They have been for a good while now, and I've always been happy to flog excellent power-sipping machines based on the Athlon X2/X3/X4. Maybe they should just settle for that market and quit making asses of themselves in the high-end segment. They haven't had a praise-worthy flagship ever since Intel's Conroe.

      --
      -Billco, Fnarg.com
    37. Re:So basically... by Anonymous Coward · · Score: 0

      Except that hyperthreading quite obviously has one fast thread and one slow thread filling the gaps. In AMD's solution both cores in a module are equal, but they share some resources.

      What makes you think Intel has one fast thread and one slow thread filling the gaps? As far as I know, both threads share the core's resources equally and the CPU doesn't favor one over the other.

      It's possible for some resource allocations to be unequal, but not due to favoritism. For example, consider the case where one thread stalls on memory accesses a lot while the other does lots of register-to-register ALU ops. HT assignment of execution slots is opportunistic as far as I know, so the reg-to-reg thread will take most of the execution slots just because the stalled thread is hardly ever ready to do anything.
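      A toy model of the opportunistic slot assignment described here. It assumes a 1-wide core, two threads, and a simple "alternate when both are ready" arbiter; real Intel arbitration is more involved, so treat this as an illustration, not a description of the hardware. Thread 0 is always ready. Thread 1 stalls for miss_latency cycles after every issue.

```python
def simulate(cycles, miss_latency):
    """Toy 1-wide SMT core shared by two threads.

    Thread 0 only does register-to-register ALU ops, so it is always ready.
    Thread 1 issues a load that misses, then waits miss_latency cycles.
    Each cycle the slot goes to the thread that did not win last time if it
    is ready; otherwise it goes to whichever thread is ready.
    Returns [slots used by thread 0, slots used by thread 1].
    """
    slots = [0, 0]
    stall_until = [0, 0]  # cycle at which each thread can issue again
    last_winner = 1
    for cycle in range(cycles):
        ready = [t for t in (0, 1) if stall_until[t] <= cycle]
        if not ready:
            continue  # both stalled: the slot goes unused
        preferred = 1 - last_winner
        winner = preferred if preferred in ready else ready[0]
        slots[winner] += 1
        last_winner = winner
        if winner == 1:
            # The memory-bound thread's load misses; it must wait before issuing again.
            stall_until[1] = cycle + 1 + miss_latency
    return slots


print(simulate(100, 0))  # [50, 50]: both always ready, so they alternate
print(simulate(100, 9))  # [90, 10]: thread 1 is rarely ready to take its turn
```

      Nothing here favours thread 0. It ends up with 90% of the slots only because thread 1 is rarely ready, which is exactly the point made above.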

    38. Re:So basically... by billcopc · · Score: 1

      Funny, I don't see it that way at all.

      I think AMD enjoyed runaway success because of the P4, which was a very vulnerable platform for countless reasons. Poor IPC, awful thermals, and absurdly high prices. This gave AMD a giant gaping opportunity to dominate with their not-so-shitty AMD64. Then they released the dual-core, another great hit. They enjoyed nearly 4 years without any serious competition from Intel, but the moment Core 2 landed, it trounced AMD64 across the board, and came at a very reasonable price to boot. Sure, Intel learned from their mistakes, but AMD learned nothing. They still didn't have any major pull with OEMs, and their marketing arm did fuckall. The only people who even knew of AMD were gamers and techies. If I tried to sell anyone else a bang-for-the-buck AMD system, they'd ask "wtf is that garbage, I want an Intel"... user ignorance, sure, but AMD did nothing to improve their branding.

      They have been playing catch-up ever since. In a year, when Bulldozer's successor comes out, Intel will also have something new to show. If AMD wants to take the performance crown, I'm fine with that idea, but they need to knock those early reviews out of the park with stellar performance. If they can't accomplish that, then stop trying and just focus on the growing value segment, where they are already known and loved.

      --
      -Billco, Fnarg.com
    39. Re:So basically... by afidel · · Score: 1

      Actually, for first-generation HT, if you cared about performance you turned it off in the BIOS. It wasn't until Nehalem that HT actually added performance in the majority of situations, and that was mostly from a combination of better HT-aware schedulers and genuinely better chip design.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    40. Re:So basically... by Anonymous Coward · · Score: 0

      They used to.

      When they came out, hyperthreading would degrade performance on some workloads. The OS saw them as two different cores, and you could get hyperthreads bouncing back and forth on the same core, destroying the cache and causing more pipeline stalls.

      Windows 2000 had issues; the end-user tweak was upgrading to XP/2003.

    41. Re:So basically... by billcopc · · Score: 1

      You're right, and yet HT processors still offered repeatable performance gains in real-world usage, even under Windows XP. HT-aware scheduling improved the margin somewhat, and narrowed the worst-case losses, but by and large Prescott showed a measurable improvement from day one. HT takes existing code and finds idle "holes" to sneak in another thread's instructions, improving performance with existing software.

      Bulldozer just adds a bunch more physical cores, each one of them running slower than before, and completely ignores the fact that the majority of all desktop software, even if multithreaded, still relies on a heavy "primary" thread to do the bulk of the work. They might offload some tasks to additional threads but usually as an afterthought, cheaply tacked on to an existing codebase that predates multicore processors. Games, web browsers, office suites, media players... This is what Joe Random uses on a daily basis, and thus should be the focus of new consumer products.

      The average user does not spend all day encoding video, or running "make world" for kicks. Their PR crew is spinning this half-baked hardware design as a software failure? Who are they targeting with this release? Not the gamers. Not the server crowd. Not the value segment. Not system builders. Who's left? This doesn't feel like an HPC part, not unless they cram another 8 cores onto that die and deliver 4-way and 8-way boards before Q2 2012, but then they would have called it an Opteron.

      --
      -Billco, Fnarg.com
    42. Re:So basically... by Sable+Drakon · · Score: 1

      AMD doesn't have much of a clue who they're targeting with some of their processors these days. Even as an Intel user, I'm hoping that AMD doesn't end up committing corporate suicide. We need someone to egg Intel on and offer viable competition. Sadly, Bulldozer is far from the competition that Intel needs.

      --
      The Amarri pray for god, the Caldari pray for profit. the Gallente pray for peace, but the Minmatar pray their ships hol
    43. Re:So basically... by billcopc · · Score: 1

      I agree with about 75% of your post. AMD fucked up, because current operating systems already know how to handle hyper-threading, of which Bulldozer is basically a reimagining. If their CPU reported as 4 cores with 8 threads, chances are most schedulers would treat it like an Intel HT and not overburden it with uncooperative threads.

      AMD fucked up, because they either didn't know this would happen (unlikely), or pretended we wouldn't notice. Now that the reviews are out, their PR team is spinning the blame onto software. Why didn't they go back to the designers and address these issues? Why didn't they work with MS to release a fix before the product's launch? What they are doing is the tech equivalent of getting caught with your balls in your stepson's mouth and telling your wife it's the neighbour's fault. It is a total WTF denial of their role in this failed product design.

      --
      -Billco, Fnarg.com
    44. Re:So basically... by makomk · · Score: 1

      This gave AMD a giant gaping opportunity to dominate with their not-so-shitty AMD64. Then they released the dual-core, another great hit. They enjoyed nearly 4 years without any serious competition from Intel

      Despite how much better AMD's desktop chips were than Intel's, sometimes they literally couldn't give them away. Intel were threatening to cut off OEMs' supplies of laptop chips if they sold AMD processors on the desktop (AMD weren't so competitive for laptops), and set up deals where buying fewer Intel chips would mean paying more money for them (AMD didn't have the capacity to provide all the big OEMs' entire supply of processors; they were and are a lot smaller than Intel, and new fabs just took too long to come online). AMD should've been selling every chip they could produce, but Intel used every dirty trick they could think of to make sure this didn't happen.

    45. Re:So basically... by makomk · · Score: 2

      Except that's not quite right either, because classic hyperthreading only gets about 10-20% improvements at most from using two threads rather than one, whereas Bulldozer appears to be closer to 80-90% even for stuff that makes heavy use of the shared resources.
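      A back-of-the-envelope model of why the scaling figure matters for scheduling. Assumptions: a lone thread on a module or core counts as 1.0; a second thread adds pair_scaling; clocks and turbo are ignored. The 0.15 and 0.85 values below are only the rough figures quoted in this thread, not measurements.

```python
def throughput(threads_per_module, pair_scaling):
    """Relative throughput of a chip, given the busy-thread count per module/core."""
    total = 0.0
    for n in threads_per_module:
        if n not in (0, 1, 2):
            raise ValueError("each module/core runs 0, 1 or 2 threads")
        if n >= 1:
            total += 1.0           # the first thread gets the whole unit
        if n == 2:
            total += pair_scaling  # the second thread adds only a fraction
    return total


def spread_advantage(n_modules, n_threads, pair_scaling):
    """Ratio of one-thread-per-module throughput to packed-pairs throughput.

    Assumes n_threads <= n_modules and n_threads is even, so the packed
    placement fills n_threads // 2 modules completely and leaves the rest idle.
    """
    spread = [1] * n_threads + [0] * (n_modules - n_threads)
    packed = [2] * (n_threads // 2) + [0] * (n_modules - n_threads // 2)
    return throughput(spread, pair_scaling) / throughput(packed, pair_scaling)


for label, s in [("HT-like, 0.15", 0.15), ("Bulldozer-like, 0.85", 0.85)]:
    print(label, round(spread_advantage(4, 4, s), 2))
```

      On these numbers, packing four threads onto two units costs the HT-like chip far more than the Bulldozer-like one (spreading wins by about 1.74x versus 1.08x). The result is very sensitive to pair_scaling: the 50-60% figure another poster gives lands around 1.25-1.3x. The model also leaves out turbo, which, as noted elsewhere in the thread, can tilt the balance toward packing.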

    46. Re:So basically... by Anonymous Coward · · Score: 0

      How can you even dream of posting on /., allegedly "news for nerds", when you don't know the difference between addressing and scheduling?

    47. Re:So basically... by fuzzyfuzzyfungus · · Score: 1

      I am pretty surprised that AMD didn't include notification of this factor in their original release press materials, along with some sort of demo/benchmark/application with hardcoded CPU affinities/etc. That would have gone a fair way to mitigating people's displeasure (yes, it wouldn't have helped people with their workloads now, and yes, it would have been dogged by "controversy" over whether the vendor demos were rigged or not; but it would have been something).

      Aside from the PR fuckup, though, which has no reasonable explanation that I can come up with, I'm not sure that they really had a choice: their execution units, while they share FPUs, are substantially more independent than the HT "core" is, so marking them as HT cores would likely have led to fairly shitty performance, and from a die that is paying a pretty sizeable size penalty to have those independent bits that HT doesn't. Also, if they did mark them as HT cores, the odds that they'd be able to wring their own special, atypical core designation out of the schedulers for Windows N+1 and Linux 3.XX would not be improved...

      Thinking back, when Intel first released HT, it pretty much blew for people using Windows 2000, which was still quite a few of them, since the 2k scheduler just naively assumed that HT cores and real cores were equivalent and loaded them accordingly. Results were Not Good. As of XP, and possibly a very late 2k service pack, the situation improved.

      I'm assuming that AMD is scrambling to have this included in mainline Linux as soon as possible, and is likely petitioning Redmond as well; but unless their wheedling powers are greater now than they were during the x86-64 introduction (where MS dragged their feet for ages until Intel decided that it was a cool idea after all), I'm not sure that they could get scheduler support for their new core type included any faster.

      The PR handling seems little short of insane, and none of this is going to help them move units; but the option of just setting the HT bit presumably was nixed for some reason.
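      For the kind of vendor demo with hardcoded CPU affinities suggested above, here is a Linux-only sketch using Python's os.sched_setaffinity. It assumes the same hypothetical numbering as before: module k owns consecutive logical CPUs. On a real system, the sibling layout should be read from the OS rather than assumed, and Windows would need its own affinity API instead.

```python
import os


def one_cpu_per_module(n_modules, cores_per_module=2):
    """First logical CPU of each module.

    Assumes a hypothetical layout where module k owns logical CPUs
    k*cores_per_module .. k*cores_per_module + cores_per_module - 1.
    """
    return {m * cores_per_module for m in range(n_modules)}


def pin_current_process(cpus):
    """Restrict the current process to the requested CPUs it is allowed to use (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    # Drop CPUs that don't exist or aren't permitted, rather than failing outright.
    wanted = set(cpus) & os.sched_getaffinity(0)
    if not wanted:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(0, wanted)
    return wanted
```

      Calling pin_current_process(one_cpu_per_module(4)) before starting a 4-thread benchmark would keep it on CPUs 0, 2, 4 and 6 under the assumed layout. Threads created afterwards inherit the mask, and the kernel spreads them over those four CPUs.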

    48. Re:So basically... by PhrstBrn · · Score: 1

      Treating them like Hyper-Threading is still the wrong solution. AMD should have done everything it could to get patches included in Windows that add a performance tweak for its processor, and to get patches included in the Linux kernel. It looks like they tried to do both (Windows 8 has support, and there are patches out there for the Linux kernel, though not merged as far as I'm aware), but failed to get it done before their product launched. Either they started the process at the last minute, or they failed to work with the teams to reach a solution everybody was happy with.

      Treating it like Hyper-Threading probably would have helped and been better than nothing, but I doubt it's the correct solution.

    49. Re:So basically... by malkavian · · Score: 1

      Did processor design a LONG time ago (forgotten most of it).
      You can get benefits by having a compiler that makes most use of it, and schedulers that know how to eke out the last ergs.
      If you put a radically different design out there that a base OS wasn't designed to handle, then it's no surprise that things don't quite work out..
      It's easy enough to tweak the software to get the gains (as long as you have enough leverage with the OS vendor, or can make the patch yourself); why allow things to stagnate, and force optimisation of a legacy system when you can start doing things for the future?

    50. Re:So basically... by Kjella · · Score: 1

      What makes you think Intel has one fast thread and one slow thread filling the gaps? As far as I know, both threads share the core's resources equally and the CPU doesn't favor one over the other.

      It's possible that I've misunderstood something, but it was my impression that one thread was dominant, so that if they both wanted the same resource at the same time it would always come first. That way a heavy single-threaded application would get nearly the same performance as before unless it was blocked by calculations already in progress, while simultaneously letting the core do light work on the side. Early hyperthreading performance was the same for most workloads, indicating to me that one thread ran practically all the time and the other didn't get any work done at all. That is at least how I interpreted the numbers.

      --
      Live today, because you never know what tomorrow brings
    51. Re:So basically... by Anonymous Coward · · Score: 0

      [citation needed]

      I know that's been the populist take on it for some time, and for a time I fell victim to it as well. But what evidence do you have? You're basically just parroting AMD's PR department.

      The sum of most of what I've read on the subject seems to come down to a few people in Intel saying things they shouldn't have - internally. No evidence of wrongdoing or of action on what they said. But circumstantial evidence is a bitch, and protracted legal battles expensive, and with AMD looking to be close to expiring anyway... Intel just said "fuck it."

    52. Re:So basically... by Rockoon · · Score: 1

      The best solution is likely for the OS to continuously explore scheduling strategies in real-time, making it adaptive.

      Every N scheduler quanta, collect statistics about the last N quanta (perhaps Instructions Retired) assigning those statistics to the strategy employed during that period, and then decide on a new scheduling strategy (95% of the time go with the "best" strategy in the list, 5% of the time "explore" by choosing a strategy at random)

      A method such as this could prove superior to current methods even on legacy gear where the only shared resource is cache, especially when the OS is juggling more threads than there are cores. The trick would be to have a large enough set of strategies to try that one of them will actually be near optimal.
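      A minimal sketch of the explore/exploit loop described above, which is an epsilon-greedy bandit. It assumes each period's score is something like instructions retired. The strategy names and scores in the demo are invented, and nothing here is an existing OS API.

```python
import random


class AdaptiveScheduler:
    """Epsilon-greedy choice among scheduling strategies.

    After each period of N quanta the caller reports a score for the strategy
    that was in effect. The scheduler then keeps the best-scoring strategy most
    of the time and tries a random one with probability `explore`.
    """

    def __init__(self, strategies, explore=0.05, rng=None):
        self.strategies = list(strategies)
        self.explore = explore
        self.rng = rng if rng is not None else random.Random()
        self.totals = {s: 0.0 for s in self.strategies}
        self.counts = {s: 0 for s in self.strategies}
        self.current = self.strategies[0]

    def average(self, strategy):
        # Untried strategies look infinitely good, so each gets tried once early on.
        if self.counts[strategy] == 0:
            return float("inf")
        return self.totals[strategy] / self.counts[strategy]

    def report(self, score):
        """Record the score for the period just finished; return the next strategy."""
        self.totals[self.current] += score
        self.counts[self.current] += 1
        if self.rng.random() < self.explore:
            self.current = self.rng.choice(self.strategies)        # explore
        else:
            self.current = max(self.strategies, key=self.average)  # exploit
        return self.current


def demo(periods=2000, seed=0):
    """Simulate noisy scores for three made-up strategies; return usage counts."""
    true_mean = {"spread": 1.10, "pack": 1.00, "default": 0.95}  # invented numbers
    rng = random.Random(seed)
    sched = AdaptiveScheduler(true_mean, rng=rng)
    for _ in range(periods):
        sched.report(true_mean[sched.current] + rng.gauss(0, 0.02))
    return sched.counts
```

      A fixed exploration rate means the scheduler keeps paying a small cost indefinitely, but that is also what lets it notice when the workload changes and a different strategy becomes best. A real version would probably use a decaying average so that old periods stop dominating.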

      --
      "His name was James Damore."
    53. Re:So basically... by inglorion_on_the_net · · Score: 1

      So basically they suck. I shouldn't need to tweak my OS thread scheduler just so a CPU can suck less. AMD needs to fix their shit instead of making lame excuses.

      Or, alternatively, the default settings for Windows 7 thread scheduling just aren't optimal for Bulldozer, and therefore aren't getting the full performance that Bulldozer is capable of.

      Just because Windows default settings don't get optimal performance out of it doesn't mean the hardware sucks. It doesn't even mean Windows sucks; after all, you _can_ change the settings so that it performs better.

      Question to you: what would you prefer:

      A. CPU manufacturers optimize for performance potential

      B. CPU manufacturers optimize for what happens to give the best results with the current default settings for some current version of some OS vendor's OS

      Looks like you prefer B, but I think A is the better choice.

      --
      Please correct me if I got my facts wrong.
    54. Re:So basically... by washu_k · · Score: 1

      I wouldn't say it was that good, maybe more like 50-60%, but you are correct that Bulldozer is better than hyper-threading. The point is that BD is still not as good as real cores and thus scheduling it like hyper-threading works better than scheduling it like real cores.

    55. Re:So basically... by washu_k · · Score: 2

      No, that is not correct. Hyper-threading gives each thread the same amount of resources, assuming they can use them equally. The only difference between hyper-threading and a BD module is that the BD module has a dedicated integer execution unit and L1 D cache for each thread. Everything else is shared just like in Intel cores. It is simply a better hyper-threading, not real cores.

    56. Re:So basically... by ak3ldama · · Score: 1

      [do a google lookup and fuckoff]
      Seriously. Try it. Here is the Google search and then here is a NYTimes article on it. This idiotic [citation needed] shit is so utterly ridiculous. Oh, Obama is an American born on US soil? Fuck that [CITATION NEEDED!!!]. Global Warming? Fuck that [CITATION NEEDED!!!]. Even Wikipedia has an article on it. What do you expect people to do for you? Find internal documents from the FTC that categorically prove what happened? All we both have to go on is that Intel spent a shitload of money trying to defend itself, has been ruled against, and has settled with AMD, paying out even more money.

      --
      "but money is the God of Algiers & Mahomet their prophet." - Rich. O'Bryen June 8th 1786
    57. Re:So basically... by Daengbo · · Score: 1

      Hopefully this makes it into the kernel really soon now.

    58. Re:So basically... by Daengbo · · Score: 1

      Wow. That's a terrible analogy. Intel eventually gave up on the PIV architecture and went back to PIII, which is what birthed the Core line.

    59. Re:So basically... by beelsebob · · Score: 1

      Funny, what I remember about the "progress" of the P4 was that intel dropped that design as being fucking stupid, and went back to the Pentium 3 -> Pentium M -> CoreDuo -> Core2Duo line of development ;).

    60. Re:So basically... by beelsebob · · Score: 2

      This is actually exactly what you wouldn't want in a design. When you're designing a threading model, whether at the application level, the OS level or the CPU level, you absolutely do not want thread starvation. Designing it in is just dumb, which is why Intel didn't.

    61. Re:So basically... by jthill · · Score: 1

      Compilers can take advantage of model-specific instruction-scheduling idiosyncrasies and get major performance boosts. Why shouldn't OSes take advantage of thread-scheduling idiosyncrasies for similar boosts? If AMD's chip can deliver equivalent performance more cheaply, but only if your OS knows how to use it, and your OS hasn't been taught how to do that yet, then what needs fixing is your OS.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
    62. Re:So basically... by Anonymous Coward · · Score: 0

      It's possible that I've misunderstood something, but it was my impression that one thread was dominant so that if they both want the same resource at the same time it'd always come first.

      But how should it choose which thread is dominant? Generally speaking, neither the OS nor the HW has any idea which thread is most important.

      I believe the only policy for HT resource allocation is more or less a fairness policy -- because if anything they want to ensure that no starvation takes place. It's also known that some resources are just allocated in a fixed 50/50 split (e.g. reorder buffers in Nehalem) to simplify the implementation.
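      A toy illustration of why a fixed 50/50 split sidesteps the fairness problem. With a statically partitioned buffer, a thread that keeps allocating and never retires (say, one stuck behind cache misses) can only fill its own half. The 128-entry size and the class names are assumptions for illustration, not a description of Nehalem's actual implementation.

```python
class SharedBuffer:
    """Entries handed out first come, first served to any thread."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def allocate(self, thread):
        # The thread id is ignored: whoever asks first gets the entry.
        if self.used >= self.capacity:
            return False
        self.used += 1
        return True


class PartitionedBuffer:
    """Entries statically split so each thread owns capacity // n_threads of them."""

    def __init__(self, capacity, n_threads=2):
        self.limit = capacity // n_threads
        self.used = [0] * n_threads

    def allocate(self, thread):
        if self.used[thread] >= self.limit:
            return False
        self.used[thread] += 1
        return True


def starved(buffer, hog_attempts=200):
    """Let thread 0 grab entries without retiring any; report whether thread 1 is locked out."""
    for _ in range(hog_attempts):
        buffer.allocate(0)
    return not buffer.allocate(1)


print(starved(SharedBuffer(128)))       # True: thread 0 took every entry
print(starved(PartitionedBuffer(128)))  # False: thread 1 still has its 64
```

      The price is that, while both threads are active, neither can use more than half the entries, even if the other thread needs only a few of them.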

      That way a heavy single-threaded application would get nearly the same performance as before unless it was blocked by calculations already in progress, while simultaneously letting the core do light work on the side. Early hyperthreading performance was the same for most workloads, indicating to me that one thread ran practically all the time and the other didn't get any work done at all. That is at least how I interpreted the numbers.

      I'd interpret it as much of the list being single-threaded benchmarks, especially the games (at that time it was uncommon for games to do computation on >1 thread). Anand didn't have a lot of multi-threaded tests in his toolkit back then!

    63. Re:So basically... by Sloppy · · Score: 1

      But does the end user have to do esoteric tweaks themselves for an Intel processor with hyperthreading?

      To the same degree as Bulldozer. If you think updating your kernel is burdensome to get the most out of Bulldozer, then you probably had the same complaint with the Pentium 4's HT. OTOH if you thought updating your kernel to make a Pentium 4 multitask better was no big deal, then you're going to have the same attitude about Bulldozer: that it's no big deal either.

      Substitute "Pentium 4" with whatever multiple-instruction-pointers-in-hardware that you first used. If you were playing with SMP motherboards and cheap Celerons (don't remember the details but there was a very cost-effective combo back in the day that turned a lot of people on), similarly you had to update from Windows 95 to NT or Linux 1.x to 2.0.

      Really, it all comes down to this: when the hardware guys come out with something new, the pre-existing software is usually not already built to support it to maximum advantage. That's not always the case (e.g. by the time Ivy Bridge comes out in 2012 your kernel from 2011 might already use it very well -- but OTOH your kernel from 2008 probably won't) but usually is.

      I agree with the other poster who points out the consistent usage of "esoteric tweaks." Whether it's a "troll meme" or astroturfing, I don't know, but it sure looks stupid and discredits every poster who jumps on that bandwagon.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  4. Weird by 0123456 · · Score: 1

    Perhaps I'm remembering incorrectly, but I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading, with only a few resources shared? Yet now it turns out that you have to treat it like a hyperthreading CPU or performance sucks.

    I still don't understand why AMD didn't just set the hyperthreading bit in the CPU flags, so Windows would presumably just treat it like a hyperthreading CPU in the first place.

    1. Re:Weird by Anonymous Coward · · Score: 0

      Lying to the OS for short term gain means long term pain.

    2. Re:Weird by Sloppy · · Score: 2

      I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading,

      No, the hype is that it blurs the distinction between cores and hyperthreading. It's both and neither.

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
    3. Re:Weird by laffer1 · · Score: 2

      It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

      AMD's approach is faster for some workloads. The problem is that they didn't design it around how most people currently write software.

      I would have preferred AMD to implement hyper threading as it would have greatly simplified things for OS developers. It's getting to a point where kernels have to know about CPU families in order to get the performance they need. They also have to know the workload.

      For instance, if I'm trying to save power in a laptop, with the new AMD chips it's best to give all the instructions to the first two logical CPUs, which are in the same module. Then the others can go into an enhanced sleep state. However, this is slower than distributing threads across different modules. I'm even having trouble with terminology with these chips.

      With Intel chips, it's best to keep the same processes on nearby cores to take advantage of cache (for those that are really 2 CPUs on the same package) but to avoid scheduling them on two threads of the same core. Again the power issue comes into play with Intel chips, as other cores could go into C1E state or similar.

      AMD did add special instructions to the Bulldozer chips that speed up floating point, but compilers and applications have to take advantage of them. Microsoft's Visual Studio does not yet.

    4. Re:Weird by Anonymous Coward · · Score: 0

      Yes. I as a user should not have to make esoteric workarounds for the lousy performance of your product, especially when even with the tweaks it is only marginally less crappy and still sucks more than the competition, or even your own competing product line that is cheaper. The Phenom II X6s can blow away the FX-8150 at half the price point.

    5. Re:Weird by 0123456 · · Score: 2

      It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

      Ah, so this benchmark is floating point and that's why it's faster across multiple cores?

      I can't really see AMD convincing Microsoft to invest a lot of effort into dynamically tracking which threads use floating point and which don't and reassigning them appropriately. Maybe a flag on the thread to say whether it's using floating point or not at creation time would be viable, but then app developers won't bother to set it.

    6. Re:Weird by fuzzyfuzzyfungus · · Score: 1

      To me, Bulldozer's shared-FPU design looks rather like they wanted some of the specialized-workload advantage of the UltraSPARC T-series CPUs, but with somewhat less extreme trade-offs (the T1 had a single FPU shared between 8 physical cores, which proved to be a little too extreme and was beefed up in the T2). There are a fair number of server tasks that are FPU-light but have lots of threads, often do well with a lot of RAM, and are fairly cost-sensitive.

      Not at all a good recipe for a workstation or scientific computing device (which shows in that some of the present Phenoms stack up uncomfortably well against the newer architecture); but there are a lot of server loads that can use as many cheap threads as you can throw at them but don't really hit the FPU all that hard...

    7. Re:Weird by 0123456 · · Score: 1

      Lying to the OS for short term gain means long term pain.

      Shipping hardware whose performance sucks on real workloads and expecting the OS developers to fix your problem causes short-term pain that leads to long-term pain as your sales drop through the floor.

    8. Re:Weird by DamonHD · · Score: 2

      A T1 is still working well for me: at most about 1 thread on my entire Web server system is doing any FP at all, and in places I switched to some lightweight integer fixed-point calcs instead. That now serves me well with the same code running on a soft-float (i.e. no FP h/w) ARMv5.

      So, for applications where integer performance and threading is far more important than FP, maybe AMD (and Sun) made the right decision...

      Rgds

      Damon

      --
      http://m.earth.org.uk/
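DamonHD doesn't show his fixed-point code, so here is a minimal sketch of the general technique, assuming a Q16.16 format (16 fractional bits, an illustrative choice rather than his). The helper names are hypothetical. Values live in plain integers, so the arithmetic itself uses only integer operations; floats appear only at the conversion boundary.

```python
# Q16.16 fixed point: a value v is stored as the integer round(v * 2**16).
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0


def from_int(n: int) -> int:
    """Integer-only conversion into fixed point."""
    return n << FRAC_BITS


def to_fixed(x: float) -> int:
    """Float -> fixed (only needed at I/O boundaries)."""
    return round(x * ONE)


def to_float(f: int) -> float:
    """Fixed -> float, for display."""
    return f / ONE


def fx_mul(a: int, b: int) -> int:
    # The raw product carries 32 fractional bits; shift back down to 16.
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    # Pre-scale the dividend so the quotient keeps 16 fractional bits.
    return (a << FRAC_BITS) // b


# Example: average of three integer readings without floating-point math.
readings = [17, 18, 20]
avg = fx_div(from_int(sum(readings)), from_int(len(readings)))
```

Python integers never overflow; a C port with 32-bit values would need a 64-bit intermediate in `fx_mul` and `fx_div`.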
    9. Re:Weird by washu_k · · Score: 1

      It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

      It's a lot closer to hyper-threading than you think. The BD chips do *NOT* have two instruction decoders per module, just one. The only duplicated parts are the integer execution units and the L1 Data caches. The Instruction fetch+decode, L1 Instruction Cache, Branch prediction, FPU, L2 Cache and Bus interface are all shared.

      It's pretty obvious how limited each BD "core" really is given these benchmarks. AMD should have presented the CPU as having hyper-threading to the OS.

    10. Re:Weird by Chris+Burke · · Score: 1

      It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

      The decoders are a shared resource in the Bulldozer core. That can be a significant bottleneck that affects integer code. Also, those integer sub-cores are still sharing a single interface to the L2 and higher up the memory hierarchy. So it's not all roses for integer apps.

      Speaking of memory hierarchy, the FX parts are, like FX parts of the past, just server chips slapped into a consumer package. So the cores being studied here all have pretty substantial L3s. One of the claimed benefits of putting related threads on the same core is that they can share via the L2. Which is true, but partially mitigated by sharing on the L3.

      I would expect mainstream consumer parts based on the BD core to lack an L3, and then it's more likely that scheduling integer threads from the same process on the same core will provide a bigger benefit. The one test in the article that benefited from the 0xf affinity mask should show an even bigger increase, and other tests might change which affinity is preferred.

      --

      The enemies of Democracy are
    11. Re:Weird by laffer1 · · Score: 1

      The earlier article I read must have been way off then.

      Here's a set of graphics displaying the actual architecture.

      http://www.anandtech.com/Gallery/Album/754#7

    12. Re:Weird by Anonymous Coward · · Score: 0

      I can't think of any server workloads that are FPU-intensive. Certainly serving mail, files (or web), or DNS doesn't use any FP math. Databases could, but it's probably more common for them to use decimal arithmetic which doesn't hit the FPU. I'd guess it's mostly specialized app servers that would need real FPU performance.

      dom

    13. Re:Weird by Anonymous Coward · · Score: 0

      Kind of true: there is one 256-bit FP unit. But as far as I understand, only some new instructions, which are not used in the binaries we run today, use the full 256-bit FP unit. With all other instructions this FPU works as two 128-bit units, so the integer vs. FP question should not be on the table yet. (Well, the Intel FP units are faster than one 128-bit unit.) So AMD needs these new instructions to really crunch FP math. I wonder if the slowdown comes from the shared front end: the two cores in one module share the same front end and instruction fetcher.

    14. Re:Weird by yakovlev · · Score: 1

      No, you use the standard hyperthreading algorithm for when you have more cores than threads.

      Schedule both threads on a single core. If either one uses its entire time slice, then move it to be alone in a core.

      If two threads on different cores are each using "significantly" less than their entire timeslice, then try combining them on a single core.

      You'll notice that none of these actually involve knowing anything about the processes themselves. While that may be useful for more fine-grained tuning, the simple algorithm above will cover 80% of cases in a "good enough" way.

      At a secondary level, memory affinity, etc. can also drive what cores to try to put together to help with L2+ cache sharing, but this is to optimize the more difficult cases, or cases where there are more threads than cores.
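yakovlev's three rules are simple enough to sketch. This is a minimal Python model, not how Windows or Linux actually implement scheduling. It assumes a Bulldozer-style layout of two threads per module, "utilization" measured as the fraction of its last time slice a thread used, and thresholds (0.95 for "entire time slice", 0.5 for "significantly less") picked purely for illustration; the function names are hypothetical.

```python
from typing import Dict, List

Layout = List[List[str]]  # one inner list of thread names per module


def initial_pack(names: List[str], num_modules: int) -> Layout:
    """Rule 1: start by doubling threads up, two per module."""
    if len(names) > 2 * num_modules:
        raise ValueError("more threads than hardware contexts")
    mods: Layout = [[] for _ in range(num_modules)]
    for i, name in enumerate(names):
        mods[i // 2].append(name)
    return mods


def rebalance(mods: Layout, util: Dict[str, float],
              full: float = 0.95, light: float = 0.5) -> Layout:
    """One pass of rules 2 and 3 over a copy of the layout.

    util maps each thread to the fraction of its last time slice it used.
    """
    mods = [list(m) for m in mods]        # leave the caller's layout alone
    idle = [m for m in mods if not m]     # modules with nothing scheduled
    # Rule 2: a thread that burns its whole slice gets a module to itself.
    for m in mods:
        if len(m) == 2 and idle:
            busy = [t for t in m if util[t] >= full]
            if busy:
                m.remove(busy[0])
                idle.pop().append(busy[0])
    # Rule 3: lone threads that are both light get packed onto one module.
    singles = [m for m in mods if len(m) == 1 and util[m[0]] < light]
    while len(singles) >= 2:
        src, dst = singles.pop(), singles.pop()
        dst.append(src.pop())             # src's module becomes idle
    return mods
```

A real scheduler would run such a pass periodically and would also weigh cache affinity and the turbo behaviour discussed elsewhere in this thread.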

    15. Re:Weird by willy_me · · Score: 1

      AMD's approach is faster for some workloads. The problem is that they didn't design it around how most people currently write software.

      No argument here, they designed it for how people are going to write software in the future.

      AMD envisions a future where fp intensive operations are performed on a GPU. Want to calculate that FFT? - you won't be using the fp unit in the CPU. And honestly, they're right. The libraries and tools required to do this are already available. The biggest problem now is that there are too many different solutions - most of them not compatible with each other. Once this is resolved and there is enough installed hardware in the market, developers who really need good fp performance will make the switch.

      From this perspective, the Bulldozer architecture makes a lot of sense - at least it will 5 years from now.

    16. Re:Weird by petermgreen · · Score: 1

      Is it really lying? is there a formal definition of the "hyperthreading bit"

      In both cases you have two logical cores that share some computing resources; the only difference is in which computing resources are shared. From the point of view of an OS scheduler, knowing exactly which computing resources are shared is FAR less important than knowing that resources are shared.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    17. Re:Weird by cynyr · · Score: 1

      Looks like it runs fast enough for desktop use, and better than Intel on FPU-based multi-threaded loads (x264 and similar) before the tweaks. After them it should be better; how much, I'm not sure.

      --
      All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
    18. Re:Weird by hitmark · · Score: 1

      My understanding is that AMD wants to see FP move to the GPU.

      But right now that requires use of special libs and programming towards that goal.

      I guess their hope is that if it catches on then compilers and/or system libs will make this transparent.

      --
      comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
  5. "10% to 20%" boost is just overclocking processor by Skarecrow77 · · Score: 1

    The article basically says "if you schedule threads to use fewer modules, dynamic turbo will clock those modules up, giving you a performance boost."

    So... anybody who is already clocking their entire CPU at its top stable clock speed isn't going to get a boost out of thread scheduler modifications.

  6. Re:"10% to 20%" boost is just overclocking process by Skarecrow77 · · Score: 1

    I take it back. Apparently that's what page 1 says. There is a page 2, and it says something else entirely.

  7. But does it actually make a difference? by robot256 · · Score: 2

    Sure, the scheduling change improves performance by 10-20% for certain tasks, but that still makes it 30-50% slower than an i7, and with more power consumption.

    I can't fault AMD for not having full third-party support for their custom features, since Intel had a head-start with hyperthreading, but if it will still be an inferior product even after support is added then I'm not going to buy it.

    1. Re:But does it actually make a difference? by h4rr4r · · Score: 1

      30% slower at what percentage of the cost?
      If it costs 50% as much as an i7 that might then be fine.
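h4rr4r's question is a performance-per-dollar ratio. A minimal sketch; the function name is hypothetical, and the inputs below are the posters' rough figures, not measurements.

```python
def value_ratio(rel_perf: float, rel_price: float) -> float:
    """Performance per dollar of chip A divided by that of chip B.

    rel_perf:  A's speed as a fraction of B's (0.7 means "30% slower")
    rel_price: A's price as a fraction of B's (0.5 means "half the cost")
    A result above 1.0 means A is the better value.
    """
    if rel_price <= 0:
        raise ValueError("price must be positive")
    return rel_perf / rel_price
```

At "30% slower for half the price" the ratio is 1.4, so the slower chip wins on value. Plugging the same 30% into AdamJS's price range in the next reply (8% cheaper to 20% dearer) gives roughly 0.76 down to 0.58, i.e. worse value at either end.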

    2. Re:But does it actually make a difference? by AdamJS · · Score: 1

      They generally cost between 8% less and 20% MORE than their closest performance equivalents (hard to use that word since the gap is still pretty noticeable). That's sort of part of the problem.

    3. Re:But does it actually make a difference? by HarrySquatter · · Score: 2

      An i7-2600K is only 15% more expensive, has a 25% lower TDP, and blows away the FX-8150 in most of the benchmarks. Even with this tweak it'll still barely compete, and the 2600K has half as many real cores and a lower clock speed.

    4. Re:But does it actually make a difference? by HarrySquatter · · Score: 1

      That and the fact that they are power hogs compared to even the higher-end Sandy Bridge and Phenom II processors.

    5. Re:But does it actually make a difference? by Anonymous Coward · · Score: 0

      You forgot to take mainboard price into account.

    6. Re:But does it actually make a difference? by rrohbeck · · Score: 1

      And it supports neither ECC RAM nor VT-d; that's why I'm planning my new workstation around a Bulldozer.

    7. Re:But does it actually make a difference? by Anonymous Coward · · Score: 0

      ECC RAM is a good point, but do you actually need VT-d? It's unnecessary for most desktop/workstation virtualization scenarios. Also, as I understand it, while support for AMD's equivalent IOMMU technology is always present in AMD Opterons and maybe even their desktop CPUs, it needs chipset support too and not all AMD chipsets support it.

  8. AMD Doesn't learn from intel by Anonymous Coward · · Score: 1

    I would have been content if they had shrunk the X6 core down to 32nm, slap 2 of them on a chip and sell it as a 12 core. They could have released it a year ago.

    Intel did just that with their first quad core, and the consumer wasn't concerned about philosophical discussions on its cores. Heck, I'm typing this message on a Kentsfield chip right now, and even after all these years it's a great processor.

  9. Very Sad... by poly_pusher · · Score: 1

    Why does this sound like Barcelona? Granted, Bulldozer doesn't seem to have the same breadth of architectural flaws, but still. God, I miss the days when AMD came out with the X2 series... There is just no way AMD can compete with Sandy Bridge. With Ivy Bridge coming up, things are not looking good for AMD. After Barcelona they needed to catch up a bit; however, the performance gap compared with Intel's offerings seems to be growing.

  10. It's a Windows limitation by Animats · · Score: 3, Informative

    This is really more of an OS-level problem. CPU scheduling on multiprocessors needs some awareness of the costs of an interprocessor context switch. In general, it's faster to restart a thread on the same processor it previously ran on, because the caches will have the data that thread needs. If the thread has lost control for a while, though, it doesn't matter. This is a standard topic in operating system courses. An informal discussion of how Windows 7 does it is useful.

    Windows 7 generally prefers to run a thread on the same CPU it previously ran on. But if you have a lot of threads that are frequently blocking, you may get excessive inter-CPU switching.

    On top of this, the Bulldozer CPU adjusts the CPU clock rate to control power consumption and heat dissipation. If some cores can be stopped, the others can go slightly faster. This improves performance for sequential programs, but complicates scheduling.

    Manually setting processor affinity is a workaround, not a fix.
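For readers who want to try the workaround anyway, the "spread across modules" versus "pack into modules" choice boils down to two bitmasks. A minimal sketch, assuming logical CPUs 2k and 2k+1 are the two integer cores of module k (an assumption about enumeration; check your own system's topology). The function names are hypothetical. Under that assumption, Chris Burke's 0xf mask is "both cores of the first two modules".

```python
CORES_PER_MODULE = 2  # assumption: logical CPUs 2k and 2k+1 share module k


def one_core_per_module(num_modules: int) -> int:
    """Mask using the first core of each module (spreads threads out)."""
    mask = 0
    for m in range(num_modules):
        mask |= 1 << (m * CORES_PER_MODULE)
    return mask


def packed_modules(modules_used: int) -> int:
    """Mask filling both cores of the first N modules (keeps threads together)."""
    return (1 << (modules_used * CORES_PER_MODULE)) - 1


def mask_to_cpus(mask: int) -> set:
    """Convert a bitmask into the set of CPU indices it selects."""
    return {i for i in range(mask.bit_length()) if (mask >> i) & 1}
```

`format(one_core_per_module(4), "x")` gives `55`, the hexadecimal form that `start /affinity` takes on Windows 7. On Linux, `os.sched_setaffinity(0, mask_to_cpus(mask))` applies the same set to the current process.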

  11. No problem... by reztek · · Score: 1

    http://hardware.slashdot.org/story/11/09/13/1336210/amd-breaks-overclocking-record-with-bulldozer AMD already showed how to speed things up on their Bulldozer line

    1. Re:No problem... by HarrySquatter · · Score: 1

      Oh goody! Now the TDP can be even worse than it already is!

    2. Re:No problem... by h4rr4r · · Score: 1

      Why do you care about a few measly watts?

      Is another 50 watts really going to break your budget?
      If that is the case you probably should not be buying a new computer.

    3. Re:No problem... by 0123456 · · Score: 1

      Why do you care about a few measly watts?

      Oddly, the AMD fanboys were making the opposite argument back in the days when you could cook your breakfast on your Pentium-4 while checking your email.

    4. Re:No problem... by h4rr4r · · Score: 1

      I own a Phenom 2 X4 and a Core 2 Quad. I buy what meets my needs at the price point I want when I want to buy it.

      I am not a fanboy of either, I just want to see AMD survive so I don't have to pay far out the ass for CPUs. I owned one of those P4s at the time, and I bought an Athlon that put it to shame not much later.

    5. Re:No problem... by Anonymous Coward · · Score: 0

      Might as well spend a few measly bucks and get an Intel based computer.

    6. Re:No problem... by Lithdren · · Score: 1

      Oddly, the AMD fanboys were making the opposite argument back in the days when you could cook your breakfast on your Pentium-4 while checking your email.

      This is why I was always sad nobody invented a teflon topped computer case.

      I'd like to read my morning email while cooking bacon, eggs, and pancakes.

      Bonus points if you can get a waffle iron in there somehow.

    7. Re:No problem... by makomk · · Score: 1

      Pentium-4 managed to be hot, slow and expensive all at once... though by comparison to modern chips I don't think it's actually that power-hungry, scarily enough. Intel and AMD are heading well above the 100W level again.

    8. Re:No problem... by Anonymous Coward · · Score: 0

      Why do you care about a few measly watts?

      Is another 50 watts really going to break your budget?

      BD overclocking adds more like 200 watts (loaded not idle):

      http://www.hardocp.com/article/2011/10/11/amd_bulldozer_fx8150_desktop_performance_review/9
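Whether 50 W or 200 W matters to a budget is easy to estimate. A minimal sketch; the 8 hours per day at full load and the $0.12/kWh tariff are assumptions for illustration, not figures from the thread, so substitute your own duty cycle and rate.

```python
def annual_cost(extra_watts: float, hours_per_day: float,
                price_per_kwh: float) -> float:
    """Yearly cost of drawing extra_watts for hours_per_day, every day."""
    kwh_per_year = extra_watts / 1000 * hours_per_day * 365
    return kwh_per_year * price_per_kwh


# Assumed duty cycle and tariff (hypothetical, not from the thread).
HOURS_PER_DAY = 8
PRICE_PER_KWH = 0.12

cost_50w = annual_cost(50, HOURS_PER_DAY, PRICE_PER_KWH)
cost_200w = annual_cost(200, HOURS_PER_DAY, PRICE_PER_KWH)
```

Under those assumptions, 50 W works out to about $17.50 a year and 200 W to about $70, before counting any extra cooling.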

  12. Windows? by turgid · · Score: 1

    Windows is not exactly known for its multi-processor (multi-core) scalability.

    Repeat the test with a real OS (Linux, Solaris...) and I'll be interested, especially Solaris x86 since it is known to be the best at scaling on parallel hardware.

  13. It was already beating all intel in highly threade by unity100 · · Score: 5, Interesting
    applications, like Photoshop CS5 or TrueCrypt, and some more:

    http://www.overclock.net/amd-cpus/1141562-practical-bulldozer-apps.html

    Also, if you set your CPUID vendor string to GenuineIntel in some of the benchmark programs, you will get surprising results:

    Try changing cpuid=genuineintel for a +47% INCREASE IN SCORES.

    Changing the CPUID vendor to GenuineIntel nets a 47.4% increase in performance:
    http://www.osnews.com/story/22683/Intel_Forced_to_Remove_quot_Cripple_AMD_quot_Function_from_Compiler_

    PCMark/Futuremark rigged bentmark to favor Intel:
    http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=135382#p139712
    http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6

    Intel cheating at 3DMark Vantage via driver: http://techreport.com/articles.x/17732/2

    Relying on bentmarks to "measure performance" is a fool's errand. Don't go there.

  14. You've helped me find a SMALL "BUG" by Anonymous Coward · · Score: 0

    Type start /? & see this part - the "help/manpage" for the start command!

    (Mind you - I am on Windows 7 64 bit here):

    SEPARATE Start 16-bit Windows program in separate memory space.
    SHARED Start 16-bit Windows program in shared memory space.

    * The "bug" being that there ARE NO 16-bit WINDOWS SUBSYSTEMS IN 64-bit Windows, only 32-bit subsystems...

    APK

    P.S.=> No "Huge Bug", but a misleading statement in the start command's help output... apk

    1. Re:You've helped me find a SMALL "BUG" by Anonymous Coward · · Score: 0

      Shut the fuck up, jackass.

  15. stave me by epine · · Score: 1

    I would have preferred AMD to implement hyper threading as it would have greatly simplified things for OS developers. It's getting to a point where kernels have to know about CPU families in order to get the performance they need. They also have to know the workload.

    This is an architecture designed for a ten-year run, much like the original P6, which underwhelmed everyone with (at most) half a brain.

    Just how long do you think the OS can remain task agnostic as we head down the road to eight and sixteen core processors? Why plan for the future when we can languish on easy-street for another year or two? When the PC came out, some people complained they "would have preferred" a superior and more reliable electronic typewriter.

    I'm quite certain the correct design approach is to provision a CPU treating TDP as your performance wall. If eight floating point units require more TDP than your chip provides, what point is there in providing eight such units? And even if the math in the first spin of the new architecture could have gone the other way on some of these matters, in no time at all you're up hard against it, if you glance a few weeks further down the roadmap.

    They also have to know the workload.

    It's a bizarre conceit in any other walk of life that you can get away with not knowing the workload on the path to optimal resource assignment. Half of the human brain is devoted to power management. The glucose demand of the human brain is one of the big reasons why we were a late addition to mother nature's species road map. The brain doesn't operate from a baseline glucose guzzle equally able to handle any task that might come up. Much of what we perceive as quick reaction is only possible because the brain decided to fire up the necessary circuit 400ms beforehand.

    1. Re:stave me by laffer1 · · Score: 1

      And how would the OS know the workload ahead of time? It's not like there are hints in the binary that it's going to be doing floating point work or that it's going to be CPU bound.

      Remember that the more complex we make scheduling, the slower it is. Schedulers have to be fast. There's only so much the OS can do to help out. As programmers, we're taught that the hardware is a black box. We're supposed to assume it works correctly most of the time. There's a big difference between seeing a hyper-threading flag (which describes behavior) and going "oh, this is an AMD Bulldozer CPU, model blah, and it needs this and this to be fast." These are supposed to be general purpose machines. I don't think people would like windows updates or new linux kernels coming out every quarter just to hack around the latest chips and using the model to know how to perform. It's just not reasonable to ask for that.

    2. Re:stave me by Anonymous Coward · · Score: 0

      This is an architecture designed for a ten-year run, much like the original P6, which underwhelmed everyone with (at most) half a brain.

      The only reason P6 underwhelmed was that Intel miscalculated when the transition from 16-bit to 32-bit Windows would take place. Everyone knew 32-bit was the future, Intel just jumped the gun a bit with a CPU that was awesome at 32-bit at a time when most installed software and applications were still 16-bit.

      Bulldozer doesn't really have that going for it. It's designed to be fast if you have tons of independent threads and don't care so much about individual thread performance. AMD designed it that way not because that's the future of software (*), but because it's easier and cheaper to do than what Intel's doing, which is to get the same net throughput with fewer cores and awesome per-core performance.

      * - In one sense it is the future, because the whole reason we have multicore CPUs at all is that it's flatly impossible to spend the same die area on a single thread and get equivalent throughput to N threads, so software has to adapt since more threads are what the hardware guys are going to be supplying. But threaded software is hard to write, and Amdahl's law (look it up if you don't know it) makes it hard to scale software to arbitrary numbers of cores. So, CPUs which extract as much serial performance as possible and still have the same throughput are always going to be preferable. It's a more general solution. And that's exactly what is seen in Bulldozer vs. Sandy Bridge benchmarks; SB is a much better all around CPU.
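Amdahl's law makes the footnote's argument quantitative: with a parallel fraction p, the speedup on N cores is S(N) = 1 / ((1 - p) + p/N). The sketch below also scales by a per-thread speed factor so that "more, weaker cores" can be compared with "fewer, stronger cores"; the 0.8 factor used in the comparison is a hypothetical stand-in, not a measured Bulldozer number.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup of a job whose fraction p parallelizes over n cores."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


def throughput(p: float, n: int, per_thread_speed: float) -> float:
    """Relative throughput of n cores, each per_thread_speed times a reference core."""
    return per_thread_speed * amdahl_speedup(p, n)
```

With p = 0.9, eight cores at 0.8x (about 3.76) beat four cores at 1.0x (about 3.08); with p = 0.5 the order flips (about 1.42 versus 1.6). Which design wins depends on how parallel the workload is, which is also why the choice of benchmark matters so much in this thread.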

    3. Re:stave me by drinkypoo · · Score: 1

      The human brain can make all the glucose it needs itself. Otherwise the Atkins diet would just kill you immediately.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:stave me by Daengbo · · Score: 1

      I don't think people would like windows updates or new linux kernels coming out every quarter just to hack around the latest chips and using the model to know how to perform. It's just not reasonable to ask for that.

      I was actually under the impression that's exactly how it worked right now.

    5. Re:stave me by Rockoon · · Score: 1

      And how would the OS know the workload ahead of time? It's not like there are hints in the binary that it's going to be doing floating point work or that it's going to be CPU bound.

      The OS doesn't need to know the workload ahead of time; it just needs to figure it out quickly (because the short run doesn't matter).

      We already have "ad hoc" Down-Clocking (SpeedStep and Cool'n'Quiet) and "ad hoc" Over-Clocking (Turbo Boost and Turbo Core), so why not "ad hoc" Core Assignment as well?

      --
      "His name was James Damore."
    6. Re:stave me by laffer1 · · Score: 1

      Some of those technologies require OS intervention as well. You're not really offloading it onto the processor. Also, those technologies are general features that are advertised to the OS. They're not specific to a chip family.

    7. Re:stave me by Rockoon · · Score: 1

      The fact that those technologies are proprietary, not universal, require OS support, but do not require the OS to "know the workload ahead of time" (a direct quote of your claim of necessity) is *the point* which seems to have completely eluded you.

      The OS simply does not need to know the workload ahead of time. Period. Your claims to the contrary are wrong, and no wiggle room is going to be afforded to you by me on this matter.

      Your entire premise was and still is a fabrication, completely made-up nonsense that doesn't pass even the slightest bit of critical thinking by anyone with any actual knowledge on the matter. Current technologies discredit your theory about the demands of future ones. Period.

      --
      "His name was James Damore."
  16. Windows 7 hotfix by Anonymous Coward · · Score: 0

    With Windows it's always "wait until the next version of Windows" to get new features. MS could easily hotfix Windows 7 with a Bulldozer-aware scheduler. They won't, because they want something to drive sales of their "new and improved" OS. This is just one of the various reasons why I use Linux. In about a month Linux (kernel 3.2) will roll out a Bulldozer-aware scheduler and I'm set.

    Interestingly enough, if you look at the current Bulldozer benchmarks on Linux, it's performing quite nicely (even without the Bulldozer tuned scheduler).

    1. Re:Windows 7 hotfix by Anonymous Coward · · Score: 0

      We can wait for the B3 stepping and see what happens too. I remember when the AMD64 was found to be slower in 64-bit mode under Windows, mainly because Microsoft hadn't quite gotten its act together with its half-hearted release of XP-64 Professional.

  17. Re:It was already beating all intel in highly thre by yuhong · · Score: 2

    I think it is time for some reverse engineering of the benchmark programs, to see exactly what is happening.

  18. No need, everyone knows... by Anonymous Coward · · Score: 2, Informative

    Here's Agner Fog's page about this issue.

    The Intel compiler (for many years and many versions) has generated multiple code paths for different instruction sets. Using the lame excuse that they don't trust other vendors to implement the instruction set correctly, the generated executables detect the "GenuineIntel" CPU vendor string and deliberately cripple your program's performance by not running the fastest codepaths unless your CPU was made by Intel. So e.g. if you have an SSE4-capable AMD CPU, it will run the SSE2 codepath instead of the SSE4 codepath that comparable Intel chips will run.

    Over the years, MANY libraries (including several from Intel) have been compiled and shipped with this compiler, with the result that applications built with those libraries, including many benchmarks, also suffer from the same performance sabotage.
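The difference between vendor-gated and feature-based dispatch fits in a few lines. This is a minimal model, not Intel's compiler code: `CpuInfo`, the path names, and the feature strings are hypothetical, and real dispatchers read the vendor string and feature bits with the CPUID instruction. Following the description above, the vendor-gated version holds non-Intel CPUs to the SSE2 path.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CpuInfo:
    """Hypothetical stand-in for what a program reads via CPUID."""
    vendor: str                                      # e.g. "GenuineIntel", "AuthenticAMD"
    features: Set[str] = field(default_factory=set)  # e.g. {"sse2", "sse4_1"}


# Code paths, fastest first; the names are illustrative only.
PATHS = [("sse4_1", "sse4_path"), ("sse2", "sse2_path")]


def dispatch_by_features(cpu: CpuInfo) -> str:
    """Pick the fastest path whose instruction set the CPU reports."""
    for flag, path in PATHS:
        if flag in cpu.features:
            return path
    return "generic_path"


def dispatch_by_vendor(cpu: CpuInfo) -> str:
    """Vendor-gated variant: non-Intel CPUs are held to the SSE2 path."""
    if cpu.vendor != "GenuineIntel":
        return "sse2_path" if "sse2" in cpu.features else "generic_path"
    return dispatch_by_features(cpu)
```

An SSE4-capable `CpuInfo("AuthenticAMD", {"sse2", "sse4_1"})` gets `sse2_path` from the vendor-gated dispatcher and `sse4_path` from the feature-based one; report the same features under the vendor string "GenuineIntel" and the vendor-gated dispatcher also picks `sse4_path`, which is the effect of the CPUID-vendor experiment mentioned earlier in the thread.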

    1. Re:No need, everyone knows... by yuhong · · Score: 1

      As a note, I remember seeing one of Intel's libraries used in MSHTML.DLL in MS's own IE9 when I was disassembling it with IDA.

  19. An application of "ReVeRsE-PsyChoLoGy" by Anonymous Coward · · Score: 0

    ".ssakcaj ,pu kcuf eht tuhS" - by Anonymous Coward ANOTHER "ne'er-do-well" /. OFF-TOPIC TROLL on Friday October 28, @02:54PM (#37872146)

    "???"

    Uhm... Could we get a translation of that off-topic "troll-speak/trolllanguage" of yours, please?

    * And, you're an off-topic troll - no questions asked... SEE MY SUBJECT LINE ABOVE, because the use of profanity on your part? Doesn't make for an intelligent reply!

    APK

    P.S.=> Yes, it must have just been another off-topic troll who has done nothing of significance with his life, spewing his off-topic b.s. again & not contributing to the ongoing conversations. Oh well - No biggie!

    ("ReVeRsE-PsYcHoLoGy", for trolls - Courtesy of this code by "yours truly" in less than 1 second flat):

    ---

    #TrollTalkComReversePsychologyKiller.py (Ver #2 by APK)

    def reverse(s):
        # Build the reversed string by prepending each character in turn.
        trollstring = ""
        try:
            for apksays in s:
                trollstring = apksays + trollstring
        except TypeError:
            # Raised if s is not iterable or contains non-string items.
            print("error/abend in reverse function")
        return trollstring

    s = ""
    print(reverse(s))  # an empty string reverses to an empty string

    try:
        s = "Insert whatever 'trollspeak/trolllanguage' gibberish occurs here..."
        s = reverse(s)
        print(s)
    except Exception as e:
        print(e)

    ---
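    For comparison, a sketch of the same reversal using Python's slice syntax. Prepending one character at a time, as the loop above does, builds a new string on every iteration, so the total work grows roughly quadratically with input length; `s[::-1]` produces the reversed string in a single pass. The name `reverse_slice` is just an illustrative choice.

```python
def reverse_slice(s: str) -> str:
    # A slice with step -1 walks the string from the last character to the first.
    return s[::-1]


print(reverse_slice("olleh"))  # hello
```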

    ... apk

  20. Re:It was already beating all intel in highly thre by blair1q · · Score: 1

    >relying on bentmarks to "measure performance" is a fool's errand. dont go there.

    And yet, that's what you're doing.

    The correct phrase is: Relying on benchmarks that are not relevant to your application is a fool's errand.

  21. Re:It was already beating all intel in highly thre by makomk · · Score: 1

    One fun side note: notice how that link says "it will fail to recognize future Intel processors with a family number different from 6". Intel have conspicuously kept the family number reported by CPUID at 6 on their new processors in order not to trigger a fallback to the non-Intel pathway that AMD processors get to use, presumably because they know how much that'll harm them in benchmarks and how bad the reviews will look.
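    A sketch of where that "family number" comes from: the CPUID leaf 1 EAX signature packs base and extended family/model fields, decoded below following the documented encoding. The two signatures are examples: 0x206A7 decodes to family 6, model 0x2A (a Sandy Bridge part), and 0x600F12 decodes to family 0x15 (21), the family AMD uses for Bulldozer. `naive_is_known_intel` is a made-up name illustrating the kind of "family == 6" check the linked page describes, not code from any real compiler.

```python
from typing import Tuple


def decode_signature(eax: int) -> Tuple[int, int, int]:
    """Decode (family, model, stepping) from the CPUID leaf 1 EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family field is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # Intel's rule: extended model applies to base families 0x6 and 0xF
    # (AMD applies it only to 0xF).
    if base_family in (0x6, 0xF):
        model = base_model + (ext_model << 4)
    else:
        model = base_model
    return family, model, stepping


def naive_is_known_intel(vendor: str, eax: int) -> bool:
    """Hypothetical check: only 'family 6' Intel parts get the fast path."""
    family, _, _ = decode_signature(eax)
    return vendor == "GenuineIntel" and family == 6


print(decode_signature(0x206A7))   # (6, 42, 7)
print(decode_signature(0x600F12))  # (21, 1, 2)
print(naive_is_known_intel("GenuineIntel", 0x206A7))  # True
```

    Under a check like this, a hypothetical future Intel part reporting any family other than 6 would also drop to the slow path, which is the incentive the comment points at.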

  22. Re:It was already beating all intel in highly thre by Anonymous Coward · · Score: 0

    AMD lost the game a while back. Intel's getting ready to introduce a new chipset and architecture that will blow whatever AMD currently has out of the water.

    For a while, AMD was beating the crud out of Intel, but well, yeah.

  23. Re:It was already beating all intel in highly thre by zbobet2012 · · Score: 1

    The correct phrase is: Relying on benchmarks that are not relevant to your application is a fool's errand.

    Yes, yes, yes.

    The Bulldozer architecture is heavily optimized for highly threaded applications with a heavy reliance on integer operations. This is well represented by today's server workloads, not today's desktop applications. But more importantly for AMD's future, this also represents the direction of tomorrow's applications. A great example of this is Battlefield 3, where the 8150 outperforms the i7-2600K. Unfortunately, this also means that whether Bulldozer or Sandy Bridge is faster today depends on the application. From the above test we can fairly safely guess that BF3 does more integer work, while Civ 5 does more floating-point work.

    However, less obvious than the multithreading issue is the push away from using the CPU for floating-point operations. This is a bet both Intel and AMD have been slowly making for quite some time: putting floating-point operations on the GPU. AMD has just taken a more "committed" approach to it. It's also something that may pay off big time.

    As an aside, as a server administrator today I would buy Bulldozer over Sandy Bridge-based processors in a heartbeat. Most of the "scale out" boxes, such as web caches, database servers, etc., run highly multithreaded, integer-driven workloads. In this case Bulldozer is going to destroy Sandy Bridge, plain and simple. Also, to those citing supercomputers: those tend to be floating-point driven, since they are generally used for simulation.

  24. Re:It was already beating all intel in highly thre by blair1q · · Score: 1

    Those 8150 vs. 2600K numbers are sans discrete graphics. It shows the 8150 integrating a stronger graphics core. It's still a pretty crummy graphics core. It's just better than the basic-equipment one that Intel put in the 2600k. The 2600K is much older than the 8150, though, and hardly Intel's top-end part. It's just the one that AMD chose to compare against. They do that. Bring out a half-assed chip, then pick an Intel part they can beat and beat it. While Intel's other parts stand on the other side of the cafeteria wondering why the fat kid is picking on the nerdy kid instead of someone their own size.

    So, if you're in the small niche where you want to play Battlefield 3 on a computer with no discrete graphics, the 8150 is probably your choice. Or you could wait a couple of weeks for Intel's next part. Or upgrade to new-release discrete graphics, which will kick that integrated graphics solution's ass using either CPU.

    So what I'm saying is, AMD's market is lamers, not gamers. And they seem to be making a little money at it. For now. Bulldozer has yet to dominate a quarterly report, so its issues, which comprise both performance and manufacturability, have yet to reveal themselves as a significant driver of the bottom line. Q3 was good to them, Q4 not likely to be. 12Q1 could be their doom if they don't find a miracle.

  25. Big mouth Billy runs? by Anonymous Coward · · Score: 0
    1. Re:Big mouth Billy runs? by Antisyzygy · · Score: 1

      I don't really understand what the fuck that link is supposed to prove other than you have been published. So what? I have been published before as well. Does being published mean you know what you are talking about? Nope. Logic flows from evidence and rational argument, and I see no evidence nor rational argument in your post.

      --
      That brings me to an interesting point, / . is just "the ramblings of socially-inept, technology-literate news-mongers".
  26. Completely wrong by Anonymous Coward · · Score: 0

    That is not how hyperthreading works at all. Practically all x86 chips since the Pentium Pro have out-of-order cores. With hyperthreading, each core has a certain number of execution units, store buffers and so on, and it simply shares these resources between two hardware threads. Instructions from both threads get decoded and allocated to the right type of execution unit as soon as one is available.

    It's not going to run as well as two completely dedicated cores, because the two hardware threads are competing for the available resources. Sometimes one of them will need an execution unit that isn't available because the other one is using it, etc. That's why one hardware thread may see a speedup if you don't schedule anything on the other hardware thread for that core.
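    A minimal sketch of that scheduling idea, which is also what the article does for Bulldozer modules: restrict a process to one logical CPU per physical core (or per module) so its threads don't share execution resources. It ASSUMES sibling logical CPUs are numbered adjacently (0 and 1 share a core, 2 and 3 the next, and so on), which is common but not guaranteed; on Linux, check `/sys/devices/system/cpu/cpu*/topology/` first. `one_cpu_per_core` and `pin_to_cpus` are hypothetical helper names; `os.sched_setaffinity` is a real call but exists only on some platforms, such as Linux.

```python
import os
from typing import Iterable, List


def one_cpu_per_core(cpus: Iterable[int], threads_per_core: int = 2) -> List[int]:
    """Keep the first logical CPU of each core, ASSUMING siblings are numbered
    adjacently (0/1, 2/3, ...). Verify the real topology before relying on it."""
    return sorted(cpus)[::threads_per_core]


def pin_to_cpus(cpus: List[int]) -> bool:
    """Restrict the caller to `cpus` where the OS exposes sched_setaffinity."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)  # pid 0 refers to the caller
        return True
    return False  # e.g. Windows: use the OS's own affinity tools instead


print(one_cpu_per_core(range(8)))  # [0, 2, 4, 6]
```

    On an 8-logical-CPU machine this keeps CPUs 0, 2, 4 and 6, leaving each core's sibling idle; whether that helps depends on whether the workload needs more threads than there are cores.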

    However, hyperthreading is still a damn better deal than having only ONE hardware thread in that core and letting the execution units sit unused a greater percentage of the time. For the same number of transistors, you can get more work done in the same amount of time with hyperthreads.
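    A toy model of that trade-off, assuming a core with a fixed number of shared execution units and threads whose instruction-level parallelism caps how many instructions each can issue per cycle. It ignores caches, branch prediction and every other real pipeline detail, and the workload numbers are made up purely for illustration.

```python
def cycles_alone(n: int, ilp: int, units: int) -> int:
    """Cycles for one thread with the whole core to itself."""
    per_cycle = min(ilp, units)
    return -(-n // per_cycle)  # ceiling division


def cycles_shared(n_a: int, n_b: int, ilp: int, units: int) -> int:
    """Cycles for two hardware threads sharing one core's execution units."""
    remaining = [n_a, n_b]
    cycles = 0
    while remaining[0] > 0 or remaining[1] > 0:
        free = units
        first = cycles % 2  # alternate priority so neither thread starves
        for t in (first, 1 - first):
            issued = min(ilp, remaining[t], free)
            remaining[t] -= issued
            free -= issued
        cycles += 1
    return cycles


# Hypothetical workload: 12 instructions per thread, ILP of 3, 4 units per core.
alone = cycles_alone(12, 3, 4)
print("one thread alone:", alone)                                    # 4
print("two threads, one after another:", 2 * alone)                  # 8
print("two threads sharing the core:", cycles_shared(12, 12, 3, 4))  # 6
```

    In this model each thread finishes later when sharing (6 cycles instead of 4), which is why leaving the sibling idle can speed up a single thread, yet the pair finishes sooner than running the two back to back (6 cycles instead of 8).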

  27. Re:It was already beating all intel in highly thre by Anonymous Coward · · Score: 1

    The Bulldozer architecture is heavily optimized for highly threaded applications with a heavy reliance on integer operations. This is well represented by today's server workloads, not today's desktop applications. But more importantly for AMD's future, this also represents the direction of tomorrow's applications.

    No it doesn't. AMD marketing would like you to believe every application is going to go massively parallel, but they're not the ones who actually have to write the software. It is not easy to thread all types of software (the low hanging fruit has already been picked), and it can be hard or impossible to get gains beyond two or three threads for many types of code.
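    The "hard to get gains beyond two or three threads" point can be made concrete with Amdahl's law, a standard model (not something cited in this thread) in which only a fraction p of the work can run in parallel: speedup(n) = 1 / ((1 - p) + p / n). A quick sketch, with the values of p chosen only for illustration:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: ideal speedup on n threads when a fraction p is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


for p in (0.5, 0.9):
    row = ", ".join(f"{n} threads: {amdahl_speedup(p, n):.2f}x" for n in (2, 4, 8))
    print(f"p = {p}: {row}")
```

    With half of the work serial, eight threads give less than a 1.8x speedup; even at 90% parallel, eight threads stay under 5x.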

    A great example of this is Battlefield 3, where the 8150 outperforms the i7-2600K.

    Er, what? The 2600k outperformed the 8150 when both were clocked at stock speeds. The 8150 won in the overclocked test.

    I shouldn't use the terms "outperformed" or "won", however, as this test was very sloppily done. Carefully read the description. Apparently the BF3 beta lacks facilities for repeatable benchmarking, so they just played on a live server with real players. This guarantees lots of noise and poor repeatability. And you can see that noise in the data: the BD average frame rate went up when overclocked, but the i5 and i7 averages went slightly down. That shouldn't ever happen in a CPU benchmark, not when you're raising the clock speed of the processor by over 1 GHz.

    In fact, note that all the average FPS results fall in the range ~50.5 to 54 regardless of CPU type and clock, and the overclocked i7-2600K loses to every non-OC result, including itself! This test isn't CPU limited in any way. It's just a fancy way of generating random numbers with no correlation to CPU performance.
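    A quick check of that argument, using only the figures quoted in the comment: the average results span roughly 50.5 to 54 FPS, while the overclock added over 1 GHz. The 3.6 GHz stock clock below is the FX-8150's base clock, used here only as an assumption to size the gain; as a first approximation, a fully CPU-bound test would scale roughly with clock speed.

```python
def relative_spread(values):
    """(max - min) / min: how far apart the results are, as a fraction."""
    return (max(values) - min(values)) / min(values)


def clock_gain(stock_ghz: float, oc_ghz: float) -> float:
    """Fractional clock increase from overclocking."""
    return oc_ghz / stock_ghz - 1.0


observed = relative_spread([50.5, 54.0])  # range quoted in the comment
expected = clock_gain(3.6, 4.6)           # +1 GHz over an assumed 3.6 GHz stock clock
print(f"spread across all CPUs and clocks: {observed:.1%}")  # 6.9%
print(f"clock increase from overclocking: {expected:.1%}")   # 27.8%
```

    A spread of under 7% across every CPU and clock setting, against a clock increase of more than 25%, is consistent with the comment's conclusion that the test is not CPU-limited.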

    Unfortunately, this also means that whether Bulldozer or Sandy Bridge is faster today depends on the application. From the above test we can fairly safely guess that BF3 does more integer work, while Civ 5 does more floating-point work.

    You're assuming games load up all cores. Few games use more than 2 or 3 cores, even recent releases. This is for two related reasons. First, it's hard to scale game logic to huge numbers of cores. Second, game developers know that it's only recently that new PC sales ticked over to a dual-core on average, much less quad or better, and they want to spend most of their time working on things which will benefit all their customers rather than putting out a lot of effort to help only the 5% or less who own high end hardware.

    (This is also why SLI/CrossFire has always been plagued by poor game support, and ATI & Nvidia have had to try various schemes to incentivize developers to spend time on multi-GPU.)

    Benchmarkers who managed to actually create CPU-limited gaming tests almost universally found that Sandy Bridge stomped all over Bulldozer. SB's per-thread performance is much better, which is a better match to today's (and tomorrow's) software.

    As an aside, as a server administrator today I would buy Bulldozer over Sandy Bridge-based processors in a heartbeat. Most of the "scale out" boxes, such as web caches, database servers, etc., run highly multithreaded, integer-driven workloads. In this case Bulldozer is going to destroy Sandy Bridge, plain and simple.

    I think you're getting a bit ahead of yourself. Client BD certainly hasn't destroyed client SB, even in highly multithreaded integer workloads (it's more like it wins some, loses others), so what reason is there to believe anything will be different in servers? Keep in mind that Intel has yet to even release its high-end SB CPU and platform (6-core/12-thread desktop, 6C/12T server, and 8C/16T server); all existing SB CPUs are mainstream desktop/notebook (4C/8T max with integrated graphics).

  28. Re:It was already beating all intel in highly thre by Anonymous Coward · · Score: 0

    If the case stated above is true, it is possible that it is simply not realising the AMD CPU has the required SSE support because of a weak check.

  29. Re:It was already beating all intel in highly thre by Anonymous Coward · · Score: 0

    Scratch that, the guys who wrote the article already figured it out.

  30. AMD Tops Benchmark For 9 Months by Anonymous Coward · · Score: 0

    Benchmarks, benchmarketing, if you want some non-rigged results take a look at Geekbench. I see that an AMD system was top of the chart for over 9 months and after many thousands of submissions from Intel and IBM users: http://browse.geekbench.ca/geekbench2/top

  31. Re:It was already beating all intel in highly thre by hitmark · · Score: 1

    Sounds like sticking with GCC or some other neutral compiler is the best option.

    --
    comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
  32. Re:It was already beating all intel in highly thre by RightSaidFred99 · · Score: 1

    No.. Sticking with the compiler most likely to be used in the programs you care about is the best option. If that's GCC it's GCC, if it's ICC, it's ICC.

  33. Bulldozer is competitive with the right OS by Anonymous Coward · · Score: 0

    Phoronix has a set of benchmarks showing bulldozer is quite competitive when running under Linux.

  34. i don't use facebook by Anonymous Coward · · Score: 0

    "i dont give a flying fuck about how the win(d)blowz.
    i want a working paper weight with my bulldozer ... called fglrx.
    u can shove your chinese mass manufactured LED mouse up you HP printer-enabled
    keyboard (knock on wood) spooled windows powered internet.
    hear that AMD? you're a m$ subdivision just like adobe-flash-youtube"

  35. Re:It was already beating all intel in highly thre by Ecuador · · Score: 1

    Perhaps the Intel compiler situation might explain some of my experiences. I have a system (Mac Pro) with a quad-core Bloomfield Xeon @ 3.2GHz and also a system with a quad-core Phenom II, also @ 3.2GHz. On most things I run, the Xeon is faster. However, the same is not true for the software I write. If I implement, say, the edit distance algorithm in C to compare two DNA sequences, and compile for x86-64 with gcc, the Phenom II is about 10% faster than the Xeon for a single thread. Interestingly, the Xeon is faster if compiled for a 32-bit arch. Then, a string processing program I have in Perl runs at about the same speed per thread on both CPUs. The Xeon does get an advantage for more than 4 threads due to HT, but of course I could switch to the cheap Phenom X6 if I needed such workloads...
    Overall, for the custom software I use daily for work, the $5000 Xeon Mac Pro machine is not faster than the $500 Phenom II system...
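    For readers unfamiliar with the workload mentioned above, a minimal sketch of the textbook dynamic-programming edit distance (Levenshtein distance). This is not the commenter's C code, only the standard algorithm; its inner loop is dominated by integer comparisons and `min` operations rather than floating-point math.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions
    needed to turn `a` into `b`, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("ACGT", "AGT"))        # 1
```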

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  36. Re:It was already beating all intel in highly thre by Ecuador · · Score: 1

    Interestingly, the Xeon is faster if compiled for 32bit arch.

    Obviously compared to the Phenom also running a 32bit binary.

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  37. Re:It was already beating all intel in highly thre by zbobet2012 · · Score: 1

    No it doesn't. AMD marketing would like you to believe every application is going to go massively parallel, but they're not the ones who actually have to write the software. It is not easy to thread all types of software (the low hanging fruit has already been picked), and it can be hard or impossible to get gains beyond two or three threads for many types of code.

    Yeah... No. Data dependencies are actually fairly low in almost every real-world application. Highly threaded applications are rare outside of the enterprise world (read Google/Amazon/transaction processing etc.) because they are modestly hard to write and there is a significant lack of developers experienced with making them. However, as more performance is required, everyone is moving that way. Sure, simple word editing may not require that many threads, but anything that needs performance will move in that direction. It is not hype. Also, Battlefield 3 loads all cores of an i7-2600K to 50% on ultra.

    In fact, note that all the average FPS results fall in the range ~50.5 to 54 regardless of CPU type and clock, and the overclocked i7-2600K loses to every non-OC result, including itself! This test isn't CPU limited in any way ... You're assuming games load up all cores. Few games use more than 2 or 3 cores, even recent releases. This is for two related reasons. First, it's hard to scale game logic to huge numbers of cores. Second, game developers know that it's only recently that new PC sales ticked over to a dual-core on average, much less quad or better, and they want to spend most of their time working on things which will benefit all their customers rather than putting out a lot of effort to help only the 5% or less who own high end hardware.

    Games are not the only thing in the world. People who actually do computing as a profession, for example those of us who have to compile kernels on a regular basis, care about stuff like highly multithreaded, integer-based workloads. I also game, so as long as I can get comparable performance in games it doesn't really matter to me. This will be the case for many people.

    I think you're getting a bit ahead of yourself. Client BD certainly hasn't destroyed client SB, even in highly multithreaded integer workloads (it's more like it wins some, loses others), so what reason is there to believe anything will be different in servers? Keep in mind that Intel has yet to even release its high end SB CPU and platform (6-core/12-thread desktop, 6C/12T server, and 8C/16T server); all existing SB CPUs are mainstream desktop/notebook (4C/8T max with integrated graphics). And the 8150 is the exact same chip AMD is going to be selling as a server CPU. (Unlike Intel since Nehalem, AMD tries to use the same chip design for server and client, because AMD doesn't have the volumes to justify taping out a different chip for each market.) Worse yet for AMD, every review I've seen where they measured power showed BD using a lot more juice, at least 50W more in every case (just spot checked one review and it was ~75W more load power than i7-2600K). That's not going to go over so well in the server space.

    The 8150 isn't the chip that AMD is offering for servers. AMD offers different chips for server and client; do you seriously think the 12-core Magny-Cours is for clients? They have an 8-core/16-thread offering from the Bulldozer architecture for servers too. The Intel SB server processors have been available to my company (admittedly a very large one) for quite some time as well. I am assuming you literally don't know what you're talking about at this point on the server front.