Slashdot Mirror


Errata Prompts Intel To Disable TSX In Haswell, Early Broadwell CPUs

Dr. Damage writes: The TSX instructions built into Intel's Haswell CPU cores haven't become widely used by everyday software just yet, but they promise to make certain types of multithreaded applications run much faster than they can today. Some of the savviest software developers are likely building TSX-enabled software right about now. Unfortunately, that work may have to come to a halt, thanks to a bug—or "errata," as Intel prefers to call them—in Haswell's TSX implementation that can cause critical software failures. To work around the problem, Intel will disable TSX via microcode in its current CPUs — and in early Broadwell processors, as well.

15 of 131 comments

  1. Not all that surprising... by K. S. Kyosuke · · Score: 5, Interesting

    So, basically, they've just been forced to get rid of the most complex (that's why it's not all that surprising) yet also most beneficial feature with regards to server loads? I'm sure there are some Opterons laughing right now.

    --
    Ezekiel 23:20
    1. Re:Not all that surprising... by gstoddart · · Score: 5, Funny

      What of the folks that purchased these chips for these specific instructions?

      Same as happens to all early adopters -- the feature may or may not work, and even if it does, there's no guarantee it will be supported (or the same) in the next version.

      This is a pretty big 'errata', which is awesome marketing speak for "really bad QA".

      Engineers Release Really Awful Tech. Awesome!

      --
      Lost at C:>. Found at C.
    2. Re:Not all that surprising... by gman003 · · Score: 4, Informative

      I'm sure there are some Opterons laughing right now.

      Yes, but some of them take a while to get the joke because their TLB had to be disabled.

      (Certain releases of the "Barcelona" Opterons had a bug that could lock up the system. A workaround would prevent it, but had a stiff performance penalty. Later steppings had it fixed.)

    3. Re:Not all that surprising... by ShanghaiBill · · Score: 4, Informative

      See also Pentium 5 and the FDIV bug. It falls under "too bad, so sad, try your luck with the next revision".

      No. Intel offered to replace any P5 with the FDIV bug upon request. Most customers did not request a replacement, but the option was available.

    4. Re:Not all that surprising... by CajunArson · · Score: 5, Insightful

      Nobody has been robbed.
      TSX today works exactly as well as TSX worked yesterday, and considering that Haswell has been on the market for over 1 year, I assure you that anybody who has been chomping at the bit to use TSX has been using TSX.

      If the TSX erratum were trivially easy to trigger, then this article would have been posted last spring before Haswell even launched.

      Intel has done the responsible thing by acknowledging the bug (trust me son, AMD & Nvidia often don't bother with that part of the process) and giving developers the OPTION to either use TSX as-is or disable it to ensure that it cannot cause instability no matter what weird operating conditions can occur.

      Tell ya what, why don't you take all your nerd-rage over to AMD or ARM where they won't rob you of all kinds of advanced features that they just don't bother to implement at all.

      --
      AntiFA: An abbreviation for Anti First Amendment.
    5. Re:Not all that surprising... by EvilJoker · · Score: 4, Informative

      I know this was a troll, but I feel compelled to reply in case someone doesn't know.

      ALL CPUs have errata. Some of it more significant than others.

      A quick Google for "AMD errata" revealed Revision Guide for AMD Family 16h Models 00h-0Fh, published June 2013, and applying to AMD's Mobile A, E, and G series, and Opteron X1100/X2100. (These are modern CPUs.)

      There are 21 entries, with descriptions, system impact, and suggested workaround (if any)

      Haswell's errata has 131 entries

    6. Re:Not all that surprising... by Anonymous Coward · · Score: 4, Informative

      See also Pentium 5 and the FDIV bug. It falls under "too bad, so sad, try your luck with the next revision".

      No. Intel offered to replace any P5 with the FDIV bug upon request. Most customers did not request a replacement, but the option was available.

      Not at first they didn't.

      My friend was doing his master's on neural networks (?) at the time and some of his algorithms were giving back hinky results, especially when he compared them to some of the SPARC systems.

      He had to actually provide documentation that it affected him, and I think sign an NDA, before Intel would give him anything. He jumped through their hoops to get a replacement, and then the very next week Intel announced their carte blanche replacement program.

      It took much screaming in the industry before Intel became "generous".

    7. Re:Not all that surprising... by Sun · · Score: 3, Informative

      I have a friend who came to me, eyes all glowing, about this new feature his shining new CPU has. I listened in and was skeptical.

      He then tried, for over a month, to get this feature to produce better results than traditional synchronization methods. This included a lot of dead ends due to simple misunderstandings (try to debug your transaction by adding prints: no good - a system call is guaranteed to cancel the transaction).

      We had, for example, a lot of hard times getting proper benchmarks for the feature. Most actual use cases include a relatively low contention rate. Producing a benchmark that will have low contention on the one hand, but allow you to actually test how efficient a synchronized algorithm is on the other is not an easy task.

      After a lot of going back and forth, as well as some nagging to people at Intel (who, surprisingly, answered him), he came to the following conclusion (shared with others):
      Many times a traditional mutex will, actually, be faster. Other times, it might be possible to gain a few extra nanoseconds using transactions, but the speed difference is, by no means, mind blowing. Either way, the amount you pay in code complexity (i.e. bugs) and reduced abstraction hardly seems worth it.

      At least as it is implemented right now (but I, personally, fail to see how this changes in the future. Then again, I have been known to miss things in the past), the speed difference isn't going to be mind blowing.

      Shachar

    8. Re:Not all that surprising... by rrohbeck · · Score: 4, Informative

      Singular: Erratum
      Plural: Errata

    9. Re:Not all that surprising... by TheRaven64 · · Score: 3, Informative

      It depends a lot on the data structures. There were a number of papers using TSX at EuroSys this year. The main conclusion was that TSX lets you get similar performance from simple approaches as you can get already from complex approaches. For example, you can protect a long linked list in a single lock and use HLE to get a big speedup with lots of concurrent insertions and accesses, but you can achieve similar performance with a fine-grained locking scheme. There was a nice paper about Cuckoo hashing where they initially found that TSX gave them a performance win, but then were able to get a similar speedup without it.

      The big win with TSX is that it's pretty easy to reason about coarse-grained locking and much harder to reason about fine-grained locking. If you can make coarse-grained locking almost as fast as fine-grained, then that's a huge saving on testing and debugging time.

      --
      I am TheRaven on Soylent News
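
The coarse-grained locking idea in the comment above can be sketched in C. This is only an illustration under stated assumptions: `struct list`, `list_init` and `list_push` are hypothetical names, and the sketch uses the RTM intrinsics (`_xbegin`, `_xend`, `_xabort`) rather than the HLE prefixes the comment mentions. The transactional path is compiled only when the compiler defines `__RTM__` (for example with `-mrtm`), which assumes the target CPU actually supports RTM and has it enabled; otherwise every insertion simply takes the single list-wide lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#ifdef __RTM__
#include <immintrin.h>   /* _xbegin, _xend, _xabort */
#endif

struct node { int value; struct node *next; };

/* One coarse-grained lock guards the whole list (hypothetical layout). */
struct list { atomic_bool locked; struct node *head; };

static void list_init(struct list *l) {
    atomic_init(&l->locked, false);
    l->head = NULL;
}

static void list_push(struct list *l, struct node *n) {
    assert(l != NULL && n != NULL);
#ifdef __RTM__
    /* Try once to run the critical section as a hardware transaction. */
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Reading the lock word adds it to the read set, so a thread that
           really acquires the lock will abort this transaction. */
        if (atomic_load_explicit(&l->locked, memory_order_relaxed))
            _xabort(0xff);
        n->next = l->head;
        l->head = n;
        _xend();
        return;
    }
    /* Any abort resumes here; fall through to the real lock. */
#endif
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ;  /* spin until the lock is free */
    n->next = l->head;
    l->head = n;
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The critical section is written once in its simple, coarse-grained form; whether it runs speculatively or under the lock is decided at run time, which is the "easy to reason about" property the comment describes.
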
  2. a bug != errata by Ecuador · · Score: 3, Insightful

    You either say "bugs - or errata" or "a bug - or erratum", since bug is singular and errata plural. At least the error - or "erratum" (see what I did here) in this case was in TFA and not introduced in the /. summary.

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
  3. Re:Well, we call them... by wonkey_monkey · · Score: 3, Funny

    It's okay, Intel are setting up a new subdivision to undo these problems. And to maximise employee happiness, it's being built in the Canary Islands.

    I think I'd enjoy being a Featurata Reverter in Fuerteventura.

    --
    systemd is Roko's Basilisk.
  4. Re:So how does one find out /apply "fix" with linu by Anonymous Coward · · Score: 3, Informative

    Wikipedia has very detailed information on Intel processors. This page does not list TSX for your processor and does list it for others.

    Most Linux distros automatically handle Intel microcode patches (which I assume is how this erratum will be handled). See Debian wiki or Arch wiki for details.

  5. Re:Bought a 4770 instead of 4770K because of TSX by CajunArson · · Score: 3, Informative

    You can still "play with this instruction" all you want.

    What happened here is that a third party developer managed to uncover a corner case where certain interactions with TSX can lead to instability. In order to be safe, Intel acknowledged the bug (a refreshing response) and is now giving you the OPTION to disable TSX if you feel that it could impinge on the stability of a production load.

    So basically: Go ahead and play with TSX all you want, but be aware of the errata and that it's theoretically possible to hang your machine in some corner cases.

    --
    AntiFA: An abbreviation for Anti First Amendment.
  6. Problem and possible alternatives by enriquevagu · · Score: 5, Informative

    This is a real pity for the TM community. This is not the first chip with transactional memory support in hardware: The Sun Rock was announced to have hardware TM support, and the IBM Blue Gene/Q Compute chip also supports it. Unlike other proposals for unbounded transactional memory, all these systems employ Hybrid Transactional Memory (ref, ref, ref), in which restricted hardware transactions are designed to correctly coexist with unbounded software transactions, so a software transaction can be started in case a hardware transaction fails for some unavoidable issue (such as lack of cache size or associativity to hold speculative data from the transaction, not because of a conflict). Note that, in any case, very large transactions should arguably be very uncommon, since they would significantly reduce performance (similar to very large critical sections protected by locks).

    The problem with the hardware implementation of transactional memory is that it is not simply a new set of instructions independent of the rest of the processor. HTM implies multiple aspects, including multiversion caching for speculative data; allowing for the commit of speculative (transactional) instructions, which could be later rolled back (note that in any other speculative operation such as instructions after branch prediction, the speculation is always resolved before the instruction commits because the branch commits earlier); a tight integration with the coherence protocol (see LogTM-SE for an alternative to this very last issue, but still...); a mechanism to support atomic commits in the presence of coherence invalidations... From the point of view of processor verification, this is a complete nightmare because these new "extensions" basically impact the complete processor pipeline and coherence protocol, and verifying that every single instruction and data structure behaves as expected in isolation does not guarantee that they will operate correctly in the presence of multiple transactions (and non-transactional conflicting code) on multiple cores. There are some formal studies such as this or this, and the IBM people discuss the verification of their Blue Gene TM system in this paper (paywalled).

    As some others commented before, the nature of the "bug" has not been disclosed. However, since it seems to be easy to reproduce systematically, I would expect it to be related to incorrect speculative data handling in a single transaction (or something similar), rather than races between multiple transactions.

    Regarding the alternatives, Intel cannot simply remove these instruction opcodes because previous code would fail. I assume that the patch will make all hardware transactions fail on startup, with a specific error (EAX bit 1 indicates if the transaction can succeed on a retry; setting this flag to 0 should trigger a software transaction). In that case, execution continues at the fallback routine indicated in the XBEGIN instruction, which should begin a software transaction. Effectively, this will be similar to a software TM (STM) with additional overheads (starting the hardware transaction and aborting it; detecting conflicts with nonexistent hardware transactions) that would make it slower than a pure STM implementation.
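
The retry-or-fall-back decision described above might look like this in C. It is a sketch only: the `TX_*` names and `tx_after_abort` are made up for illustration, the bit positions follow Intel's documented abort-status layout for XBEGIN (the same layout as the `_XABORT_*` macros in `<immintrin.h>`), and treating an all-zero status as "what a patched CPU returns" is the commenter's assumption, not confirmed behaviour.

```c
#include <assert.h>

/* Abort-status bits reported in EAX after a failed XBEGIN.
   Hypothetical names mirroring the _XABORT_* macros. */
enum {
    TX_ABORT_EXPLICIT = 1 << 0,  /* aborted by an XABORT instruction */
    TX_ABORT_RETRY    = 1 << 1,  /* the transaction may succeed on a retry */
    TX_ABORT_CONFLICT = 1 << 2,  /* another thread touched our data */
    TX_ABORT_CAPACITY = 1 << 3   /* speculative state did not fit in cache */
};

enum tx_next { TX_RETRY_HARDWARE, TX_FALL_BACK_TO_SOFTWARE };

/* Decide what the fallback routine should do after an abort.
   attempts_left is a caller-chosen retry budget (an assumption of
   this sketch, not something the hardware tracks). */
static enum tx_next tx_after_abort(unsigned status, int attempts_left) {
    assert(attempts_left >= 0);
    if ((status & TX_ABORT_RETRY) && attempts_left > 0)
        return TX_RETRY_HARDWARE;
    /* Retry bit clear (or budget exhausted): start a software transaction. */
    return TX_FALL_BACK_TO_SOFTWARE;
}
```

With a status of 0 the retry bit is clear, so `tx_after_abort(0, 3)` goes straight to the software path. That is the extra overhead the comment refers to: every transaction pays for a hardware attempt that can never succeed before the software transaction begins.
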