Slashdot Mirror


HyperTransport 3.0 Ratified

Hack Jandy writes "The HyperTransport consortium just released the 3.0 specification of HyperTransport. The new specification allows for external HyperTransport interconnects, basically meaning you might plug your next generation Opteron into the equivalent of a USB port at the back of your computer. Among other things, the new specification also includes hot swap, on-the-fly reconfigurable HT links and also a hefty increase in bandwidth."

43 of 179 comments

  1. External HyperTransport? by DaHat · · Score: 3, Funny

    I can only imagine what that could do to us cheap bastards who have small clusters of older PCs sitting in a second bedroom or closet.

    "Hum... I can't quite afford a whole new system or even a motherboard and two new procs... I'll just add a new one to the back of an existing one"

    At last! The day of easily upgrading to a multi-proc system may soon be at hand! (assuming they also have some sort of... external hub device).

    1. Re:External HyperTransport? by merreborn · · Score: 4, Insightful

      "Hum... I can't quite afford a whole new system or even a motherboard and two new procs... I'll just add a new one to the back of an existing one" ...Except you'd need a hypertransport 3.0 motherboard to begin with, and enough appropriately clocked RAM to make use of the processor. The whole "External CPU" idea was just speculation anyway; it's not mentioned anywhere in the article.

      Point being, you'll never be able to plug a new opteron into _anything_ that's sitting in your closet right now.

  2. Re:f*** by jamieswith · · Score: 2, Funny

    Somehow I doubt this will become available on hyper-transport 3...

    I really can't see it being that kind of socket!

    For now why don't you just stick with your 'Current Solution' and stop dreaming that you need all that extra 'Bandwidth'

  3. So the CPU will still be waiting for RAM? by Anonymous Coward · · Score: 2, Interesting

    Maybe they should integrate the RAM into the CPU or something.

    1. Re:So the CPU will still be waiting for RAM? by DaHat · · Score: 2, Interesting

      Good point... but do you really want to dedicate a large chunk of ram to a specific processor in such a manor?

      Sure, with it there would be a possibility of cache coherency issues while without there would be a performance hit whenever something hit the bus...

      I guess it'd depend on the cost of ram when building such a device... I'm guessing that a whopping 64-128 meg cache aught to be enough for sometime.

    2. Re:So the CPU will still be waiting for RAM? by robthebob · · Score: 5, Funny

      Not to be pedantic, but while I might not want to dedicate a large chunk of ram to a specific processor in such a manor, I might want to live in that manor, and maybe have my serfs carry out the computations for me.

    3. Re:So the CPU will still be waiting for RAM? by stinerman · · Score: 2, Informative

      In a design class I took, our professor talked about something called "processor-in-RAM". The idea is that you'd have a few processors, each with its own dedicated RAM. The program you are running would be copied into each processor's RAM. When a branch was ready to be taken, half the processors would go one way and the other half the other. The processors that guessed right would let the other processors know they were wrong and update them with the new information. This way there is no misprediction penalty, as some processor has always followed the correct path.

      I'm guessing that a whopping 64-128 meg cache aught to be enough for sometime.

      Yeah, it'd provide some huge performance gain, but the sheer cost of that much cache would easily be on the order of tens if not hundreds of thousands of dollars. Cache (SRAM) needs several transistors for each bit stored, while DRAM needs only one transistor and one capacitor per bit.
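
      To put rough numbers on the cell-count difference, here is a minimal sketch. It assumes a classic six-transistor SRAM cell and a one-transistor, one-capacitor DRAM cell, and it ignores tags, decoders, and sense amplifiers. The helper name `cell_transistors` is made up for illustration, and the sketch says nothing about actual price.

```python
# Assumption: classic 6T SRAM cell; real caches also need tags and control logic.
SRAM_TRANSISTORS_PER_BIT = 6
# Assumption: 1T1C DRAM cell (one transistor plus one capacitor per bit).
DRAM_TRANSISTORS_PER_BIT = 1


def cell_transistors(mebibytes: int, transistors_per_bit: int) -> int:
    """Transistors needed for the storage cells alone."""
    bits = mebibytes * 1024 * 1024 * 8
    return bits * transistors_per_bit


for size in (64, 128):
    sram = cell_transistors(size, SRAM_TRANSISTORS_PER_BIT)
    dram = cell_transistors(size, DRAM_TRANSISTORS_PER_BIT)
    print(f"{size} MiB: SRAM ~{sram / 1e9:.1f}G transistors, "
          f"DRAM ~{dram / 1e9:.1f}G transistors")
```

      Under these assumptions the storage cells of a 128 MiB SRAM cache alone would take several billion transistors, which is the scale problem the post is pointing at.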

    4. Re:So the CPU will still be waiting for RAM? by Anonymous Coward · · Score: 3, Funny

      I second that pedantry, and have aught but praise for the parent poster.

    5. Re:So the CPU will still be waiting for RAM? by smallfries · · Score: 3, Informative

      You're mixing up a few pieces of technology here. Processors with their own dedicated memory have been invented many times by different people. Modern loosely coupled clusters fit this bill, but further back there were the transputer systems in which each processor had memory on board. Systems like this are more difficult to program than single image systems (even with a CSP derivative as the language) but they produce higher performance.

      The other thing that you are describing is multiway branch prediction. A processor like the Pentium guesses which way a branch goes and dispatches instructions down that path to the pipeline. When it is wrong there is a hit as the pipeline stalls and all of those cycles are lost. In multiway branching both outcomes of the branch are dispatched to the pipeline. The cost is that half the instructions being executed will be thrown away. If you go 2 branches deep then it is 75%. The advantage is that latency is minimised as the pipeline is always full.
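
      The 50% and 75% figures follow from simple counting. A minimal sketch, assuming every branch has exactly two outcomes and every path gets an equal share of the pipeline (the function name `wasted_fraction` is hypothetical):

```python
from fractions import Fraction


def wasted_fraction(depth: int) -> Fraction:
    """Share of speculative work discarded when both sides of `depth`
    nested two-way branches are executed and only one path is kept."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    paths = 2 ** depth          # every unresolved branch doubles the paths
    return 1 - Fraction(1, paths)


for depth in range(1, 4):
    print(depth, float(wasted_fraction(depth)))
```

      Each extra level of unresolved branches halves the useful share again, which is why the waste climbs toward 100% so quickly.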

      The last thing is processor-in-RAM, or smart memory. In this system a miniature processor is embedded on the DRAM die. The small processor is capable of computing striding patterns in arrays. As the program executes on the main processor, the smaller processor predicts which memory locations are going to be accessed and presends the data to the host processor, reducing latency.
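
      As a toy illustration of stride detection (not how any particular smart-memory part works; the class and method names are invented for this sketch), a predictor can wait until it has seen the same address delta twice in a row and then guess the next address:

```python
from typing import Optional


class StridePredictor:
    """Toy stride detector: predicts addr + stride once the same
    non-zero stride has been observed on two consecutive accesses."""

    def __init__(self) -> None:
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None

    def observe(self, addr: int) -> Optional[int]:
        """Record one access; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


predictor = StridePredictor()
# Walking an array in 64-byte steps (addresses chosen for illustration).
predictions = [predictor.observe(a) for a in (0x1000, 0x1040, 0x1080, 0x10C0)]
print(predictions)
```

      The first two accesses yield no prediction because the stride has not yet repeated; from the third access on, the predictor names the next address one step ahead, which is the data a smart memory would presend.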

      Good luck on your class. Architecture is one of the more interesting courses in a CS degree.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    6. Re:So the CPU will still be waiting for RAM? by 10Ghz · · Score: 2, Interesting

      Instead of that, how about having some REALLY fast RAM right next to the CPU? Take a look at modern vid-card. Hi-end models have 256-512MB of uber-fast DDR3-RAM on 256bit bus. And the GPU's are usually bigger than CPU's are. And still, they can sell the entire package (GPU, card and RAM) for about $500. What if we did something similar with CPU's? Instead of selling CPU's as chips, sell them as modules (like SGI and Sun do). Attached to that module would be the CPU, and attached to the CPU would be 256-512MB of ~1GHz RAM on a 256 bit bus.

      And before you say that that is too little RAM.... Other CPU's in the system would have such RAM-setup as well. There could also be traditional memory-banks attached to the Northbridge as well. So each CPU in the system would have 256-512MB of VERY fast RAM attached directly to the CPU. In addition to that, they could also access the RAM on other CPU's (like AMD64-machines do today). AND in addition to that, there would also be traditional memory-banks attached to the northbridge, for memory-expansion. The Northbridge-RAM would be shared with all the CPU's in the system (naturally).

      Of course, such a system would cost a bit more than current systems do. But it would have a metric assload of bandwidth. Would such system make any sense at all? Considering that vid-card makers can sell such RAM attached to relatively large GPU for around 500 bucks, why couldn't CPU-makers sell a smaller CPU with similar RAM for about same price?

      --
      Lesbian Nazi Hookers Abducted by UFOs and Forced Into Weight Loss Programs - -all next week on Town Talk.
  4. External FPGA units? by SaDan · · Score: 3, Interesting

    Hrm... Need a temporary boost in your folding at home project? Plug in an FPGA module!

    This can only be a good thing.

  5. A port? by Anonymous Coward · · Score: 2, Funny

    "you might plug your next generation Opteron into the equivalent of a USB port at the back of your computer"

    Is this a serial connection?
    Or will you need a foot wide port with 700 or so contacts on it?

    I know serial connections are very fast nowadays, but I don't know if you can get the entire memory bandwidth of a cpu without spreading the bandwidth in parallel connections.

    1. Re:A port? by Loconut1389 · · Score: 2, Interesting

      Check out the SGI/CrayLink setup used for ccNUMA - the port is around 2.5 inches, but has quite a lot of pins (maybe 100?). I don't think foot-wide is really necessary.

      IMHO, fiber optics, though delicate, could offer higher bandwidth. I'd rather have my whole fiber go dark from a break and know it than have one strand of many go out and not know it and have all kinds of wacky/intermittent behavior.

      I still struggle to understand why fiber optics are so expensive- the lasers used are fairly cheap and the cables really aren't that complex either and are made in enough quantity.. but I guess since it's not mainstream, it's expensive.

    2. Re:A port? by Anonymous Coward · · Score: 5, Informative

      The reason fiber optic (particularly glass core) is so expensive is due to the difficult and sensitive process required to manufacture that cable, though the materials used are extremely inexpensive. The diameter of the glass core must be matched exactly to the wavelength of light to travel over that fiber. In addition the composition and purity of the glass must meet certain standards to prevent reflection, signal attenuation, or signal skew, all of which would result in inconsistent or degraded performance. As far as the lasers being cheap, yes a laser can be cheap, but again the same demanding requirements apply to both versions of laser used in data communications, which again increases the manufacturing cost.

  6. Nice... by Frumious+Wombat · · Score: 2

    So, you take the external interconnects, a large SMP box, and a transfer rate unachievable by anything except channel-bonded Myri/Infiniband/Quadrics, and you've suddenly commoditized (is that a word?) the Origin 2K architecture. Unfortunately, there will be that inevitable gap between "announced" and "benchmarkable", but this should lead to interesting system design.

    Computing might just become fun again. Small systems passing information around to form a display wall, or big systems chained together to become huge systems.

    --
    the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    1. Re:Nice... by questionlp · · Score: 3, Interesting

      HT 3.0 will be a very good step toward bringing the Opteron closer to the Origin architecture, but the Opteron still lacks good implementations of the cache coherency and other caching features of NUMAlink used in the Origin servers/clusters. The Horus chipset helps in some ways, but doesn't help scaling beyond 8P in a glueless fashion.

      Just my $0.01

    2. Re:Nice... by civilizedINTENSITY · · Score: 2, Insightful

      Are you suggesting AMD buy SGI?

    3. Re:Nice... by ArbitraryConstant · · Score: 4, Funny

      "Are you suggesting AMD buy SGI?"

      Hell, I've got some change left over from lunch, I'm thinking of buying SGI.

      --
      I rarely criticize things I don't care about.
  7. Hmmmm. by ultramk · · Score: 4, Insightful

    I can see an interesting situation where you could have a traditional CPU, to which you could plug in additional external processor modules as your needs expand. (assuming the OS could handle sharing out multithreaded apps over a variety of different multi-CPU configurations.)

    Dave has a processor intensive project this week? He gets the big stack plugged into his machine until someone else in the office needs it.

    Server getting bogged down? Add another couple modules to the system.

    I like the idea.

    m-

    --
    You catch enchiladas by picking them up behind the head and holding them underwater until they don't kick anymore -VeGas
    1. Re:Hmmmm. by DaHat · · Score: 2, Interesting

      I was thinking something similar... there is one issue that no one here has thrown out yet. Heat.

      Let's say your company has a 4-way hub that can be plugged into the system of choice... imagine the cooling such a thing would require in order to keep from burning up in its enclosed plastic or (more likely) metal box.

      Not to mention the noise... oh good god the noise. My dual core 3800+ at home is quite loud... I can only imagine what a few of those bad boys sitting on your desk would sound like under full load.

      I suppose a good deal of issues could be eliminated if low power CPUs were to be used in such a manner... then you won't have as many issues drawing power from the host PC (i.e. not necessarily having to have an external power supply).

    2. Re:Hmmmm. by masklinn · · Score: 3, Insightful

      My dual core 3800+ at home is quite loud...

      No it isn't you dummy, your cooling system is, now just get a knowledgeable friend to slap a Thermalright HR-01 and a Nexus 120mm fan (undervolted to 9V) on it and it'll be whisper-quiet.

      --
      "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
  8. Re:f*** by eln · · Score: 5, Funny

    I really can't see it being that kind of socket!

    Oh I dunno, take it out to dinner, buy it a few drinks, you never know what could happen.

  9. Increased Bandwidth by Metabolife · · Score: 5, Informative

    HT 3.0 increases the bandwidth to 41.6 GB/s, which is 86% more than 2.0. It's also expected to be backwards compatible with current motherboards using 2.0: the new processor will be capable of 3.0 speeds, but on such a motherboard the link will be stuck at 2.0 speeds. The new Rev. F AMD CPUs are expected to have HT 3.0. It should help with multi-processor systems, where the high-bandwidth link connects each CPU.
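
    For anyone who wants to check the 86% figure, here is a back-of-the-envelope sketch. The inputs are assumptions not stated in the post: a 32-bit link, double data rate, both directions counted together, and top clocks of 2.6 GHz for HT 3.0 and 1.4 GHz for HT 2.0. The function name is made up for illustration.

```python
def aggregate_gb_per_s(clock_ghz: float, link_width_bits: int = 32) -> float:
    """Peak aggregate bandwidth of one link, both directions combined.

    Assumes double data rate (two transfers per clock) and counts the
    send and receive directions together.
    """
    transfers_per_ns = clock_ghz * 2                 # DDR signalling
    per_direction = transfers_per_ns * link_width_bits / 8
    return per_direction * 2                         # both directions


ht3 = aggregate_gb_per_s(2.6)   # assumed HT 3.0 top clock
ht2 = aggregate_gb_per_s(1.4)   # assumed HT 2.0 top clock
increase_pct = (ht3 / ht2 - 1) * 100

print(f"HT 3.0: {ht3:.1f} GB/s, HT 2.0: {ht2:.1f} GB/s, "
      f"increase: {increase_pct:.0f}%")
```

    Under those assumptions the two figures quoted above are consistent with each other; narrower links scale the absolute numbers down but leave the ratio unchanged.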

  10. NOT anything like USB at all. by Visaris · · Score: 5, Insightful

    Whoever submitted the article doesn't understand what the external HT links are for. They are _NOT_ a replacement for USB or any other similar technology. External HT is used to link multiple chassis together to form a large SMP box. This is similar to infiniband, etc. This is NOT designed to be a way to just plug in a CPU to an external port. Read the pdf:

    http://www.hypertransport.org/docs/tech/ht30pres.pdf

    --

    I am a viral sig. Please help me spread.
    1. Re:NOT anything like USB at all. by I+Like+Pudding · · Score: 5, Funny

      External HT is used to link multiple chassis together to form a large SMP box. This is similar to infiniband, etc.

      Oh, so it's like USB

    2. Re:NOT anything like USB at all. by cyngus · · Score: 3, Insightful

      The similarity they were referring to is the plug-and-play nature of USB. The external link capability combined with 3.0's hot swapping would allow you this same kind of flexibility. You completely missed the point of the analogy.

  11. In the meantime... by jd · · Score: 4, Interesting
    Broadcom's BCM1250 MIPS processor implements a totally non-standard HyperTransport that blends several of the early 1.x specifications in a way that is unpredictable and a pain. Yes, folks, there are manufacturers out there who don't debug or maintain their product lines, who won't stick to published specs, and who can't be relied upon to publish their own specs. Sometimes, those of us who post on Slashdot slam Intel for decisions that are nothing short of insane, but there are actually far far worse offenders out there.


    Most of the HyperTransport updates look to be good (and, frankly, about time) but I am highly concerned that if certain manufacturers (such as Broadcom) haven't even bothered to do better than a fragmentary 1.x and have ignored 2.x entirely, there is little hope that they'll do much with 3.x.


    And that's the big problem. If AMD are the only ones who ever implement the specification in full, correctly, then it doesn't offer any significant advantage. It isn't universal enough to be useful. That is the killer that has murdered so many excellent technologies. Being good - even being the best - isn't enough. If a rival is more widely adopted, then it'll be the rival that wins. The marketplace doesn't reward quality, it rewards popularity. Quality achieves nothing.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  12. Thanks a lot by hurfy · · Score: 2, Funny

    Now half my brain will be trying to design a 939 connector USB cable in the background....

    hehe external CPU, someone got a better batch of something than i did.....

  13. Broadcom isn't the whole industry: by Visaris · · Score: 2, Interesting
    --

    I am a viral sig. Please help me spread.
  14. Re:x86 processors by Andrzej+Sawicki · · Score: 2, Insightful

    OMG Ponies?

    But seriously, you got it wrong. It's puke green, of course.

  15. Re:f*** by heinousjay · · Score: 2, Interesting

    The fact that you were modded flamebait makes me wonder which fool computerfucker got points today.

    --
    Slashdot - where whining about luck is the new way to make the world you want.
  16. So finally by iminplaya · · Score: 2, Funny

    We'll be able to go from New York to Tokyo in less than three hours?

    --
    What?
  17. Hypertransport is the wave of the future. by Inoshiro · · Score: 4, Informative

    Why are MacBook Pros so much faster than Powerbooks?

    The MacBook Pro sports a 666Mhz DDR FSB, while the Powerbook sports a 133Mhz FSB. It doesn't matter how fast your processor is if you don't have a fast enough way to power it (much like a V-12 will not do well with a single-barrel carb used on a lawnmower engine).

    The Von Neumann bottleneck is the significant limiting factor in all machines, once your working set of data exceeds that of your L1/L2 cache. Suddenly your 1.5 Ghz G4 is 266 Mhz :/

    Faster hypertransport means happier users of AMD machines. My AMD64 beats the pants off my Sempron 2500 because its 800Mhz HT bus allows it to do context switches in less than 1/3rd the time of the Sempron!

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  18. Said before and said again... by zaguar · · Score: 2, Insightful
    I've said it before and I'll say it again - Open standards lead to better products. Case in point - Hypertransport. That story about the possibilities of fluid simulations/path finding in the oil industry opened up by co-processors slotting into HTT links is just a case in point.

    Hey Intel, how's the FSB? And, for that matter, how's that DRM-soaked Viiv product going?

    --
    "Sure there's porn and piracy on the Web but there's probably a downside too."
  19. Re:x86 processors by fitten · · Score: 4, Interesting

    Yup... It has always been thus. The difference is that the high-end processors do exotic things and then Intel/AMD suck it in when it is ready for commoditization. The x86 has *always* been behind in those types of technologies (but usually pretty far ahead in tricks to make the x86 ISA fast) because those technologies are high-end. Eventually, it all trickles down to commoditization and then we get it in x86s.

  20. Legos by Slayback · · Score: 2, Funny

    Just make all the components (memory, CPU, disks, interfaces) like Legos, and you'll be set. Need more RAM? Just add another block. Suzy needs some extra CPU for a big project, let her borrow your block for the day.

    The bonus feature would be collecting enough hardware to make the Millenium Falcon out of your PC.

  21. Re:x86 processors by Wdomburg · · Score: 4, Informative

    So fifteen years ago everyone else had 20GB/sec buses? Funny, Sun seems to think they were using MBus, which peaked at around 350-400MB/sec. And HP was dropping CPUs on a GSC bus running at ~ 250MB/sec. I'd look up what state of the art was for SGI and IBM, but it would be silly. AMD and Intel surpassed other chip vendors on a number of fronts years ago.

  22. Re:x86 processors by jacksonj04 · · Score: 3, Insightful

    Okay, let me explain the difference between hardware and software. Processors and HyperTransport, and thus the subject of this discussion, are hardware related. Windows and Unix are software. Blabbering on about how Windows is the scourge of the world and we should all use vi/emacs/insert_editor_here when the parent was clearly talking about hardware with no association other than your own (extremely weak, see other replies) point seems a bit... oh I don't know. OS Zealous?

    --
    How many people can read hex if only you and dead people can read hex?
  23. or... by YesIAmAScript · · Score: 2, Informative

    Perhaps it's because your Sempron 2500 is a socket 754 chip, so cannot use dual-channel memory. The AMD64 has a faster FSB, and it's dual-channel.

    Many people (including yourself it seems) misunderstand HT. It isn't the FSB, an Athlon 64 has no FSB. HT is only used to communicate non-memory I/O and to synchronize caches between processors when doing memory I/O. So it's rather unlikely that HT could make your context switches 3X faster. Best thing for that would be a bigger cache, which your AMD64 probably has also.

    --
    http://lkml.org/lkml/2005/8/20/95
  24. Re:not USB by EndlessNameless · · Score: 2, Interesting

    That's a nice idea and all, but it doesn't make a lot of sense architecturally, at least for general-purpose computing. HT is designed as a peripheral bus. Making a CPU be a peripheral to the main system... well, you could offload work onto it, I suppose, and it would have DMA access, but it would still be the ultimate third wheel---far enough out that memory accesses would be relatively slow, and it couldn't realistically share peripheral access, so all UI interaction and device access would pretty much have to be handled on the main CPU/GPU, so you end up bottlenecked by the main CPU for a lot of stuff anyway.

    Um... I hate to break this to you, but AMD-64 CPUs use Hypertransport links as their interconnect already. Which means the way you described it is exactly how it works. The 100-series Opterons have 1 HT link that goes to the system's peripheral devices and buses. The 200-series Opterons have 2 HT links: one connects it to the other CPU and the other connects to peripheral devices. I think you can guess how many links the 400- and 800-series Opterons have.

    The place where this would be really interesting, though, would be the whole "one bus to rule them all" space. You could use this to cheaply add external PCI slots without the relatively expensive hardware needed to send PCI more than a couple of inches (though this can also be solved using PCIe as the interconnect). You could use this to eventually supersede low performance busses like USB.

    This is how HT is used internally already. It connects the CPU to the other buses and system devices (the other end of the link is usually terminated by the southbridge ASIC). As far as clustering goes, a 1-meter link makes it somewhat doable, but remember that there are high-bandwidth external interconnects like Infiniband already in use. I didn't see anything in the article that suggested HT is capable of blowing the established technologies out of the water.

    --

    ---
    According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.
  25. Re:I understand quite well. by ArbitraryConstant · · Score: 3, Informative

    "The bus connection between my CPU and the RAM is, indeed, the Hypertransport. Northbridge, CPU, and RAM are all connected by it."

    This is wrong. Athlon64s have an on-die memory controller. They communicate with memory directly through the dual-DDR memory bus, no intermediaries. This is what gives Athlons their famously low memory latency.

    In Athlon64s, the northbridge as we know it does not exist because the memory is connected directly to the CPU itself. The CPU is connected to the chipset by way of a hypertransport bus, and memory I/O for other devices goes over this bus to the CPU's memory controller.

    --
    I rarely criticize things I don't care about.
  26. Just avoid Broadcom by btarval · · Score: 2, Interesting
    Far be it from me to defend Broadcom (as no one in their right mind should choose the BCM1250), but the 1250 is an old, nearly unmaintained CPU. It was done about 6 years ago, when the HT spec was hardly off the ground. So, yes, it implements a non-standard version of HT; but the HT spec was still evolving.

    Instead of harping on the implementation (which was done in a slapdash, amateurish fashion by SiByte in order to make a quick buck - and screw the customer), you should blast Broadcom for basically dropping support for this CPU. Broadcom has done almost nothing whatsoever to improve the CPU. In fact, they go far out of their way to avoid the needed improvements. Witness the completely bogus (and nearly useless) JTAG support for the 1250.

    They used to have GDB support for it for free. That's all gone; and in fact no longer works with the new Rev C 1250's. Instead, you have nearly useless third-party support from Corelis and Greenhills.

    Forget source code debugging if you have a ClearCase SCM, unless you want to go through a bit of pain and hackery.

    And, hells bells, let's not talk about the memory controller, which is the worst one I've ever seen. If there were ever anything which needed improvement, it is that.

    In short, if you chose the BCM1250, you were an idiot and deserve what you got. No sane embedded person would do so. A clueless architect might, but not a real embedded engineer.

    I once had to inherit this mess; and I'm delighted to be done with it.

    So just avoid Broadcom altogether. They have an established track record of leaving you high and dry should you make the mistake of depending on them. And they just don't give a damn about their customers.

    --
    The best way to predict the future is to create it. - Peter Drucker.
  27. Re:x86 processors by Lucractius · · Score: 2, Informative

    guess what

    Cray use Hyper Transport now

    --
    XML - A clever joke would be here if /. didn't mangle tag brackets.