Check out Peakstream (http://www.peakstreaminc.com/). They're a Silicon Valley startup doing a lot of tool development for multicore chips, GPUs and Cell.
The specs on this board are pretty crazy: 128 single-precision FP units, each capable of doing an FP multiply-add or a multiply, operating at 1.35 GHz and no longer closely coupled to the traditional graphics pipeline. The memory hierarchy also looks interesting... this design is going to be seeing a lot of comparisons to the Cell processor. Memory is attached via a 384-bit bus (320 on the GTS) and operates at 900 MHz.
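For a rough sense of what those figures add up to, here is a back-of-the-envelope sketch. It assumes a multiply-add counts as two FLOPs per cycle and that the memory is double data rate (two transfers per clock); both are my assumptions rather than anything stated above, and the function names are made up for illustration.

```python
# Back-of-the-envelope peak numbers for the GTX figures quoted above.
# Assumptions (not from the post): a multiply-add counts as 2 FLOPs,
# and the memory is double data rate (2 transfers per clock).

def peak_gflops(fp_units: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak single-precision GFLOPS."""
    return fp_units * clock_ghz * flops_per_cycle


def peak_bandwidth_gbs(bus_bits: int, mem_clock_mhz: float,
                       transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


if __name__ == "__main__":
    print(f"peak compute:   {peak_gflops(128, 1.35):.1f} GFLOPS")
    print(f"peak bandwidth: {peak_bandwidth_gbs(384, 900):.1f} GB/s")
```

Under those assumptions the GTX works out to about 345.6 GFLOPS and 86.4 GB/s. These are theoretical ceilings; real kernels will land well below them.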
The addition of a C compiler and drivers specific to GPGPU applications, available for Linux (!) as well as XP/Vista, means that this is going to see widespread adoption amongst the HPC crowd. There probably won't be any papers on it published at SC06 in Florida next week, but over the next year there will probably be a veritable torrent of publications (there already is a LOT being done with GPUs). The new architecture really promotes GPGPU apps, and the potential performance/$ looks very good, especially factoring in the development time, which should be significantly less with this toolchain. A couple of 8800GTXes in SLI and I could be giving traditional clusters a run for their money when it comes to apps like FFTs. I can't wait till someone benchmarks FFT performance using CUDA. If anyone finds such numbers, post and let me know!
Hey,
The color of my skin is light brown. Just a couple of weeks ago, in a line of around forty others (none of whom were brown or black), I was the only person who was asked to step aside and informed that I had been 'randomly' selected for extra screening. I was then patted down and felt up, my stuff was gone through, and I had to answer a bunch of questions.
The use of the word 'random' really makes me sick.
The "down and dirty" comment was a comparison vs. PCs. The DSP thing was meant as an alternative application of Cell. And Itanium IS amazing. Naffziger's stuff is all over new VLSI texts. The work done on power consumption alone is pretty amazing in Montecito (100W consumption in a billion-plus transistor IC). Calling Itanium a failure is more a marketing/economic/portability thing than anything wrong with the underlying technology.
It's a beautiful processor... in spite of being in-order. I think it will work in games once developers figure out how to utilize the architecture properly. Again, how successful this platform is depends almost entirely on the compiler and the quality of the tools provided. The thing about having Cell in a game console is that it gives developers a chance to really get down and dirty with a platform, something a PC developer never really gets to do. We won't see the 'movie-quality' games anytime soon... realistically it will be a year or two minimum, but the power is there to do some impressive stuff.
Really tho'... I think it will be best in high-performance application-specific stuff... maybe replacing DSPs/FPGAs?
You're absolutely right. The migration will be far slower than most people expect, and the resolution of the format war will probably take a looong time if left up to consumption numbers. However, if displays evolve (think cheap 100" OLED screens, or holography, which is probably still very far away) to the point where there is a large difference in quality, then there will be wider adoption.
I'm currently working with floating point accumulation and I've come to realize that rounding is unbelievably important when it comes to floating point. For long accumulations or a series of operations you need round-to-nearest functionality, but even this can be insufficient depending on the nature of the numbers you're adding. Truncation, although the easiest to implement in hardware, lets the error add up so fast that it'll stun you.
It's good to see a fairly comprehensive summary of techniques out there that doesn't require wading through the literature.
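To make the rounding point concrete, here is a minimal sketch that accumulates the same value many times in simulated single precision, once with round-to-nearest and once with truncation. The helper names are made up for illustration, float32 is emulated through the `struct` module, and the truncation helper assumes each intermediate sum is exact in double precision (true for the value range used here). Kahan (compensated) summation is included as one well-known technique; I'm not claiming it is one of those covered in the summary.

```python
import struct


def to_f32_nearest(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_truncate(x: float) -> float:
    """Round toward zero to binary32 (i.e. chop the extra mantissa bits)."""
    y = to_f32_nearest(x)
    if abs(y) > abs(x):
        # Nearest rounding overshot in magnitude: step back one ulp toward zero.
        bits = struct.unpack("<I", struct.pack("<f", y))[0]
        y = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return y


def accumulate(values, rnd):
    """Naive running sum, rounding to binary32 after every addition."""
    total = 0.0
    for v in values:
        total = rnd(total + v)
    return total


def kahan_accumulate(values, rnd):
    """Compensated (Kahan) summation with every operation rounded by rnd."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = rnd(v - comp)
        t = rnd(total + y)
        comp = rnd(rnd(t - total) - y)
        total = t
    return total


def run_demo(n: int = 200_000):
    """Return signed errors of each method when summing n copies of ~0.1."""
    term = to_f32_nearest(0.1)
    values = [term] * n
    exact = term * n  # reference computed in double precision
    return {
        "nearest": accumulate(values, to_f32_nearest) - exact,
        "truncate": accumulate(values, to_f32_truncate) - exact,
        "kahan": kahan_accumulate(values, to_f32_nearest) - exact,
    }


if __name__ == "__main__":
    for name, err in run_demo().items():
        print(f"{name:>8}: error = {err:+.6f}")
```

On this input the truncated sum should drift noticeably further from the reference than the round-to-nearest sum, and even round-to-nearest ends up far off, while the compensated sum stays within roughly an ulp of the result.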
TRIPS definitely doesn't look like it was targeted at the desktop; it looks more like it's for DSP-like apps requiring high throughput and a constant data input stream. They mention this in the article (software-defined radio, co-processors for actual general-purpose processors). So an architecture like this may be competitive w/ something like Cell or Imagine or maybe TI's high-end DSPs, but not with a single-core processor targeted at business apps and the like on x86 platforms. And part of the reason why TRIPS is a good design is that the compiler guys and the hardware guys are part of the same group and probably sat down and hammered out an ISA that allows for maximum extraction of parallelism by the compiler.
Btw the main reason why we're still using x86 is economics. It would take more than just a better design to get companies to suddenly move on and abandon their legacy stuff; it would take something revolutionary with insanely good marketing. The drift to RISC-type ISAs is happening... just very very slowly (I believe both AMD and Intel convert x86 CISC-type instructions into RISC-like uOPs which are then executed, no?)
I don't believe that the margins are worthwhile at all on the lower-end chipsets. SiS/AMD/VIA provide really stiff competition in that arena... It's a sensible move on Intel's part.
"Visit the engineering building at any university in this country and you wont even find anyone who speaks english -- it's all exactly Chinese and Indian people receiving stipends in addition to free tuition courtesy of US govt grants and then head back to their own countries, contributing nothing to tech in the very country that paid for their education."
You're right in that it is mostly people of other nationalities, however, remember that
a) they are required to pay far more in tuition fees than US nationals
b) the buying power of their currency means that they are (relatively) paying even more. This is why stipends are necessary in graduate work if you want to attract the best and brightest.
c) the research and publications they produce contribute to the US tech industry.
d) Many of them CANNOT stay because of stringent immigration requirements
The rapid development of China and India is also partly due to the initial lack of infrastructure. This allows a 'leapfrogging' effect in which they bypass one generation of infrastructure and move to the newest. (Look at the adoption patterns of 3G) This means that they will be slower to adopt the next generation of technology whereas the US and similarly developed countries may be better positioned.
I've had only one CD drive crap out on me, and that one I purchased in '95 (4X, cost me like 200+). Generally, when I find things don't play, it's the media rather than the drive. My 8x burner is still running (got it around '00), as is my DVD-ROM (2001).
Another problem (not from the security side) is the huge amount of processing power required per IPSec tunnel. IPSec is a pretty heavy protocol, depending on the encryption used. If you intend to set up a WLAN anywhere near the size and capacity of a normal wired LAN, or even just on the basis of 1 Mbps of bandwidth available per person, then in an office of 50 people you need a server capable of handling 50 Mbps of traffic. The NetScreen-50 offers a max of 50 Mbps of 3DES-encrypted traffic (you'll never reach that capacity on the box in RL) but costs between six and seven thousand. I doubt the average Linux box could handle much without being very buffed up. It makes MUCH more sense to go with a product like Aegis from Meetinghouse that supports 802.1x-based TLS, TTLS and LEAP. Also, their server product runs on Linux.
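The sizing argument above is simple arithmetic, sketched here for anyone who wants to plug in their own numbers. The function names and the `achievable_fraction` derating factor are hypothetical; the 0.7 used in the example is purely illustrative and not a measured figure for any box. The per-user and rated figures just need to be in the same units.

```python
# Rough VPN gateway sizing check. All names here are illustrative, and the
# derating factor is an assumption standing in for "you never reach the
# rated capacity in real life".

def required_throughput_mbps(users: int, per_user_mbps: float) -> float:
    """Aggregate encrypted throughput needed if every user gets per_user_mbps."""
    return users * per_user_mbps


def has_headroom(required_mbps: float, rated_mbps: float,
                 achievable_fraction: float = 1.0) -> bool:
    """True if the derated device capacity covers the required throughput."""
    return required_mbps <= rated_mbps * achievable_fraction


if __name__ == "__main__":
    need = required_throughput_mbps(users=50, per_user_mbps=1.0)
    print("needed:", need)
    print("fits at rated capacity:", has_headroom(need, rated_mbps=50.0))
    print("fits at 70% of rated:  ", has_headroom(need, 50.0, achievable_fraction=0.7))
```

At the rated figure the 50-person office only just fits; with any realistic derating it does not, which is the point being made about the cost of doing this with IPSec.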
So you're going to turn down a student who has six terms' worth of practical work experience in a variety of companies and a proven ability to work in the real world, and instead you're going to hire a student fresh out of college whose only work experience is at a McDonald's in high school? Get real. In spite of all the accusations that Waterloo pimps its students, they come out with far better looking resumes than any other college grads. And ECE150 isn't the only course that uses C/C++.
3G is a bit misleading I think. CDMA 1x is really more of a 2.5G technology. Sprint is following the CDMA 2000 evolution path, from 1xRTT to EV-DO and then EV-DV. Eventually CDMA will use OFDM (like 802.11a WLANs) over three channels to achieve 2+ Mbps downstream, but that will only begin to happen in 2004/2005. I think the GSM->GPRS->EDGE->UMTS evolution path will probably be used by more telecoms worldwide.
This news is good for the telecom industry. With several countries scaling back their spending on 3G, the day when I'll be playing multiplayer Doom3 on my cellphone seems even further away :(
I'm afraid I explained that badly. I meant a switchbox like this:
http://www.mycableshop.com/sku/DSS%2045X.htm
They're waayyy too ghetto to mess with the ethernet data (around $40 CAD). Using one of these is the equivalent of unplugging the ethernet cable in the back and plugging in a new cable.
Take n of the KVM-over-Cat5 extenders and plug them into an ethernet switchbox (I don't mean the Cisco type of switch; I mean the kind with a switch on the front for selecting from n inputs. They make them for parallel ports too, so you can easily switch between printers). Plug however many devices you have into the switchbox. On the single output cable, attach the other end of the KVM Cat5 extender and then run the cable to the terminal.
It would be awesome if this worked, but it's possible that the switching device would mangle things up or reduce quality etc. Not quite KVM over IP, but it's really really cheap if you can get it to work.
Luck
I'm selling my deranged rantings for 1 million dollars on ebay! *does the Dr. Evil pinky finger thing*