The opposite is true. A die shrink lets you either get more performance by packing more transistors into the same space, or take advantage of smaller transistors, which switch with less capacitance at lower voltages, to get better power efficiency out of a smaller space.
Think about what you're saying: if die shrinks didn't improve energy usage, then, given that a Prescott P4 with 125 million transistors had a TDP of 130W, a Sandy Bridge i7 with 995 million transistors would have a TDP of over a kilowatt.
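The kilowatt figure is simple proportional scaling. Here is a quick sketch of that arithmetic. For the sake of the argument, it assumes that power draw is proportional to transistor count, i.e. that per-transistor power never improved after Prescott. The transistor counts and the 130W TDP are the figures quoted above.

```python
# Naive model (assumption): power scales linearly with transistor count,
# i.e. every transistor still draws what a Prescott-era transistor did.
PRESCOTT_TRANSISTORS = 125e6      # Prescott Pentium 4 (figure from the text)
PRESCOTT_TDP_W = 130.0            # TDP as quoted in the text
SANDY_BRIDGE_TRANSISTORS = 995e6  # Sandy Bridge i7 (figure from the text)

def naive_tdp(transistors: float) -> float:
    """TDP if per-transistor power had stayed at Prescott levels."""
    watts_per_transistor = PRESCOTT_TDP_W / PRESCOTT_TRANSISTORS
    return transistors * watts_per_transistor

print(f"{naive_tdp(SANDY_BRIDGE_TRANSISTORS):.0f} W")
```

That works out to roughly 1035W. Sandy Bridge i7s actually shipped at around a tenth of that, and the gap is what the die shrinks bought.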
But this (the drive for power efficiency over performance) largely reflects market forces, not technology. Smartphones, for example, have a fixed power budget, unlike desktops, and people want ever more powerful smartphones within that budget. Before smartphones and tablets became so popular, there wasn't nearly as much pressure on the SoC market to deliver this.
There are theoretical limitations to how small things can get, and how much work can be done per unit of space, but we're nowhere near that yet.
The author claims that semiconductor density improvements have been slowing over the past few years, but that's not true at all. One need only look at the past schedule of Intel's die shrinks, or at their transistor counts, to realize that we're still going ahead at full steam. The pace of shrinks has tracked Moore's law pretty consistently for at least the past decade, and Intel's roadmaps seem to show that continuing for at least another two die shrinks (each of which should roughly double density).
It's kind of amazing when you think about it. Comparing the best of 2002 to the best of 2012, you go from 90nm to 22nm. In just one decade, that is a 16.7x increase in density, and that doesn't even take architectural improvements into account.
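The 16.7x figure comes from squaring the ratio of feature sizes, because transistors shrink in two dimensions. The sketch below treats node names as literal feature sizes, which is an idealization, since real node names don't map exactly onto physical dimensions. It also shows why a classic ~0.7x linear shrink is said to double density.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized density gain from a shrink: area per transistor scales with
    the square of the feature size, so density scales with (old/new)**2."""
    return (old_nm / new_nm) ** 2

print(f"90nm -> 22nm: {ideal_density_gain(90, 22):.1f}x")
# A classic full-node shrink scales linear dimensions by ~0.7,
# which roughly doubles density: (1/0.7)**2 is about 2.04.
print(f"one 0.7x shrink: {ideal_density_gain(1.0, 0.7):.2f}x")
```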
Not really; there haven't been any real advancements in long-distance DSL since ADSL2's Annex L (RE-ADSL2), which cranked up the power a bit to extend the range.
ADSL, ADSL2, and VDSL2 all behave about the same beyond a certain distance. VDSL2 can do 250 Mbit/s at the source, but beyond 1600m it performs the same as ADSL2+, and eventually the same as ADSL. All these newer DSL standards really do is use more spectrum, which enables faster speeds only over loops short enough to carry those higher frequencies.
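The convergence can be illustrated with a deliberately crude model. This sketch assumes that line loss grows with the square root of frequency and linearly with loop length. The constants `K_DB` and `BUDGET_DB` are invented for illustration, and the model ignores noise, crosstalk, upstream bands, and bit loading. Only the top-of-band frequencies (1.104 MHz for ADSL, 2.208 MHz for ADSL2+, 17.664 MHz for VDSL2 profile 17a) are real figures. Once the loop is long enough, the extra spectrum a newer standard is allowed to use sits above anything the line can still carry.

```python
# Toy model (assumption): twisted-pair loss in dB grows roughly with
# sqrt(frequency) and linearly with loop length. K_DB and BUDGET_DB are
# made-up illustrative constants, not measured values.
K_DB = 25.0        # hypothetical dB per km per sqrt(MHz)
BUDGET_DB = 90.0   # hypothetical maximum tolerable attenuation

# Top of the spectrum each standard may use, in MHz.
TOP_MHZ = {"ADSL": 1.104, "ADSL2+": 2.208, "VDSL2 (17a)": 17.664}

def max_usable_mhz(loop_km: float) -> float:
    """Highest frequency whose loss stays within the budget:
    K * sqrt(f) * L <= BUDGET  =>  f <= (BUDGET / (K * L))**2."""
    return (BUDGET_DB / (K_DB * loop_km)) ** 2

def usable_spectrum(standard: str, loop_km: float) -> float:
    """Spectrum actually usable: the standard's top frequency or the
    attenuation limit, whichever is lower."""
    return min(TOP_MHZ[standard], max_usable_mhz(loop_km))

for km in (0.5, 1.0, 2.0, 4.0):
    row = ", ".join(f"{s}: {usable_spectrum(s, km):.2f} MHz" for s in TOP_MHZ)
    print(f"{km} km -> {row}")
```

In this model, the usable spectrum falls with the square of loop length. At 4 km, all three standards end up on the same sliver of spectrum.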
Vectoring helps cancel crosstalk between multiple DSL lines sharing a bundle of wires, but it probably won't make much of a difference on long loops; simple attenuation is the enemy there. The real answer to pushing DSL farther out is to push the DSLAM closer and run fibre to it.
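To see why vectoring helps with crosstalk but not attenuation, here is a toy two-line model of linear precoding. The channel gains are invented. Real vectoring (ITU-T G.993.5) estimates the crosstalk channel and precodes per tone across many lines, so this only shows the principle. The transmitter pre-distorts its signals so that the crosstalk cancels at the receivers. Each line's own direct-path loss is deliberately left in place, because undoing that would just mean transmitting more power.

```python
# Toy 2-line model of downstream vectoring (an illustrative sketch, not the
# deployed G.993.5 algorithm). H[i][j] is the gain from transmitter j to
# receiver i: the diagonal is each line's own (attenuated) path, the
# off-diagonal terms are far-end crosstalk. All numbers are made up.
H = [[0.10, 0.02],
     [0.03, 0.08]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def precoder(h):
    """P = H^-1 * diag(H): cancels crosstalk but keeps each line's
    direct-path gain, so each line still sees its own attenuation."""
    inv = inverse_2x2(h)
    d = (h[0][0], h[1][1])
    return [[inv[0][0] * d[0], inv[0][1] * d[1]],
            [inv[1][0] * d[0], inv[1][1] * d[1]]]

symbols = [1.0, -1.0]                 # what each line wants to deliver
received_raw = matvec(H, symbols)     # without vectoring: crosstalk leaks in
received_vec = matvec(H, matvec(precoder(H), symbols))
print(received_raw)   # each value mixes in the neighbouring line's symbol
print(received_vec)   # ~[0.10, -0.08]: crosstalk gone, attenuation remains
```

After precoding, each receiver gets exactly its own symbol scaled by its own line's gain. On a long loop that gain is tiny, and vectoring does nothing about it.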
Upload speeds on VDSL2 aren't fixed the way they are on ADSL; different VDSL2 profiles use different upstream/downstream splits. Personally, I've got 25 megs down and 7 megs up, and I've bonded two of those lines to get 50 megs down and 14 megs up.
The distance issue just requires them to push the remote DSLAMs closer to customers. In my case, Bell Canada has installed a VDSL2 remote DSLAM (RDSLAM) in the basement of my building. I'm 2400m from the CO, but only 45m from the nearest DSLAM.
Bell's VDSL2 rollout also includes installing remotes in the basements of MDUs (apartment/condo buildings) to serve them and surrounding buildings. I'm 147 feet from my VDSL2 DSLAM.
It's an easy win: you install a small RDSLAM in the basement, run some fibre to it, and suddenly you can offer VDSL2 internet and HD IPTV to 250 new potential customers. And hey, if the neighbouring buildings are close enough, multiply that a few times over.
Well, how much of that refinement is actually useful for Intel's target use case here? They're going to stream this as compressed video to a tablet, so antialiasing (which seems to be a large part of the refinement in many of the nVidia demos) isn't that useful, since it's all going to get crammed into a video stream anyhow. Comparing Intel's claimed performance hits for various operations with what I saw in the nVidia demos, Intel clearly has a better raytracing engine, but it's not clear to me whether that comes down to better software rather than better hardware. Say the Intel renderer used OpenCL (since Intel is pushing it too), allowing us to run the code on either Knights Ferry or Tesla cards. Then say I put eight high-end Tesla cards in a machine and set them loose on the renderer. Which solution would be better?
The nVidia solution can be purchased today, while the Intel solution is still in the R&D phase. That's my point: it's not obvious how the Intel approach is better, and without comparative benchmarks or pricing, we can't say whether the Intel solution is competitive.
I've used it... It's real time on my old GTX 285, though the fanciest demo, "Design Garage", gets 2-3 FPS. A modern nVidia card should be significantly faster, especially in SLI, and even an SLI setup would still be enormously cheaper than Intel's 8-card solution.
The raytracing application is merely a demo. Real applications in the near term will take advantage of the fact that all of the cores on Intel's accelerator card are x86 compatible. The cores on an Nvidia graphics card are not x86 cores and probably never will be.
With lots of x86 cores, you can do interesting things like implement drivers that make your multi-core accelerator card visible to your OS as if its cores were real CPU cores. Imagine that you have Chrome open with 100 tabs. Chrome runs each tab in a separate process. Your Intel accelerator card with 50-256 x86 cores could be used to run Chrome processes, one process per core. All of a sudden, your main CPU is no longer bogged down running Flash and background JavaScript crap for each of your 100 open tabs.
If there were any real benefit to this, we'd see dual-processor consumer motherboards, but those died off in the Pentium II era. These days, with a modern quad-core processor, your "main CPU" is no longer bogged down with background JavaScript or Flash; that work is already spread across different cores.
Over the long term, Moore's law suggests that these Intel x86 accelerator cards will have enough cores, and fast enough cores, to do graphics acceleration for games that is good enough and fast enough. Eventually, all of these cores will come standard inside every Intel CPU, no "accelerator card" needed. Intel has done exactly this with GPU technology in their current line of processors, and their graphics performance is sufficient for everyone except serious gamers.
Or we'll continue to see the current progression of a steadily increasing number of full-sized cores, and Intel's lots-of-tiny-cores approach will be of little interest to anybody but HPC seekers.
The API is the same, so it just comes down to instruction set differences. Apple helped ease their transition from 68k to PPC with "fat" binaries, and then from PPC to x86 with "universal" binaries. Basically, you write your software so that it's compatible with both instruction sets (and since x86 and ARM are both little-endian, this should be pretty trivial for most programs), and the compiler compiles for both instruction sets and bundles the results into one binary.
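For a concrete picture of "bundles that into one binary": the PPC/x86 universal binaries use the Mach-O fat format. A small big-endian header lists each architecture's CPU type, offset, and size, and the per-architecture code slices follow. This sketch builds and reads such a header in memory. The slice contents are placeholder bytes, and slice alignment is ignored, whereas real files page-align each slice. The earlier 68k/PPC fat binaries on classic Mac OS used a different, resource-fork-based mechanism.

```python
import struct

FAT_MAGIC = 0xCAFEBABE   # Mach-O fat/universal header magic
CPU_TYPE_X86 = 7
CPU_TYPE_POWERPC = 18

def build_fat(slices: dict[int, bytes]) -> bytes:
    """Pack per-architecture code blobs behind a fat header.
    Header fields are big-endian regardless of the host CPU."""
    header_size = 8 + 20 * len(slices)   # fat_header + one fat_arch each
    header = struct.pack(">II", FAT_MAGIC, len(slices))
    body = b""
    offset = header_size
    for cputype, blob in slices.items():
        # fat_arch: cputype, cpusubtype, offset, size, align (log2)
        header += struct.pack(">iiIII", cputype, 0, offset, len(blob), 0)
        body += blob
        offset += len(blob)
    return header + body

def pick_slice(fat: bytes, want: int) -> bytes:
    """What a loader does: find the slice matching the running CPU type."""
    magic, count = struct.unpack_from(">II", fat, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    for i in range(count):
        cputype, _sub, off, size, _align = struct.unpack_from(">iiIII", fat, 8 + 20 * i)
        if cputype == want:
            return fat[off:off + size]
    raise LookupError(f"no slice for cpu type {want}")

fat = build_fat({CPU_TYPE_POWERPC: b"<ppc code>", CPU_TYPE_X86: b"<x86 code>"})
print(pick_slice(fat, CPU_TYPE_X86))   # b'<x86 code>'
```

The format doesn't care which instruction sets the slices contain. A Windows equivalent would only need the same kind of header, plus a loader that picks the x86 or ARM slice.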
As for emulating another architecture, well, again, that's what Apple did both times. Now, admittedly, they were transitioning each time, so they *had* to do something for backwards compatibility, but the concept is the same: emulating a different instruction set when the API (and all the supporting DLLs) is identical is faster than emulating the entire operating system, because your emulated code calls out to native APIs, which don't need to be emulated. That might not work well for something really performance-intensive like a game or a web browser, but there are a lot of apps out there that aren't performance-intensive.
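Here is a minimal sketch of that split. It uses an invented stack-machine "instruction set" and a hypothetical `print_int` API entry point, neither of which corresponds to any real emulator. Guest instructions are interpreted one at a time, which is the slow part. API calls drop straight into native host code, so an app that spends most of its time inside the OS API pays the emulation cost only for its own logic.

```python
# A toy emulator for a made-up stack machine ("guest" code). Arithmetic is
# interpreted instruction by instruction; calls into the (hypothetical) OS
# API are dispatched straight to native host functions instead.
output = []

NATIVE_API = {
    # Hypothetical API entry point: runs as ordinary host code.
    "print_int": lambda value: output.append(str(value)),
}

def run(program):
    """Execute guest code; return how many instructions were emulated."""
    stack, emulated = [], 0
    for op, *args in program:
        if op == "call_native":
            NATIVE_API[args[0]](stack.pop())   # native: no emulation cost
            continue
        emulated += 1
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return emulated

guest = [("push", 6), ("push", 7), ("mul",), ("call_native", "print_int")]
count = run(guest)
print(output, count)   # ['42'] 3
```

Only the three guest instructions go through the interpreter. Whatever work the API call does (drawing text, in a real system) runs at full native speed.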
This may not be that big a deal for tablets: running Windows apps on a tablet might have been neat, but it's not something a typical user will expect (depending on how they market Win8 on tablets). But once they venture into smartbooks, customers won't understand why they can't run Windows software on their Windows notebook.
So what's the advantage here? They committed an eight-card, 256-core server just to render a Quake 3-era game with raytracing. nVidia has been giving away (for free, as far as I can tell) a CUDA-based real-time raytracing engine for their CUDA cards (including Tesla) for a few years now, and before that, they had a final-frame renderer (a non-real-time raytracer) that predates CUDA.
If I can do with a $300-400 GPU in a $1000 computer what takes Intel a massive custom-built server, what's the advantage of the Intel approach?
It would have helped if I had read the summary. They've got a single server with eight Knights Ferry cards, each with 32 cores; that's where the 256 cores come from. And they're calling the single server a "cloud".
What makes this especially unimpressive is that nVidia has been making a GPU-accelerated real-time raytracing engine for years now (you can even download working demos), and before that they were selling a GPU-accelerated final-frame renderer (non-real-time raytracing). Intel is showing off in-house demos running on expensive hardware, while nVidia has been giving the same thing to customers for years, and it's something actually out there that you can use. Heck, as far as I can tell, it's free.
While this may be true in your particular case, many people live within 1,000 miles of an OnLive data center and have a decent connection.
People talk a lot about how network latency would make OnLive's input lag unbearable, but consider this: 50ms of latency gets you from Montreal to Dallas (~2800km), while GTA IV on the Xbox 360 has 133-200ms of input lag despite running locally. In fact, every console game that Eurogamer measured had at least 67ms of latency, and they report that the average seemed to be about 133ms. Gamers are clearly willing to accept this latency (GTA IV, whose lag is higher than OnLive's in many cases, is a very popular game), which makes OnLive seem much more practical.
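The speed-of-light floor on that 50ms is easy to estimate. This rough sketch takes the ~2800km path at face value. It assumes light in fibre travels at c divided by a refractive index of about 1.47, a typical figure. It counts propagation delay only.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBRE_INDEX = 1.47         # assumed refractive index of optical fibre

def min_rtt_ms(path_km: float) -> float:
    """Lower bound on round-trip time over fibre: propagation delay only,
    ignoring routing, queuing, and serialization."""
    speed_km_per_s = C_KM_PER_S / FIBRE_INDEX
    return 2 * path_km / speed_km_per_s * 1000

rtt = min_rtt_ms(2800)
print(f"Montreal-Dallas floor: {rtt:.1f} ms round trip")
```

That comes to roughly 27ms, so a good chunk of a 50ms ping is routing and queuing overhead. Either figure is still well under the ~133ms average input lag Eurogamer measured on local console games.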
I'm by no means close to an OnLive data center, or even in the same country (I live in Montreal, and the nearest OnLive data center is in D.C., if memory serves), and to me the latency seemed on par with a laggy console game. That is to say, not great, but no worse than I've seen with some console games. To me, the real issue with OnLive was the low bitrate: it looks OK (just OK) when there's no movement, but playing a UT3 match in vehicles was unpleasant because all detail was lost while in motion (and this on either a 50 meg down/14 meg up VDSL2 connection or a 60 meg down/3 meg up cable line).
The good news is that it's a lot easier to increase the bitrate of a video stream than it is to break the speed of light ;)
Sure, but a "core" in a GPU is far simpler than a "core" in a CPU, and Larrabee wasn't stripped down anywhere near that far. Larrabee was supposed to feature 32 cores in one package initially on a 45nm process, bumping it up to 48 on a later 32nm process. Intel is still on a 32nm process, so when they talk about a "256-core cluster", they're almost certainly talking about multiple systems; an 8-chip 32-core-per-chip system (or 4-chip 64-core) would not be a "cluster" in and of itself. And such a system does not sound cheap by any stretch of the imagination. Remember, Intel cancelled Larrabee because the performance, even with software rasterization, wasn't remotely competitive with modern GPUs, and software rasterization would be a heck of a lot faster than software ray tracing!
I think a lot of the work you put into running your own cluster will still be required with EC2 or other cloud providers. Cloud providers give you a VPS to play with, but they don't typically handle any of the cluster part; EC2 doesn't. So in either case, the software side of things is still up to you. In fact, the only extra thing EC2's "cluster" service really gets you is a 10Gbps interconnect between your cluster instances. The instances themselves are nothing special, just large, and at ~$1500 a month, expensive.
Building a bunch of cheap machines and plugging them into a switch isn't difficult or risky, and outsourcing that to a cloud provider doesn't necessarily make it any easier. It's potentially cheaper, depending on how long you need the performance for. 4000 GBP would buy you 228,162 hours of small (512MB RAM) instances at Linode, or 5,690 hours of large (20GB RAM) instances (cost scales linearly with RAM and guaranteed CPU share). Because of the way such cloud providers work (larger instances are guaranteed a larger minimum CPU share, but all instance sizes have the same theoretical maximum of one quad-core Xeon), you'll probably get far more CPU power out of many smaller instances, but it depends on how parallelizable the task is.
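A quick sanity check of those figures, assuming (as stated above) that price scales linearly with RAM: a 20GB instance has 40 times the RAM of a 512MB one, so the same budget should buy about 1/40th of the hours. The result lands close to the quoted 5,690, so the published pricing is nearly, but not perfectly, linear.

```python
SMALL_HOURS = 228_162          # hours of 512MB instances (figure from the text)
SMALL_MB, LARGE_MB = 512, 20 * 1024

def hours_if_linear(small_hours: float, small_mb: int, large_mb: int) -> float:
    """If price is proportional to RAM, a fixed budget buys proportionally
    fewer hours of a larger instance."""
    return small_hours * small_mb / large_mb

ratio = LARGE_MB // SMALL_MB   # 40 small instances cost as much as one large
print(f"{hours_if_linear(SMALL_HOURS, SMALL_MB, LARGE_MB):.0f} hours, ratio {ratio}")
```

The same ratio is why the small instances can win for parallel work. For the price of one large instance, you can run 40 small ones side by side, and each can burst to the same ceiling.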
The downside of EC2 is that, while they guarantee a given amount of CPU power, the guaranteed amount is very small per dollar, and there's no taking advantage of spare CPU time that other people aren't using. With a good cloud provider, there's usually a lot of CPU power to go around. If you want to compare guarantees directly, a large EC2 instance ($0.34/h) is probably roughly equivalent in guaranteed CPU to a 4GB Linode ($0.22/h), but the Linode has double the burstable CPU power when it's available. Of course, the RAM isn't comparable, but it's unclear whether RAM or CPU power is the primary demand in this case. There are other complexities too: EC2 charges for storage and transfer on top of the base rate, while Linode includes them in the base rate.
If you're going to compare it to a consumer processor, I'd point out that a $300 Sandy Bridge chip is 20-30% faster than an i7 960, and if you're saying that an Opteron 6168 is individually the same speed as an i7 960, the Sandy Bridge chip costs less than half as much to boot.
Of course, there aren't any multi-processor Sandy Bridge Xeons on the market yet, so you can't put four of them in one machine.
You can't fill a rack with them, but the most bang for the buck, if we're ignoring GPUs (and Tesla), is probably going to be consumer hardware.
For just under $500 CAD before shipping and tax, you can build a respectable barebones desktop at Newegg with an i7-2600, 8 gigs of RAM, and a 500GB HDD. That's probably the fastest consumer CPU on the market, too (the previous-gen hex-cores might edge it out). For GBP 4000, I can build 12 of them, with a few hundred left over for a switch and some network cables.
Now, there are faster *enterprise* CPUs on the market, to be sure. Intel has some eight- and ten-core Westmere Xeons... but they cost so much that you might only be able to build one or two systems for GBP 4000, and the twelve consumer machines would destroy them in terms of pure number-crunching power. The question becomes where the balance between performance and reliability lies.
How well do iPhones and iPads display SVG animation?
Not likely well enough. I tried Google's Swiffy to convert a Homestar Runner cartoon to HTML5 (which uses SVG for the animation). It worked surprisingly well, except for the lack of audio support, the text in the cartoon appearing line-by-line rather than appearing behind the sbemail cursor, and the edges of shapes not lining up quite the same way as in the original Flash (creating visible borders where shapes overlap or don't quite meet). On the desktop, performance seemed the same as Flash, though I didn't look at CPU usage: playing them side by side, they stayed in sync at the same framerate. However, on an iPhone 3GS, the framerate was rather poor. I don't believe the 3GS got the latest Safari JavaScript engine, though, so it may have worked fine on an iPhone 4 or iPad 2 (which has a much faster processor to boot).
Or contribute (as with the iPhone VLC port), only to have your contribution buried by one of the authors on philosophical grounds, depriving users of choice.
How can your computer's processor execute multiple instructions in less than a billionth of a second?
Right, but my point is that you don't need to wait five or ten years; you can buy a $400 graphics card that will do the same thing today.
OnLive made it work with acceptable latencies, but then they did it with a cheap GPU and not a 256-processor cluster.
So, in other words: "OnLive, but with a software raytracer on the server side instead of a GPU."
EC2 is really expensive compared to other (better) cloud providers; I wasn't comparing it to running your own cluster.
Note that the Metro interface is designed mostly for tablets and as a simplistic interface for casual users.
Good thing Metro is mandatory and unavoidable on desktops too, then.