If we get our 20 megabits/second delayed by 300ms because someone QoS'ed a phone call higher than us, we won't mind.
Only two things cause latency: distance and congestion. Distance is limited by the speed of light and has nothing to do with your ISP. The other thing that affects latency is congestion. If you have high latency, you also have low bandwidth, because it means your route is "full". You can't have high latency and high bandwidth unless the high latency is caused by physical limitations.
Enjoy your 1mb delayed by 300ms.
Let me know how your "parallel data planes" work on that serial connection. Some of these high-end devices are pushing data faster than L1 cache on a 4GHz Intel CPU. There are limits. At the nanometer level, those extra paths cause delays, and those delays slow things down.
If the ISP wants to provide a service that prioritizes certain content over other content, that's their prerogative because it's their network and their service to provide as they see fit.
If the customer is paying for 100mb Internet and the ISP is actively limiting them to less than 5mb, that is fraud.
At some point in the past, I brought up a question in a physics forum used by college students, graduate students, and PhDs. I asked: if a black hole had a radius greater than 1 light-second and something fell past the event horizon, wouldn't the object be moving as fast as light after 1 second, given that the acceleration at the event horizon is c per second? Several people responded that it doesn't matter because it's not proper acceleration, so the same restrictions do not apply. Either I got massively trolled, or it seems to be commonly accepted that faster-than-light speeds can happen to space itself, just not to objects moving through space.
I thought the speed of light was the fastest information can travel through space, but that's not to say there aren't shortcuts that don't involve moving through space.
No, no, no, you're looking at it all wrong.
In this design, each user gets their own connection that runs back to the CO (central office). This can be done for about 3% over the cost of a shared connection, so don't spout BS about shared vs. dedicated, because dedicated is nearly the same price.
At the CO, the customer plugs into a chassis. This chassis has enough backplane and uplink capacity to support all users running at 100% at the same time. Then the chassis plugs into the trunk. Up to this point, there is no oversubscription, because capacity at this level is still relatively cheap. The trunk is where oversubscription becomes important, to avoid paying for capacity that sits idle.
At the trunk level, the typical oversubscription ratio is 20:1. This means 19 out of 20 users, on average, are not using their connection at any given time. This is a fairly reliable metric. It means you can run dedicated bandwidth from the trunk to the customers, but the trunk itself only needs 1/20th of the total bandwidth.
Here's an example of how this works. Say you have 100 customers with 100mb connections. Each user gets a dedicated connection to the trunk, and each customer pays $100/month. The ISP then purchases only enough transit for 1/20th of those users to use their full connections at the same time, so 500mb (100 x 100mb / 20).
At $1/mbit, 500mb of transit costs the ISP $500/month. But they can sell that capacity to 20x as many users as it could serve at full speed, 100 customers at $100/month, for $10,000 of revenue. Each customer effectively gets 100mb every time they use their connection, but the ISP only needs to provide 1/20th of that on average.
Some of that $100 goes back to supporting the connection to the customer, so carve out $30 of that $100 bill for that. This leaves $70/customer as revenue towards transit bandwidth. That is $7,000 in revenue for $500 in costs, so about $6,500 in gross profit.
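The worked example above is simple enough to check in a few lines of Python. This is only a sketch: the function and parameter names are made up for illustration, and the defaults are the example's own assumed figures ($1/mbit transit, a $30 last-mile carve-out), not real ISP pricing.

```python
def oversubscription_example(customers=100, plan_mbps=100, price=100.0, ratio=20,
                             transit_per_mbps=1.0, last_mile_cost=30.0):
    """Monthly numbers for a trunk oversubscribed at `ratio`:1 (hypothetical helper)."""
    transit_mbps = customers * plan_mbps / ratio            # capacity actually bought
    transit_cost = transit_mbps * transit_per_mbps          # what the ISP pays for it
    revenue = customers * price                             # total monthly bills
    transit_revenue = customers * (price - last_mile_cost)  # left after last-mile costs
    return {
        "transit_mbps": transit_mbps,
        "transit_cost": transit_cost,
        "revenue": revenue,
        "transit_revenue": transit_revenue,
        "gross_profit": transit_revenue - transit_cost,
    }


if __name__ == "__main__":
    for key, value in oversubscription_example().items():
        print(f"{key:>16}: {value:,.0f}")
```

Changing `ratio` shows how sensitive the result is to the oversubscription assumption: at 10:1 the ISP buys 1000mb of transit instead of 500mb.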
You ask, "but what if someone uses their connection 24/7, like bit torrent?"
That's the pain of averages. In a small population, you may not get the average, but in a large population, you should. If you're a large ISP with 50,000 customers in the city, that 20:1 ratio should be quite reliable.
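To see why a large population makes the 20:1 figure dependable, here is a rough model. It assumes each customer is independently online 1 time in 20 (a binomial model, which is my assumption; real users are correlated, especially at peak hours) and uses the mean plus three standard deviations as a stand-in for "a bad moment".

```python
import math


def three_sigma_peak_ratio(customers, p_active=1 / 20):
    """How far a 3-sigma burst of simultaneous users rises above the average,
    assuming independent users (binomial model)."""
    mean = customers * p_active
    sd = math.sqrt(customers * p_active * (1 - p_active))
    return (mean + 3 * sd) / mean


if __name__ == "__main__":
    for n in (100, 1_000, 50_000):
        print(f"{n:>6} customers: 3-sigma peak is {three_sigma_peak_ratio(n):.2f}x the average")
```

Under this model, 100 customers see bursts of about 2.3x the average, while 50,000 customers stay within about 6% of it.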
"...letting 80% of your capacity go to waste, which would also be stupid."
Actually, this is the recommended practice. Your 95th percentile should not go over 50% of your link speed. The 95th percentile ignores only the busiest 5% of the day, about 1.2 hours (72 minutes). Peak (100th percentile) usage typically runs about 50% above the 95th percentile, so during those busiest minutes your link should not go past 75% usage. Look at any hosting company: they make sure they don't go past 80% peak usage. Past 80% average load, micro-bursts start to cause packet loss, or latency if you have bufferbloat.
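The rule of thumb above reduces to a little arithmetic. A sketch, assuming the paragraph's own figures: the 95th percentile discards the top 5% of each day, and peak usage runs about 1.5x the 95th percentile. The function names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_ignored_by_p95(period_minutes=MINUTES_PER_DAY):
    """The 95th percentile ignores the busiest 5% of the period."""
    return period_minutes * 5 / 100


def estimated_peak(p95_utilization, peak_to_p95=1.5):
    """Peak utilization if the peak runs about 50% above the 95th percentile."""
    return p95_utilization * peak_to_p95


def has_headroom(p95_utilization, peak_limit=0.80):
    """True if the estimated peak stays at or under the 80% guideline."""
    return estimated_peak(p95_utilization) <= peak_limit


if __name__ == "__main__":
    print(minutes_ignored_by_p95())                # 72.0 minutes per day
    print(estimated_peak(0.50))                    # 0.75
    print(has_headroom(0.50), has_headroom(0.60))  # True False
```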
Speaking of latency and packet loss: most ISPs use artificially large buffers to hide congestion. Instead of getting packet loss, the Internet just becomes less responsive. Without packet loss, TCP streams typically don't find out that a route is congested. ISPs are artificially reducing quality in an attempt to keep throughput high. This makes speed tests look nice, while web browsing and other interactive uses suffer.
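The cost of an oversized buffer is easy to estimate: a full buffer has to drain at link speed before a newly arriving packet gets through. The buffer size below is a hypothetical example, not a measurement from any particular ISP.

```python
def queueing_delay_ms(buffer_bytes, link_mbps):
    """Extra delay a packet sees behind a full FIFO buffer: size / drain rate."""
    bits = buffer_bytes * 8
    return bits * 1000 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    # Hypothetical 1 MB buffer in front of links of different speeds.
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbit/s: {queueing_delay_ms(1_000_000, mbps):7.1f} ms")
```

A loss-based TCP sender keeps filling that buffer because nothing is dropped, so every interactive packet queued behind the bulk transfer pays that delay.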
The Internet backbone is nowhere near capacity. They have 100x more dark fiber to light up if needed, and technology from the last 3 years has allowed 1000x more bandwidth over the same fiber. Yes, the backbone is oversubscribed, but not because it has to be; it's because of the way users use the Internet. No matter what kind of data services you throw at the Internet, you get a fairly consistent 20:1 ratio during peak hours. What has changed is average data usage, not the ratio of active users. The last mile can easily be 1:1 and the trunks 20:1, and you will not have bandwidth issues.
There is no shortage of bandwidth. Let the end users pay for what they want. It's cheap and affordable.
"Since payment and rates are based on balance of traffic, the ISPs end up paying a lot."
Unless you're attempting to peer for free, cost is not based on direction, only on the bandwidth used in your busier direction. If you download 10gb/s but only upload 1gb/s, you pay for 10gb. If you upload 10gb/s but only download 1gb/s, you still pay for 10gb. If you upload and download 10gb/s, you still pay for 10gb.
You would think ISPs would rather allow users to upload, to balance their traffic and strengthen their case for peering, but settlement-free peering with Tier 1 ISPs is nearly impossible to break into. They are much larger than you, so good luck. The only real benefit ISPs get from limiting their users' uploads is that they can charge business users a premium. How do you get business users to cough up 2x more money for more upload? Artificially restrict upload for the people who not only need it, but can afford it.
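One common way transit is metered (an assumption here; contracts vary) is to sample each direction every 5 minutes, take the larger of inbound and outbound for each sample, and bill on the 95th percentile of those values. A minimal sketch:

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: sort ascending, take rank ceil(0.95 * n)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil, avoids float rounding
    return ordered[rank - 1]


def billable_rate(inbound, outbound):
    """Direction doesn't matter: bill on the busier direction of each sample."""
    per_sample = [max(i, o) for i, o in zip(inbound, outbound)]
    return percentile_95(per_sample)


if __name__ == "__main__":
    steady = 20  # 20 samples at a constant rate, in Gbit/s
    print(billable_rate([10] * steady, [1] * steady))   # 10: download-heavy
    print(billable_rate([1] * steady, [10] * steady))   # 10: upload-heavy
    print(billable_rate([10] * steady, [10] * steady))  # 10: symmetric
```

The percentile step also means a single short burst in the top 5% of samples does not raise the bill.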
Except in this case, the delivery system is light. Are you paying for faster light? I am paying for X bandwidth, and I should always get it when needed, based on statistical averages. If I don't, then I am not getting what I paid for, also known as fraud.
You may want to understand how packet-switching networks work. There isn't an actual stream of data, like water. Data is sent one packet at a time. For one packet to have priority over another, there must be more than one packet waiting at the same time. But network devices send data one packet at a time, so how do you get more than one packet at the same time? Data must first start backing up, which means there is more data trying to move through the system than there is available bandwidth. Simple solution: add more bandwidth, or stop selling something you don't have.
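A toy simulation makes the point concrete. It models a single output port that sends one packet per time unit and always picks the highest-priority packet that is waiting. The packet names and timings are made up. When packets arrive no faster than the port can send them, priority never changes the order; it only matters once a backlog forms.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    priority: int  # lower number is sent first when packets are waiting together
    seq: int       # arrival order, breaks ties between equal priorities
    name: str = field(compare=False)


def transmit_order(arrivals, service_time=1):
    """arrivals: list of (arrival_time, priority, name).
    The port sends one packet per service_time, choosing by strict priority
    among the packets already waiting when the port becomes free."""
    pending = sorted(enumerate(arrivals), key=lambda item: item[1][0])
    waiting, sent = [], []
    clock, i = 0, 0
    while i < len(pending) or waiting:
        # Queue every packet that has arrived by now.
        while i < len(pending) and pending[i][1][0] <= clock:
            seq, (_, priority, name) = pending[i]
            heapq.heappush(waiting, Packet(priority, seq, name))
            i += 1
        if not waiting:
            clock = pending[i][1][0]  # port idle: skip ahead to the next arrival
            continue
        sent.append(heapq.heappop(waiting).name)
        clock += service_time
    return sent


if __name__ == "__main__":
    uncongested = [(0, 1, "bulk-1"), (1, 0, "voice"), (2, 1, "bulk-2")]
    burst = [(0, 1, "bulk-1"), (0, 1, "bulk-2"), (0, 0, "voice")]
    print(transmit_order(uncongested))  # ['bulk-1', 'voice', 'bulk-2']
    print(transmit_order(burst))        # ['voice', 'bulk-1', 'bulk-2']
```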
It's also the cheapest solution. QoS is expensive compared to bandwidth. Have you seen the prices on these DPI devices? They are not cheap, and they are very slow.
This is called artificial scarcity. Some ISPs would rather pay $100 for 1mb of DPI'd QoS bandwidth than $100 for 100mb of raw bandwidth. Why? Control. And because they have no competition, they can pass the inflated price of slower bandwidth on to the customer.
You have the choice to call their customer service every day to complain about your Internet being slow. Just put your cell phone on speaker while waiting on hold.
Comcast advertises 105mb for $115/month, but it requires bundling, so once you include taxes, fees, rentals, etc., it's closer to $150. You can purchase worldwide transit for $0.45/mbit at retail, so let's assume Comcast gets it closer to $0.40/mbit, because they own a lot of their network and get closer to wholesale prices. That 105mb connection would cost them about $42/month even if it ran 24/7 and transferred about 34TB. Yet they complain when someone attempts to use 5mbit/s. That's a pretty good profit margin if I have ever seen one: buy bandwidth for $0.40 and resell it for about $1.40, then warn users not to use more than 1/100th of their connection on average.
The amount of bandwidth oversubscription is insane. Once you include the data cap, customers are paying closer to $104/mbit, while Comcast is buying it at $0.40. That's roughly a 99.6% profit margin. Might as well just hand them money; that's almost pure profit.
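The figures in the last two paragraphs can be reproduced as follows. Assumptions, all taken from the text except the month length: a 30-day month, $0.40/mbit transit, a $150 all-in bill for 105mb, and the $104/mbit effective price under the data cap (the cap itself isn't stated, so that figure is taken as given).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def transit_cost(mbps, price_per_mbps):
    """Monthly cost of committing to `mbps` of transit."""
    return mbps * price_per_mbps


def terabytes_if_saturated(mbps):
    """Decimal terabytes moved by a link running flat out for the whole month."""
    return mbps * 1_000_000 / 8 * SECONDS_PER_MONTH / 1e12


def margin(sell_per_mbps, buy_per_mbps):
    """Fraction of the selling price left after the bandwidth cost."""
    return (sell_per_mbps - buy_per_mbps) / sell_per_mbps


if __name__ == "__main__":
    print(f"cost of 105mb: ${transit_cost(105, 0.40):.2f}/month")      # $42.00
    print(f"saturated transfer: {terabytes_if_saturated(105):.1f} TB")  # 34.0
    print(f"resale price: ${150 / 105:.2f}/mbit")                      # $1.43
    print(f"margin at $104/mbit: {margin(104, 0.40):.1%}")             # 99.6%
```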
No line sharing over here, and copper networks are expensive. Even Verizon recently said that converting 4% of their customer base to FiOS is now saving them $100m/year in support and has increased their revenue. Sounds like a good return.
Even worse, deregulation is essentially deregulating access to public and private property. I'm not sure I want random start-ups digging up my yard each time a competitor wants to come in. I want competition, but I don't want a torn-up yard and flags everywhere all the time.
When you want the absolute fastest that technology can provide, adding overhead like QoS will always slow things down. 400gb is on the high end of what we can do right now with normal silicon. I hear all the time that QoS cannot be done at the rates large trunks operate at, and the issue just keeps getting worse with time. As for "which", I was reading a column that claimed to have tested every brand with 400gb ports, and while there was slight variation in actual throughput on the 400gb ports, all did horribly when QoS was enabled.
I'm not saying we will never get QoS at line rate on 400gb; I'm saying that by the time we do, we will have 1tb ports that effectively run at 500gb when QoS is enabled. QoS will probably always be behind the speed curve.
QoS is pointless on a trunk because of the limitations of implementing QoS at the full line rate of bleeding-edge speeds. You can get a 400gb port that does about 380gb/s in actual use. You might think, "I could resell that same 400gb port to 4 100gb users, then QoS one user higher than another and charge a premium." Not quite. Enabling QoS effectively drops the port speed down to 110gb/s.
If you had to choose between 400gb/s of bandwidth with no QoS or 100gb of bandwidth with QoS, which would you rather have? It gets worse at higher speeds.
Where else could you do QoS as an ISP? The last mile? I would have to ask: why do you have congestion in the last mile? Modern fiber gets rid of this issue. QoS is more of a band-aid for old, bandwidth-limited copper last-mile designs.
Where QoS could be useful is at the end user, but you should probably leave that up to their router. QoS is not something an ISP should be doing, at least not an ISP that has access to the capital required to build out a modern fiber network and make large bandwidth commits to get bandwidth for under $1/mbit.
Except the backbone doesn't have congestion issues and won't QoS your data.
Their cache site has requirements like 5gb/s of peak bandwidth. Does your ISP have about 1,000 users streaming Netflix at the same time for about 2 hours every day? If not, then Netflix doesn't care, because the bandwidth costs them less than the caching device. My ISP refuses to get a Netflix caching device because bandwidth is too damn cheap. They would rather have everyone streaming at 12mb/s than pay the electrical costs of a cache server.
Of course Netflix is not going to let your ISP do its own caching; Netflix does not have the authority to give them copyrighted content.
Your ISP could always peer with Netflix at almost any IX.
How does BlueCoat know when an SSL cert is correct? I can get free SSL certs signed by CAs with no verification process. The cert fingerprint is a very important part of the security process. There are live security exploits that take advantage of these man-in-the-middle issues. Some systems rely entirely on cert validation, but if you install your own custom cert and override this security, then you'd better make sure the proxy re-implements this application-level security validation; otherwise someone could be running applications on your computer as "system".
These security exploits are not bugs, but features working as intended. If the application is going to be doing cert validation and BlueCoat is intercepting and signing stuff, then BlueCoat is responsible for this validation.
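For context, this is roughly what application-level pinning looks like: the application compares the certificate it actually receives against a fingerprint it already knows, instead of trusting any CA-signed certificate. An intercepting proxy that re-signs traffic presents a different certificate, so the pin fails unless the proxy does equivalent validation. A minimal sketch using Python's standard `ssl` and `hashlib` modules; where the pinned fingerprint comes from is up to the application.

```python
import hashlib
import ssl


def sha256_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint in the colon-separated hex form browsers display."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_pin(der_cert: bytes, pinned_fingerprint: str) -> bool:
    """Compare a certificate against a known fingerprint, ignoring case and colons."""
    def normalize(text: str) -> str:
        return text.replace(":", "").lower()
    return normalize(sha256_fingerprint(der_cert)) == normalize(pinned_fingerprint)


def fetch_presented_cert(host: str, port: int = 443) -> bytes:
    """DER bytes of whatever certificate is presented on this connection.
    If a proxy intercepts the connection, this is the proxy's re-signed cert."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

Usage would be `matches_pin(fetch_presented_cert(host), known_fingerprint)`; behind a re-signing proxy this returns `False` even though a CA-trusting client would accept the connection.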
I was doing some reading on Mantle, and there are some interesting things I noticed. One feature of Mantle is that you can create "task" queues. You register a queue with some consumer, be that the CPU or a GPU. Registering the queue is a system call, but the queue itself lives in user land. Each task is a data structure that contains a few fields; I was less interested in most of them, but a few stood out. One was a pointer to a function and another was a pointer to some data.
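To picture such a task, here is a hypothetical 64-byte record packed with Python's `struct` module. The field list is my assumption, not Mantle's actual layout: the text only says a task carries, among other things, a function pointer and a data pointer.

```python
import struct

# Hypothetical layout, little-endian with no implicit padding:
# type (u16), flags (u16), reserved (u32), function pointer (u64),
# data pointer (u64), completion pointer (u64), then 32 bytes of padding.
TASK_FORMAT = "<HHIQQQ32x"
TASK_SIZE = struct.calcsize(TASK_FORMAT)  # 64 bytes


def pack_task(task_type, function_ptr, data_ptr, completion_ptr=0, flags=0):
    """Serialize one task record into exactly TASK_SIZE bytes."""
    return struct.pack(TASK_FORMAT, task_type, flags, 0, function_ptr, data_ptr, completion_ptr)


def unpack_task(raw):
    """Decode a task record back into its named fields."""
    task_type, flags, _reserved, function_ptr, data_ptr, completion_ptr = struct.unpack(TASK_FORMAT, raw)
    return {"type": task_type, "flags": flags, "function": function_ptr,
            "data": data_ptr, "completion": completion_ptr}


if __name__ == "__main__":
    raw = pack_task(1, function_ptr=0x7F0000001000, data_ptr=0x7F0000200000)
    print(len(raw), unpack_task(raw))
```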
As I understand it, your CPU can do some work, package it up in a region of memory, and enqueue a function pointer and a data pointer into this queue. The GPU is then notified without any system calls and, at its leisure, looks at the function pointer and starts executing that code against the data pointer, which is probably your matrix of data to crunch.
Here comes another cool part. Once the GPU is done crunching this data, it can do the same thing back to the CPU, because the GPU can have queues registered against the CPU. This means the CPU and GPU can ping-pong work back and forth with little effort.
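The ping-pong pattern can be sketched with two single-producer/single-consumer ring buffers, one in each direction. This is an illustration of the general technique, not Mantle's API: `RingQueue`, `device_step`, and `square_all` are hypothetical names, and the "GPU" here is just a Python function.

```python
class RingQueue:
    """Single-producer/single-consumer ring buffer (illustrative).

    Only the producer advances `write_index` and only the consumer advances
    `read_index`, so neither side needs a lock or a system call. A real
    shared-memory version would also need release/acquire ordering on the
    index updates; this Python sketch does not model that."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.read_index = 0
        self.write_index = 0

    def enqueue(self, task):
        if self.write_index - self.read_index == len(self.slots):
            return False  # queue full
        self.slots[self.write_index % len(self.slots)] = task
        self.write_index += 1  # publishing the new index is what signals the consumer
        return True

    def dequeue(self):
        if self.read_index == self.write_index:
            return None  # nothing queued
        task = self.slots[self.read_index % len(self.slots)]
        self.read_index += 1
        return task


def device_step(inbox, outbox):
    """Stand-in for one device: run a queued (function, data) task and post the result back."""
    task = inbox.dequeue()
    if task is None:
        return False
    function, data = task
    outbox.enqueue(function(data))
    return True


def square_all(values):
    return [v * v for v in values]


if __name__ == "__main__":
    cpu_to_gpu, gpu_to_cpu = RingQueue(), RingQueue()
    cpu_to_gpu.enqueue((square_all, [1, 2, 3]))  # CPU hands work to the "GPU"
    device_step(cpu_to_gpu, gpu_to_cpu)          # "GPU" runs it and replies
    print(gpu_to_cpu.dequeue())                  # [1, 4, 9]
```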
How does the GPU/CPU get notified? Well, it just so happens that these tasks are 64 bytes, the size of a cache line. This means the cache-coherency protocol could easily notify the device when a queue has work ready, effectively giving you a hardware-accelerated event system. I'm sure there are other, more traditional ways to do this for non-IGP GPUs.
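Why the 64-byte size matters: if the queue's base address is cache-line aligned, every task lands in exactly one cache line, so publishing a task touches a single line for the coherency protocol to track. A sketch of the address arithmetic, assuming 64-byte lines (as on mainstream x86 CPUs) and hypothetical base addresses:

```python
CACHE_LINE = 64
TASK_SIZE = 64


def slot_address(base, index, capacity):
    """Address of the slot a given task index maps to in a ring of `capacity` slots."""
    return base + (index % capacity) * TASK_SIZE


def lines_touched(address, size=TASK_SIZE):
    """Number of cache lines covered by a task stored at `address`."""
    first = address // CACHE_LINE
    last = (address + size - 1) // CACHE_LINE
    return last - first + 1


if __name__ == "__main__":
    aligned, misaligned = 0x10000, 0x10020  # hypothetical queue base addresses
    print(lines_touched(slot_address(aligned, 5, 16)))     # 1
    print(lines_touched(slot_address(misaligned, 5, 16)))  # 2
```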
Since both the GPU and CPU use the same protected memory, there is no data copying the programmer needs to be aware of; it's all transparent. All pointers naturally work. Not only that, but the GPU can cause page faults, so data sets no longer have to fit into GPU memory; they can be stored in system memory, or even better, swapped out. I'm not saying swapping is good, just that it's much easier to handle than a programmer manually doing memory management.
Even more good news: these GPUs are fully C/C++ capable. No funny custom languages to use, just good old C. Nothing says "I like to work with buffers of data and pointers" like C.
So the biggest reason Mantle will help is that it can completely bypass system calls, allowing producer-consumer queues with event notification for when work is ready. Mantle is supposed to be GPU-independent, so Nvidia should also be able to implement it, but without tight GPU integration, I'm not sure it will be as efficient. Still, it would be better than system calls.
What can happen now is a network of task queues connecting the CPU, IGP, and any other GPUs. If you have more than one GPU, each GPU can have its own queue. You can register as many queues as you want, which means an 8-core CPU will probably have one queue per core for each device. This could be a great first attempt at unifying GPUs and CPUs into one massive processing system.
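The queue topology described above can be sketched as a map keyed by (core, device). The device names are placeholders; the point is only that one queue per core per device lets each core submit work without contending with other cores for the same queue.

```python
from collections import deque
from itertools import product


def build_queue_map(cores, devices):
    """One independent work queue per (CPU core, target device) pair."""
    return {(core, device): deque() for core, device in product(range(cores), devices)}


if __name__ == "__main__":
    queues = build_queue_map(8, ["CPU", "IGP", "GPU0", "GPU1"])  # hypothetical devices
    queues[(3, "IGP")].append("task from core 3")
    print(len(queues))  # 32 queues for 8 cores x 4 devices
```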
DICE has some interesting material on how they get BF4 to efficiently use 90-95% of an 8-core CPU while offloading lots of work to the GPU and IGP. Better use of multi-core CPUs, lower latency, higher throughput: what's not to like? The design looks good and the idea sounds awesome; now we wait for the implementation. No matter what happens, I see this eventually being the future, be it Mantle or some other API.
APUs are only bandwidth-starved when working with large datasets. There is a huge class of workloads that involve small amounts of data but require a lot of processing. In these cases, memory bandwidth isn't the limiting factor at all. In many of these cases, it's faster to process the data on an 80 GFLOPS CPU than to offload it to a 3 TFLOPS discrete GPU. Now we have a 900 GFLOPS GPU that is only a few nanoseconds away from the CPU instead of tens of microseconds.
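A back-of-the-envelope model of the trade-off: total time is the offload latency plus the work divided by the device's throughput. The throughputs are the ones quoted above (with the IGP taken as 900 GFLOPS); the latencies are assumptions picked to match "tens of microseconds" for a discrete GPU and "a few nanoseconds" for the IGP, and the model ignores data transfer and real-world efficiency.

```python
# (throughput in GFLOPS, offload latency in microseconds); latencies are assumptions
DEVICES = {
    "CPU": (80, 0.0),              # no offload needed
    "discrete GPU": (3000, 20.0),  # "tens of microseconds"
    "IGP": (900, 0.01),            # ~10 ns
}


def run_time_us(gflop, throughput_gflops, latency_us):
    """Offload latency plus ideal compute time, in microseconds."""
    return latency_us + gflop / throughput_gflops * 1e6


def fastest_device(gflop):
    """Device with the lowest modeled time for a job of `gflop` GFLOP."""
    return min(DEVICES, key=lambda name: run_time_us(gflop, *DEVICES[name]))


if __name__ == "__main__":
    for work in (0.001, 1.0):  # 1 MFLOP and 1 GFLOP jobs
        times = {name: run_time_us(work, *spec) for name, spec in DEVICES.items()}
        summary = ", ".join(f"{n}: {t:,.1f} us" for n, t in times.items())
        print(f"{work} GFLOP -> {summary}; fastest: {fastest_device(work)}")
```

Under these assumptions the small job finishes on the IGP roughly 18x sooner than on the discrete GPU, and even the CPU beats the discrete GPU, while the large job still favors the discrete GPU.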
These are fully programmable GPUs that support preemptive multitasking, protected mode memory-addressing, can even cause page faults to use virtual memory transparently with the OS. Now for the good part. Fully C and C++ compliant. If you can write OpenCL in C or C++, then you can write it on these GPUs.
There are some upcoming technologies that make use of IGPs, in which the IGP is potentially 10x faster than a discrete GPU because of latency.
Actually, their intent is for the IGP to be used like a co-processor. The IGP has about two orders of magnitude lower latency than discrete GPUs, and that makes a difference.