It has one 256-bit FPU that can split to handle any mix of FPU work between both cores. It can do 2x 128-bit, or 128-bit and 2x 64-bit, or 128-bit and 4x 32-bit, or 2x 64-bit and 4x 32-bit, and so on. Right now only AVX instructions can use the full 256 bits, so anything SSE or narrower won't "share" the FPU. Each core is actually able to pipeline up to 4x 32-bit, even if not using SIMD. No matter how you look at it, it can handle 256 bits' worth of FPU calculations. The FPU eats up a lot of transistors and is one of the least used parts. Even in heavy FPU code, there are enough memory stalls or branching logic that it doesn't get used often. There is little reason to have two full 256-bit FPU units.
Code that works best with heavy FPU usage typically works well on GPUs. Thus, the APU. Need a mix of high throughput and low latency? The APU has you covered. With about 10ns latency to the IGP compared to 10-100 microseconds for discrete GPUs, many workloads can see a throughput increase.
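To put those latency numbers in perspective, here's a back-of-the-envelope sketch. It assumes the worst case where every hand-off to the GPU has to wait out the full latency before the next one can start; the function name is made up and the figures are just the ballpark ones above, not measurements.

```python
# Rough ceiling on dependent CPU-to-GPU hand-offs per second when each one
# has to wait out the full latency before the next can start.
# The latencies are the ballpark figures from the post, not measurements.

def max_round_trips_per_second(latency_seconds):
    return 1.0 / latency_seconds

IGP_LATENCY = 10e-9              # about 10 ns to the integrated GPU
DISCRETE_LATENCY_BEST = 10e-6    # 10 microseconds
DISCRETE_LATENCY_WORST = 100e-6  # 100 microseconds

igp_rate = max_round_trips_per_second(IGP_LATENCY)
discrete_best = max_round_trips_per_second(DISCRETE_LATENCY_BEST)
discrete_worst = max_round_trips_per_second(DISCRETE_LATENCY_WORST)

print(f"IGP:      {igp_rate:,.0f} hand-offs/s")
print(f"discrete: {discrete_worst:,.0f} to {discrete_best:,.0f} hand-offs/s")
print(f"ratio:    {igp_rate / discrete_best:,.0f}x to {igp_rate / discrete_worst:,.0f}x")
```

Real code overlaps transfers with work, so nobody actually hits these ceilings, but the three to four orders of magnitude between the two is the point.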
The whole point of AMD APUs is low-cost gaming. That is, lower cost than buying a dedicated GPU plus a processor. Many already argue that you don't save much by buying an APU. A cheap Pentium G3220 with an AMD Radeon 7730 costs the same as the A10 Kaveri APU, and will give better frame rates. Even if the Kaveri APU prices come down, the savings will be small. If you have to buy the GDDR5 memory, there won't be any savings. It's understandable that AMD didn't take that route.
AMD is aiming for "good enough", and they did a great job. Per thread, AMD is now on par with Intel's Haswell and has an integrated GPU that can cover the 80/20 rule for games. The only issue I personally have is that AMD's current Kaveri offerings are limited to a 2-module (4-core) setup that consumes about 50% more power than a 2-core (4 hyper-thread) Haswell while idle and about 100% more power under pure CPU load. Since I will have a discrete GPU, I see no benefit to consuming that much more power. We're talking about a 20 watt difference at idle and about a 50 watt difference under load. And they're about the same price.
Increasing the core count reduces per core performance and increases total power consumption and production cost. With most cores idle, it is a bad thing to have too many cores.
DDR3 and GDDR5 have nearly the same latency when measured in nanoseconds. When measured in clock cycles GDDR5 has higher latency, but it also runs more cycles per second. This has to do with the external interface, which is more serial in nature than the internal one. The internal data path is quite wide, but the external data path is not, because traces on a motherboard are expensive. To crank up the bandwidth, you increase the frequency. If the internal frequency remains fixed and the external frequency goes up, the external "latency" in cycles seemingly goes up.
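The cycles-versus-nanoseconds thing is just a unit conversion. A quick sketch: the DDR3-1600 CL11 line is a standard speed bin, while the second part's numbers are made up purely to illustrate "more cycles, shorter cycles" and are not GDDR5 datasheet values.

```python
def latency_ns(cycles, clock_mhz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_mhz * 1000.0

# DDR3-1600 at CL11: 11 cycles of an 800 MHz clock.
ddr3_ns = latency_ns(11, 800)

# Hypothetical faster-clocked part: more cycles, but each one is shorter.
# 17 cycles at 1250 MHz is an illustrative guess, not a GDDR5 datasheet value.
fast_ns = latency_ns(17, 1250)

print(f"11 cycles @  800 MHz = {ddr3_ns:.2f} ns")
print(f"17 cycles @ 1250 MHz = {fast_ns:.2f} ns")
```

About 55% more cycles, and the wall-clock latency comes out basically the same.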
Common syntax that results in different behaviour? That sounds more confusing to me.
Exactly. It seems like the only people who want firewall syntax in the form of a popular scripting language are people who should not be touching firewall rules in the first place.
I assume most ISPs don't get 100mb links. You can purchase dark fiber for about $125/mile and get to your local IX, which will probably run you a $25k bill on the dark fiber. Then you purchase bandwidth from HE for $0.45/mbit after paying $5k/month for a 100gb/s peering link at the IX. You still have to pay in the 5 digits, but almost all of the cost is the infrastructure; the bandwidth is cheap.
100mb is about $3k/month from Level 3 and 1gb is about $6k/month. 10x the speed for 2x the price seems to be a common scaling factor. At some point, the bandwidth is really, really cheap.
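Worked out per mbit, using the two prices above. The third line is my own extrapolation, assuming the 10x-for-2x pattern holds for one more step; it is not a real quote.

```python
# Monthly prices quoted above: (link speed in mbit/s, dollars per month).
quotes = [(100, 3000), (1000, 6000)]

def price_per_mbit(speed_mbit, monthly_price):
    return monthly_price / speed_mbit

for speed, price in quotes:
    print(f"{speed:>6} mbit/s: ${price_per_mbit(speed, price):.2f} per mbit")

# Assumption: the "10x speed for 2x price" pattern keeps holding one more step.
next_speed, next_price = quotes[-1][0] * 10, quotes[-1][1] * 2
print(f"{next_speed:>6} mbit/s: ${price_per_mbit(next_speed, next_price):.2f} per mbit (extrapolated)")
```

Every 10x step cuts the per-mbit price by 5x, which is why it gets silly cheap at the top end.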
Around here, schools can get access to a co-op that sells 1gb/1gb for $300/month, and that comes with a business-class SLA and guaranteed speeds. It's a self-sustaining business with no government or private support. That is the cost of 1gb/s: $300.
Trunk bandwidth is the cheapest part of being an ISP as long as you're not out in the middle of nowhere, at $0.45/mbit for a dedicated backbone connection. Bandwidth is charged by 95th percentile. That means the customer must average about 2 hours of transfer per day all month long. In order to consume 100GB, that means an average of 7mb/s for 2 hours every day, or about 2 Netflix streams. That would cost the ISP about $3.50, but they turn around and resell it for $300. Sounds like easy money. You just need to get your foot in the door. Once you're "that" ISP, you just print cash.
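For anyone who hasn't run into 95th percentile billing, here's a minimal sketch of how I understand it: sample the link at a fixed interval, throw away the top 5% of samples, and bill on the highest one left. The 5-minute interval, the 30-day month, the decimal gigabytes and all the function names are my assumptions. Run this way, 100GB over 2 hours a day works out closer to 3.7mb/s than 7, so the ~$3.50 above is if anything a generous upper bound.

```python
SAMPLES_PER_DAY = 24 * 60 // 5   # one sample every 5 minutes (assumed interval)
DAYS = 30                        # assumed billing month
PRICE_PER_MBIT = 0.45            # dollars per mbit/s per month, figure from the post

def percentile_95(samples):
    """Sort the samples, drop the top 5%, return the highest one left."""
    ordered = sorted(samples)
    index = (95 * len(ordered) + 99) // 100 - 1   # ceil(0.95 * n) - 1 in integer math
    return ordered[index]

def month_of_samples(hours_per_day, rate_mbit):
    """A month where the link runs at rate_mbit for hours_per_day and idles otherwise."""
    busy = int(hours_per_day * 12)                # 12 five-minute samples per hour
    day = [rate_mbit] * busy + [0.0] * (SAMPLES_PER_DAY - busy)
    return day * DAYS

def rate_for_volume(gigabytes, hours_per_day):
    """Steady rate in mbit/s needed to move `gigabytes` in the busy hours of the month."""
    bits = gigabytes * 1e9 * 8
    seconds = hours_per_day * 3600 * DAYS
    return bits / seconds / 1e6

rate = rate_for_volume(100, 2)
billed = percentile_95(month_of_samples(2, rate))
light = percentile_95(month_of_samples(1, rate))

print(f"rate needed:     {rate:.2f} mbit/s")
print(f"billed at 2h/d:  {billed:.2f} mbit/s -> ${billed * PRICE_PER_MBIT:.2f}/month")
print(f"billed at 1h/d:  {light:.2f} mbit/s -> ${light * PRICE_PER_MBIT:.2f}/month")
```

Note the 1 hour/day case: that is under 5% of the month, so all of the busy samples get thrown away and the billed rate is zero.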
costs have increased by 900% since 2009
I call BS. Prices are dropping everywhere. Backbone bandwidth drops about 50% per year. It costs only $1,800 to $3,000 to do FTTH. At $300/month, you could be the proud owner of a 1gb/1gb dedicated fiber connection in 10 months. If I have to choose between someone being a total idiot or being greedy, I'm going with greedy.
If Alaska filled their pipe with fiber instead of oil, it'd be much faster.
That comparison is ridiculous. The linked article equates an hourly wage with a diluted version of slavery ("similarities between owning and renting a person"), leaving out the fact that the "rented" person is not prevented by the employer from quitting.
The funny thing is the income difference between the median US citizen and a 0.1%'er is greater now than the same difference between slaves and slave owners back in the days of Rome. Money-wise, slaves had a better life than us "free" people.
"Good intentions" may be enough for Linux, but OpenBSD likes to have reasoning behind the ideas. Actually, OpenBSD's target isn't even that of being used, which is why it doesn't support proper multi-threading. Their entire focus is making it secure and doing it correctly the first time. It's a platform that aims more for theoretically correct designs, but it just so happens to be quite decent in many practical applications, like firewalls.
I'm surprised that this wasn't implemented a long time ago. Even Windows has had signed code for quite some time.
Having code signed by a central CA seems to be against what OpenBSD and FreeBSD are trying to do. They don't want to play god and gatekeeper. They held off as long as they could to see if a new distributed public key system would come out. Unfortunately, a new public key system has not come out and the security benefit is too great, even if it goes against their ideology.
C and C++ do offer huge performance benefits over Java, Ruby, and JavaScript.
Yeah, the 2x performance gain is totally worth longer development times. Do whatever works best for your situation.
An eye for an eye, and the whole world is blind.
As much as I am against capital punishment, I'm not a fan of that saying. Better a world full of blind people than a world full of jerks who have blinded everyone else, but they themselves can see, enabling them to prey that much better on their blind victims.
But who do you trust to wield such power as to choose who gets to live or die?
It's been a long while since I researched the subject, but it was something along the lines of 30% of people who get killed on death row get proven innocent some time after. Partly because of aggressive DAs that only care about winning at all costs.
So long as our justice system uses humans, I won't trust it to kill people.
Ditto. All POTS lines in my area are getting phased out and replaced with VoIP over Ethernet, which is hooked to your house via fiber. It would be kind of funny dialing up AOL over my 1gb/1gb fiber line to access the Internet.
Around most of the USA the only communications companies with access to ROWs are telecom or cable. There is no access for a company that is only an ISP. The overhead to become an accredited telecom or cable company is huge.
are putting data/power wires in all the time
There are only two types of property, private and public. The government manages public property, which places a logical constraint that it must involve government intervention. Then there's private property, which is mostly groups of islands interconnected via public property. It is illogical to have no government intervention or regulation for something like infrastructure because infrastructure spans public and private property.
The point of an APU is not directly "let's integrate the GPU to save money!" It's that transistors are relatively cheap, and it is quite cheap to make a CPU that has more transistors than the TDP will allow you to use at once.
When you can't use all of your transistors at the same time, you need to make sure the transistors you are using are the most efficient at getting the current work done. Enter heterogeneous computing. The GPU is much more efficient than the CPU at certain types of work; we're talking differences of several times, potentially orders of magnitude. Discrete GPUs would be great for this! WRONG! Latency of communications between the host CPU and the discrete GPU is measured in microseconds. Some workloads will not do well at all processing small amounts of information and wasting lots of time waiting.
What is one to do? Integrate that b#tch! So now we've got the first gen of IGPs, but they're hard to program for and require copying data everywhere. How could that be made easier and reduce data duplication? I know! Let's make the GPU natively support all of the features of the C/C++ languages, make the GPU work with virtual memory, and make it so the CPU can bypass the OS and talk directly to the GPU without system calls.
Enter Kaveri. Mantle allows the CPU to register command buffers with a one-time fixed cost as a system call to init, but no system calls are required after that. With this, the CPU can pass function pointers along with a data pointer into a user-mode buffer, and the GPU gets notified when more work is available. Because the GPU supports virtual memory and protected mode, the GPU can naturally work directly with all pointers passed from the CPU and vice versa. The GPU also supports preemptive multi-tasking, which allows the OS to properly schedule GPU compute workloads.
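If the user-mode buffer idea sounds abstract, here's a toy model of it in plain Python. To be clear, this is not the Mantle or HSA API or anything close to it; the class and method names are invented, and the "GPU" is just a method call. It only shows the shape: pay the setup cost once, then queue (function, pointer) pairs by writing into a ring and bumping an index, and let the consumer work on the very same buffer with no copy.

```python
from collections import namedtuple

# One unit of work: a function to run and a reference to the data it runs on.
Packet = namedtuple("Packet", ["func", "data"])

class UserModeQueue:
    """Toy ring buffer standing in for a command queue shared by CPU and GPU."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.write_index = 0   # bumped by the producer ("CPU"); acts as the doorbell
        self.read_index = 0    # bumped by the consumer ("GPU")
        self.setup_calls = 1   # stands in for the one-time system call at init

    def enqueue(self, func, data):
        """Producer side: no 'system call', just a slot write and an index bump."""
        if self.write_index - self.read_index == self.size:
            return False       # ring is full, caller has to retry later
        self.slots[self.write_index % self.size] = Packet(func, data)
        self.write_index += 1
        return True

    def drain(self):
        """Consumer side: run everything queued since the last drain."""
        results = []
        while self.read_index < self.write_index:
            packet = self.slots[self.read_index % self.size]
            results.append(packet.func(packet.data))
            self.read_index += 1
        return results

def scale_in_place(buf):
    # Works on the caller's buffer directly, the way a shared pointer would.
    for i in range(len(buf)):
        buf[i] *= 2
    return sum(buf)

queue = UserModeQueue(size=4)
shared = [1, 2, 3]
queue.enqueue(scale_in_place, shared)
results = queue.drain()
print(results, shared, queue.setup_calls)
```

The list `shared` is modified in place and never copied, which is the part that matters: both sides see the same memory through the same reference.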
What we now have is a GPU that is about as easy to program as the CPU, with an insanely fast way to communicate between the GPU and CPU. We went from a 30,000 clock cycle latency plus a 30,000 clock cycle system call with discrete cards to a 30 clock cycle latency with no system calls. In compute-heavy workloads, you can now run the work where it is most efficient, increasing throughput and reducing power. For a given TDP, you can now do more work.
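Here's what those cycle counts mean for how small a job can be before offloading stops paying off. A rough sketch: the overheads are the numbers above, but the 10x GPU speedup is a number I picked for illustration, and the model ignores everything else (bandwidth, occupancy, and so on).

```python
DISCRETE_OVERHEAD = 30_000 + 30_000   # latency plus system call, in clock cycles
INTEGRATED_OVERHEAD = 30              # latency only, no system call
GPU_SPEEDUP = 10.0                    # assumed for illustration, not a benchmark

def offload_cycles(cpu_cycles, overhead, speedup=GPU_SPEEDUP):
    """Total cost of handing a job to the GPU: fixed overhead plus the faster run."""
    return overhead + cpu_cycles / speedup

def offload_wins(cpu_cycles, overhead, speedup=GPU_SPEEDUP):
    return offload_cycles(cpu_cycles, overhead, speedup) < cpu_cycles

def break_even_cpu_cycles(overhead, speedup=GPU_SPEEDUP):
    """Job size (in CPU cycles) above which offloading starts to win."""
    return overhead * speedup / (speedup - 1.0)

print(f"discrete break-even:   {break_even_cpu_cycles(DISCRETE_OVERHEAD):,.0f} cycles")
print(f"integrated break-even: {break_even_cpu_cycles(INTEGRATED_OVERHEAD):,.0f} cycles")
print("1,000-cycle job, discrete:  ", offload_wins(1_000, DISCRETE_OVERHEAD))
print("1,000-cycle job, integrated:", offload_wins(1_000, INTEGRATED_OVERHEAD))
```

Under these assumptions a job has to be tens of thousands of cycles before a discrete card is worth bothering, while the integrated path pays off for jobs of a few dozen cycles.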
Get back to me when Intel has something that requires no system calls and has a unified memory space, then they'll be comparable.
Kaveri supports protected memory and even preemptive multitasking.
Except "Intel InstantAccess" requires making system calls to allow the kernel to map GPU memory to user space. AMD's HSA requires nothing special at all. The GPU understands and honors protected mode, so you can arbitrarily pass pointers to and from the GPU with no system calls. You can even communicate between the GPU and CPU without system calls. AMD HSA even lets the GPU work with virtual memory. "Intel InstantAccess" only works with data that is in memory; AMD's GPU can issue page faults and let the OS load from the page file.
Oh yes. Let's solder memory right on, increasing board complexity and gaining almost no advantage. The APU is meant to be a mixture of a "good enough" GPU and a higher-performance compute unit for low-memory problems, of which there are a lot. As for open source, AMD is actively committing work to the Linux kernel in both the Mantle framework and better driver support. They are also working with Steam, because SteamOS is Linux, which means AMD needs decent Linux drivers if they plan to be used.
Yes, it is not a very good GPU when it comes to high-end graphics, because it has about 1/3rd the flops of a discrete GPU and it is memory-bandwidth starved for those workloads, but for non-graphics workloads, it's perfect. It is the first of something new. How many people pissed and moaned about FPUs when they came out? "derp, there's no software that uses them, so they must be useless". You need to have the platform before you can have the developers. Once the next-gen consoles start taking off, expect games to be nearly directly ported and taking advantage of this new GPU paradigm.