I wonder if the lost reduction going from 8-bit to 10-bit is because most data isn't actually sourced as 10-bit, so those extra bits are just noise, bloating the end size.
Umm, no. If you click on the link and look at the first graph, it clearly shows VP9 is about 17% better than H.264 at 360p, 32% better at 720p, and 43% better at 1080p. VP9 would be a win over H.264, but H.265 is an additional 20% better on top of that.
Bandwidth is pretty cheap. It represents only a few pennies of your $10/month Netflix bill. H.265 would not only have to save money, but save enough money to make the licensing red tape worth it. And once you become dependent on it, the licensors may raise the fees, which adds uncertainty. The cost of raw bandwidth drops nearly 50% year over year, so saving 20% on bandwidth is like waiting less than half a year for the next round of price cuts.
Bandwidth is rarely a hard limit; it just costs more than choosing a better encoding does. Spend an extra $5k once on electricity encoding with a better codec, and save $50k/month in bandwidth. But I do agree with your statement: why not save if you can?
Of course you did. Nearly all CPUs and GPUs support hardware-accelerated H.264, but not H.265.
You're missing it. Historically, if someone had "$1mil" it was in gold or silver or something actually worth something. That no longer holds. In the context being discussed, someone with $1mil "in the bank" just has the number 1,000,000 stored in a database somewhere, or possibly a bunch of worthless pieces of paper with $100 printed on them.
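The "waiting for prices to fall" comparison can be checked with a little compounding arithmetic. This is a sketch under the assumption that the price falls by a constant fraction every year; the function name is made up for illustration. With a 50% yearly drop, a 20% saving equals about 3.9 months of price decline; with a 40% drop it is about 5.2 months.

```python
import math


def months_to_match_savings(savings: float, annual_price_drop: float) -> float:
    """Months of price decline giving the same cost cut as a one-off saving.

    Assumes (hypothetically) that price shrinks by a constant fraction
    every year, compounding: price(t) = (1 - annual_price_drop) ** t.
    """
    remaining_cost = 1.0 - savings            # 0.8 for a 20% saving
    yearly_factor = 1.0 - annual_price_drop   # 0.5 for a 50% yearly drop
    years = math.log(remaining_cost) / math.log(yearly_factor)
    return years * 12


print(round(months_to_match_savings(0.20, 0.50), 1))  # 3.9
print(round(months_to_match_savings(0.20, 0.40), 1))  # 5.2
```

Either way the answer lands under six months, so the codec saving buys time on the price curve rather than a permanent advantage.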
The value of money is driven by the typical person's ability to spend it. If 9 out of 10 people have 0 dollars, then the $1mil the 1 person has is worthless.
Robot time has no value; only human time does. As humans get removed from the cost of producing stuff, stuff will get cheaper. It's the transition that hurts. At some point in the far future, almost everyone will be unemployed. One of two extreme things will happen: 1) everything will work out fine, or 2) everything will go horribly, horribly wrong, destroying society.
During this transition, people getting laid off get the worst of both worlds: they still need money while having difficulty finding work. As we get further along, acquiring money while unemployed will have to become easier, otherwise crime will rise as people get desperate. The only thing more expensive than welfare is sticking people in jail/prison for crime. Not only will all of these people in prison cost large amounts of money, they will also create negative value, reducing the value of money and causing greater inflation.
No, this was released with Skylake. Intel was advertising it all over the place and saying that only Windows 10 would support it because it's so new.
Streaming issues were regional and consistent within the region. Your area may never have been affected, but other areas were, and had the issue for months or years at a time.
I wish BitTorrent were better at taking advantage of my fast symmetrical connection. If more users uploaded to me instead of to others, I could upload more to others, and it would be faster overall for everyone. Easier said than done, I know. Simple attempts to do this would result in people gaming the system, or DoSing it by wasting others' upload bandwidth.
The big deal is that Verizon has a history of playing games with Netflix data and trying to charge exorbitant transit prices for local peering data. It says a lot about how badly Verizon has treated Netflix that Netflix has only 2 devices in Verizon's network. Given how large Verizon is, I assume Netflix would love to load test their 80Gb/s boxes in a FiOS region.
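BitTorrent already rewards reciprocation through its choking algorithm: a client uploads to the few peers that have recently uploaded the most to it, plus one randomly chosen "optimistic" peer so newcomers get a chance. The sketch below illustrates that tit-for-tat idea in simplified form; the function name, the rate dictionary, and the default of 4 slots are illustrative assumptions, not any client's actual code. It also shows why naive schemes are hard: the ranking must be based on measured reciprocation, or free-riders could claim upload slots without giving anything back.

```python
from __future__ import annotations

import random


def choose_unchoked(download_rates: dict[str, float], slots: int = 4,
                    rng: random.Random | None = None) -> set[str]:
    """Pick which peers to upload to (a simplified tit-for-tat choker).

    download_rates maps a hypothetical peer id to the bytes/s that peer
    recently uploaded to us.
    """
    if rng is None:
        rng = random.Random()
    # Rank peers by how much they have recently given us.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:slots])
    # One optimistic unchoke among the rest, so new or slow peers can
    # prove they will reciprocate.
    rest = ranked[slots:]
    if rest:
        unchoked.add(rng.choice(rest))
    return unchoked


rates = {"a": 900.0, "b": 700.0, "c": 50.0, "d": 10.0, "e": 0.0}
print(sorted(choose_unchoked(rates, slots=2, rng=random.Random(1))))
```

Peers "a" and "b" always get slots because they give the most; the third slot rotates among the others.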
My understanding of the SSD writing issue is that SSDs have a sizable amount of reserved space for wear leveling. Assuming this space is large enough, garbage collection can be postponed as long as there is some spare room. Many SSDs only do GC while the drive is "idle" by some definition, often some number of milliseconds since the last request. If the drive never becomes idle, the SSD keeps postponing GC until it absolutely has to, at which point it stops everything and does GC.
To add insult to injury, the device has waited so long that there is more work to do, so GC takes even longer, sometimes many seconds. The good news is that SSDs have very predictable performance characteristics: as the available wear-leveling free space gets consumed by deferred GC work, writes start taking longer because it gets harder to find fresh blocks. This slowdown can be detected because both reads and writes suffer, since you can't read from the part of the drive that is being written to until the write has finished.
Read latency is pretty much constant, so any increase in latency means the firmware is doing work. Detecting this increase and backing off, primarily on writes, gives GC time to do its job and minimizes GC latency, even if at the expense of throughput. FreeBSD is also getting new limiting features in its IO layer that will allow admins to limit read IOPS, write IOPS, read throughput, and write throughput, all separately. These limits can be configured system-wide per block device and/or per jail per device.
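The "detect rising read latency, back off on writes" idea can be sketched as an additive-increase/multiplicative-decrease (AIMD) controller. This is a generic illustration, not FreeBSD's implementation; the class name, the EWMA smoothing, and every parameter value are assumptions chosen for the example.

```python
class WriteThrottle:
    """Adjust an allowed write rate from observed read latency (AIMD sketch).

    Assumption: read latency is normally flat, so a smoothed value above
    target_ms suggests the firmware is busy (e.g. deferred GC). Writes are
    cut multiplicatively when that happens and restored additively after.
    """

    def __init__(self, target_ms: float, max_rate: float, min_rate: float = 1.0,
                 step: float = 10.0, backoff: float = 0.5, alpha: float = 0.2):
        self.target_ms = target_ms
        self.max_rate = max_rate   # write ops/s allowed when the drive is healthy
        self.min_rate = min_rate   # never starve writes completely
        self.step = step           # additive increase per healthy sample
        self.backoff = backoff     # multiplicative decrease factor
        self.alpha = alpha         # EWMA weight of the newest latency sample
        self.rate = max_rate
        self.avg_ms = None

    def observe_read_latency(self, ms: float) -> float:
        # Smooth samples so a single slow read does not trigger a backoff.
        if self.avg_ms is None:
            self.avg_ms = ms
        else:
            self.avg_ms = self.alpha * ms + (1 - self.alpha) * self.avg_ms
        if self.avg_ms > self.target_ms:
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate


t = WriteThrottle(target_ms=1.0, max_rate=1000.0)
for sample in [0.2, 0.2, 0.2, 5.0, 5.0]:
    print(t.observe_read_latency(sample))  # 1000.0 x3, then 500.0, 250.0
```

Cutting fast and recovering slowly trades some write throughput for giving the drive idle-enough gaps to garbage collect before it is forced into a long stall.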
USA peak bandwidth was about 60Tb/s back in 2012, and average bandwidth growth has been a steady 50% or more year over year since the '80s, giving us roughly 400% growth, or about 300Tb/s. Netflix represents about 1/3 of peak bandwidth, or 100Tb/s. That gives an average of 21Gb/s per server, which sounds ballpark correct given that they're moving to 40Gb/s and 80Gb/s uplinks on their servers.
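The estimate above can be written out step by step. The comment does not state the year or the Netflix server count, so this sketch assumes four years of growth (1.5^4 is about 5x, which matches the ~400% growth quoted) and only back-computes how many servers the 21Gb/s average would imply; it is not a published figure.

```python
peak_2012_tbps = 60.0      # US peak bandwidth in 2012, per the comment
growth_per_year = 1.5      # "50% or more year over year"
years = 4                  # assumption: 1.5**4 ~= 5x, i.e. ~400% growth
per_server_gbps = 21.0     # average per-server figure from the comment

peak_now_tbps = peak_2012_tbps * growth_per_year ** years   # ~304 Tb/s
netflix_tbps = peak_now_tbps / 3                             # ~1/3 of peak
implied_servers = netflix_tbps * 1000 / per_server_gbps      # Tb/s -> Gb/s

print(f"peak ~{peak_now_tbps:.0f} Tb/s, Netflix ~{netflix_tbps:.0f} Tb/s, "
      f"implies ~{implied_servers:.0f} servers at {per_server_gbps:.0f} Gb/s each")
```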
Regardless of how many people are actually watching, a 20Gb/s average is pretty cool. Another interesting note is that Netflix servers barely benefit from caching data in memory. Each server handles so many requests per second from so many different customers that almost no two customers are at the same point in the same show, and requests from a single customer are so far apart in time that almost all requests are effectively random access. It's also interesting that Netflix is beyond the 80/20 rule; they're at 90/10, in that 10% of their data accounts for 90% of their requests. Predicting which 10% that is, is the hard part, and they can't use normal least-recently-used eviction because that would cause cache thrashing. Instead, every night they algorithmically predict what will be watched, upload that data to be cached, and logically "pin" it so it doesn't get evicted.
Another interesting feature for syncing the servers is that each server can be configured to use a different route to pull down its data, and even to use a configured amount of bandwidth; the servers within a location can then sync with each other in a kind of P2P setup. This helps load-balance routes. Their SSD servers hold quite a bit less storage than their spinning-disk servers, so the SSDs are typically hit first but hold only the most requested data. Last I knew, their SSD servers did not support acting as a cache while loading, because the IO pattern of mixed, sustained heavy reads and writes didn't play well with SSDs. That may have changed, or may change in the near future. I know the biggest reason was that the way most SSD firmware handled garbage collection could cause long pauses with no activity under sustained heavy writes. One of the changes was for FreeBSD to have a target latency for reads/writes and throttle the writes until latency came down.
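The "predict, then pin" approach can be sketched as an LRU cache with a separate pinned set that eviction never touches, so a burst of one-off requests cannot push out the predicted-popular titles. This is a minimal illustration under that assumption, not Netflix's actual cache; the class and method names are hypothetical.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """LRU cache where a predicted-popular set of keys is pinned.

    Pinned entries never count toward eviction; only the unpinned
    remainder is managed as least-recently-used.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity       # max number of unpinned entries
        self.pinned: dict = {}
        self.lru: OrderedDict = OrderedDict()

    def pin(self, key, value) -> None:
        # e.g. loaded from the nightly "what will be watched" prediction
        self.lru.pop(key, None)
        self.pinned[key] = value

    def get(self, key):
        if key in self.pinned:
            return self.pinned[key]
        if key in self.lru:
            self.lru.move_to_end(key)  # mark as most recently used
            return self.lru[key]
        return None

    def put(self, key, value) -> None:
        if key in self.pinned:
            self.pinned[key] = value
            return
        self.lru[key] = value
        self.lru.move_to_end(key)
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)  # evict the least recently used entry


cache = PinnedLRUCache(capacity=2)
cache.pin("popular-show", "segment data")
for key in ["a", "b", "c"]:
    cache.put(key, key.upper())
print(cache.get("a"), cache.get("popular-show"))  # None segment data
```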
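Capping how much bandwidth a server uses to pull its data is commonly done with a token bucket. The sketch below is a generic token bucket, not Netflix's or FreeBSD's code; the class name and parameters are assumptions. The same structure, with one bucket per limit, could also express separate read/write IOPS and throughput caps.

```python
class TokenBucket:
    """Cap a transfer rate: `rate` units/s sustained, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate      # e.g. bytes per second allowed on this route
        self.burst = burst    # largest amount that may be sent at once
        self.tokens = burst
        self.last = now

    def try_consume(self, amount: float, now: float) -> bool:
        # `now` must be non-decreasing (e.g. from time.monotonic()).
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


bucket = TokenBucket(rate=100.0, burst=100.0)
print(bucket.try_consume(100, now=0.0))  # True: uses the whole burst
print(bucket.try_consume(1, now=0.0))    # False: bucket is empty
print(bucket.try_consume(50, now=0.5))   # True: 0.5 s refilled 50 units
```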
Skylake's IGP is about the same performance as AMD's current IGPs and $70 discrete GPUs.
Skylake doesn't "need" special support, unless you want to take advantage of its special clocking ability, which makes it more responsive. Normally the OS tells the CPU what speed to run at, but the OS can only update this on a context switch, which can take several milliseconds per change, and ramping up the frequency takes many changes. When the CPU controls itself, it can change frequency in response to load up to 2x faster. For long sustained tasks, this shows up as about a 2% increase in performance, but for short bursty tasks it shows up as a 25% improvement, all while consuming only about 0.8% more power under load.
Summary based on benchmarks:
1) Makes the CPU 25% "faster" for very short-lived workloads by quickly ramping up from idle
2) Makes the CPU 2% faster for sustained workloads
3) Consumes only 0.8% more power under load, and saves power on short-lived loads by completing them more quickly.
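Why a faster ramp helps short bursts far more than long tasks can be seen with a toy model: the clock rises linearly from an idle frequency to full speed over a ramp period, and we compare a slow ramp with one twice as fast. All numbers (0.8 GHz idle, 3.5 GHz max, 30 ms vs 15 ms ramps, task sizes) are made-up assumptions. The model shows the gain concentrating in short bursts; it predicts a much smaller gain than 2% for sustained loads, so the benchmark figure must involve effects it ignores.

```python
import math


def completion_ms(work: float, f_idle: float, f_max: float, ramp_ms: float) -> float:
    """Time (ms) to finish `work`, measured in GHz*ms (millions of cycles),
    when the clock rises linearly from f_idle to f_max GHz over ramp_ms
    and then stays at f_max."""
    if ramp_ms <= 0 or f_max == f_idle:
        return work / f_max
    ramp_work = ramp_ms * (f_idle + f_max) / 2  # cycles done during the ramp
    if work >= ramp_work:
        return ramp_ms + (work - ramp_work) / f_max
    # Finishes mid-ramp: solve f_idle*t + (k/2)*t^2 = work for t >= 0.
    k = (f_max - f_idle) / ramp_ms              # GHz gained per ms
    return (-f_idle + math.sqrt(f_idle ** 2 + 2 * k * work)) / k


def speedup(work: float, slow_ramp_ms: float, fast_ramp_ms: float,
            f_idle: float = 0.8, f_max: float = 3.5) -> float:
    """How many times faster a task finishes with the shorter ramp."""
    return (completion_ms(work, f_idle, f_max, slow_ramp_ms)
            / completion_ms(work, f_idle, f_max, fast_ramp_ms))


short = speedup(work=20, slow_ramp_ms=30, fast_ramp_ms=15)      # ~20M cycles
long_ = speedup(work=35_000, slow_ramp_ms=30, fast_ramp_ms=15)  # ~10 s at full clock
print(f"short burst: {short:.2f}x, long task: {long_:.4f}x")
```

A short burst spends its whole life on the ramp, so halving the ramp time matters; a long task spends almost all its time at full clock, so the ramp barely registers.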
Apple contributes back to BSD more than GPL does. GPL is worse than the thieves it claims to protect against.
If they make changes, it's no longer my code. I don't see the issue. This is the way I see it:
GPL: Forcing riff-raff to contribute back
BSD: Make the world a better place by sharing
The only good code is code given willingly. GPL is made by forced labor in North Korean sweat shops and BSD is made by freedom loving hippies.
If you have a single process using more than 2-4 threads per core, you're a horrible programmer. I have a few applications that use more than 1,000 threads, and my quad core stays at about 5% usage. I had to contact those devs and tell them to start doing some async programming.
That's why many-core server CPUs have massive L3 caches and quad-channel memory. A 24-core x86 CPU with around 60MiB of L3 cache? Why not? More memory channels allow more concurrent accesses. Intel NICs support writing packets directly into the L3 cache to skip memory writes. Large on-NIC buffers make better use of DMA by collecting data and reducing memory operations, transferring in larger chunks to make use of that high-bandwidth memory.
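The async alternative to "one thread per request" looks like this in Python's asyncio (C# offers the same model through async/await on Tasks). This is a minimal sketch: `asyncio.sleep` stands in for real network or disk waits, and the function names are hypothetical. A thousand concurrent waits are multiplexed on one OS thread by the event loop, so they finish in roughly one delay period instead of needing a thousand blocked threads.

```python
import asyncio


async def handle_request(i: int, delay: float) -> int:
    # Stand-in for waiting on a socket or disk; while this coroutine is
    # suspended, the single thread is free to run other requests.
    await asyncio.sleep(delay)
    return i


async def serve_all(n: int = 1000, delay: float = 0.05) -> list[int]:
    # n concurrent waits multiplexed on one OS thread by the event loop,
    # instead of n threads each blocked on its own request.
    return await asyncio.gather(*(handle_request(i, delay) for i in range(n)))


if __name__ == "__main__":
    results = asyncio.run(serve_all())
    print(len(results))  # 1000
```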
In case it's not clear, I'm not trying to say your point isn't valid, just saying your point explains a lot of current features in high end components.
Also, multicore designs can have separate memory.
NUMA comes to mind, but it adds complexity to the OS and the application. Accessing another CPU's memory is expensive, so the OS needs to try to keep threads close to their data. Applications need to use a system API, assuming one exists, to find out which socket a thread is on, and try their best to limit communication to threads on the current socket. Cross-socket communication is probably best done by passing messages instead of references, because copying and storing the data locally, even if duplicated, will be faster than constantly accessing the other socket's memory.
Then you have the issue of load-balancing memory allocations. It may not always be an issue, but it can become one if you consume all of one socket's memory. There are other issues too, like one socket having direct access to the NIC while the other socket has direct access to the hard drives. Topology is important.
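On Linux, one such system API is sysfs: each NUMA node directory under `/sys/devices/system/node/` has a `cpulist` file (e.g. `0-3,8-11`) listing that node's CPUs. The sketch below builds a CPU-to-node map from it; the function names are hypothetical, and it returns an empty map on systems without that directory. Combined with `os.sched_getaffinity(0)`, an application could see which nodes its allowed CPUs belong to.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Parse the Linux cpulist format, e.g. "0-3,8-11" -> [0, 1, 2, 3, 8, 9, 10, 11]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_to_node(sysfs: str = "/sys/devices/system/node") -> dict[int, int]:
    """Map CPU id -> NUMA node from Linux sysfs; empty dict if unavailable."""
    mapping: dict[int, int] = {}
    root = Path(sysfs)
    if not root.is_dir():
        return mapping
    for node_dir in root.glob("node[0-9]*"):
        node = int(node_dir.name[len("node"):])
        cpulist = node_dir / "cpulist"
        if cpulist.exists():
            for cpu in parse_cpulist(cpulist.read_text()):
                mapping[cpu] = node
    return mapping


print(parse_cpulist("0-3,8-11"))  # [0, 1, 2, 3, 8, 9, 10, 11]
```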
As soon as you step out of a cache-coherent system, you run into even more fun problems. Stale data is a huge issue: you need to make sure you're always getting the current data and not some local copy. At the same time, without cache coherency, cross-core communication has very high latency. Most x86 CPUs keep their caches coherent with very low latency. While copying the data may not be any faster, you find out very quickly if the data changed. If the data is read a lot and rarely changed, you get nice guarantees about how quickly you learn of a change, and you only incur the access cost when that event happens. Without coherency, you are forced to go out to memory every time, incurring high latency on every access.
With multi-core systems, cache coherency has an N^2 problem. I'm sure someone will come up with an idea of "channels" to facilitate low-latency inter-core communication while keeping normal memory access separate. Possibly even islands of cache coherency in an ocean of cores, with each island being a small group. Some of the many-core designs with 80+ cores have heavy locality issues. Adjacent cores are fast to access and far-away cores are very expensive. Pretty much think of each core as only able to talk to adjacent cores, with requests to far-away cores needing to go many "hops". Even worse, cores physically nearer the memory controller have faster access to memory, and all memory requests have to go through those cores. Lots of fun issues that require custom OS designs.
I've managed super-linear speedup a few times with multi-threading. It required good use of cache. If you can get the threads to be pseudo-synchronized without using any actual synchronization, then what the first thread reads from memory, the other threads can benefit from. This only applies to cores that share the same cache. The super-linear part no longer applied when adding more sockets/CPUs, and adding more cores had diminishing returns, approaching a fixed percentage increase in performance over a single thread.
Then I tell people I code in C#, and they don't understand how someone who writes in a high-level language knows how to think so low-level. Let's just say I'm the go-to guy when you can't empirically find why your code is slow. Many hard performance issues cannot be measured, because measuring can change the outcome. At that point you need a good mental model of how CPUs, caches, memory, OSes, thread schedulers, I/O schedulers, hard drives, SSDs, and networks interact to produce strange slowness when no one part is the bottleneck. It's almost always an issue of latency vs. throughput, with different parts of the system having different throughput or latency characteristics.
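The locality and N^2 points can be made concrete with a small calculation on a 2D mesh where each core links only to its up/down/left/right neighbours. The 8 x 10 layout for 80 cores is a hypothetical assumption, not any specific chip. The number of core pairs grows as N(N-1)/2, which is one way to see the quadratic pressure on broadcast-style coherence (directory-based protocols exist partly to soften it), and the hop counts show how far "far away" gets.

```python
from itertools import combinations


def mesh_hops(a: int, b: int, cols: int) -> int:
    """Hops between cores a and b on a 2D mesh numbered row by row,
    assuming each core links only to its 4 nearest neighbours."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb)


def mesh_stats(rows: int, cols: int) -> tuple[int, float, int]:
    """(number of core pairs, average hops, worst-case hops) for a rows x cols mesh."""
    n = rows * cols
    hops = [mesh_hops(a, b, cols) for a, b in combinations(range(n), 2)]
    return len(hops), sum(hops) / len(hops), max(hops)


pairs, avg, worst = mesh_stats(8, 10)  # hypothetical 80-core layout
print(pairs, avg, worst)  # 3160 6.0 16
```

On average a message crosses 6 links, and corner to corner it crosses 16, versus 1 for a neighbour.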
$180 PSU, $150 mobo, $150 memory, $400 for a few SSDs, $60 case, $200 monitor, $300 GPU, $70 Intel network card. Not to mention the $30 each for mag-lev bearing fans. Yep, I really want to save $50 on a CPU with heat and power issues.
I came from a poor family, I had to earn my own money to buy computer parts when I was a child. I've learned to appreciate quality. If AMD can get within 10% of Intel in performance per core and efficiency, I will support the underdog. I really want a bunch of cores and ECC memory on my desktop, but Xeons are too expensive.
The main problem Intel has is that CPUs have not had significant speed improvements for years and that the high prices Intel asks
And yet Intel is still the fastest and best value. Here's hoping for some competition.
Been building and repairing computers for 25+ years and have worked in IT for quite a few. I have never seen a hard drive die from power issues. I have seen burnt motherboards and melted traces where the power comes in, but never had a hard drive die from a surge or lightning strike. Pretty much only unexpected shutdowns needing a ScanDisk. I have seen drives die for a myriad of other reasons.
How common are surge/lightning/PSU-blow-up HD deaths? My limited experience is "not often" since I've never seen one.