BBC Optimizing UHD Video Streaming Over IP (bbc.co.uk)
johnslater writes: A friend at the BBC has written a short description of his project to deliver UHD video over IP networks. The application bypasses the OS network stack, and constructs network packets directly in a buffer shared with the network hardware, achieving a ten-fold throughput improvement. He writes: "Using this technique, we can send or receive uncompressed UHD 2160p50 video (more than 8 Gbps) using a single CPU core, leaving all the rest of the server's cores free for video processing." This is part of a broader BBC project to develop an end-to-end IP-based studio system.
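For a sense of where the "more than 8 Gbps" figure comes from, here is a back-of-envelope calculation. It assumes 10-bit 4:2:2 sampling (20 bits per pixel on average) and counts only the active picture, with no blanking or packet overhead. The BBC's actual wire format may differ.

```python
# Rough bitrate of uncompressed UHD 2160p50 video.
# Assumption: 10-bit 4:2:2 sampling, i.e. an average of 20 bits per pixel
# (10 bits of luma per pixel plus 10 bits of chroma per pixel, with each
# chroma pair shared between two pixels). Active picture only: no blanking,
# no packet headers.

WIDTH = 3840
HEIGHT = 2160
FRAMES_PER_SECOND = 50
BITS_PER_PIXEL = 20  # 10-bit 4:2:2 (assumed)


def uncompressed_bitrate(width: int, height: int, fps: int, bpp: int) -> int:
    """Return the raw video bitrate in bits per second."""
    return width * height * fps * bpp


if __name__ == "__main__":
    bps = uncompressed_bitrate(WIDTH, HEIGHT, FRAMES_PER_SECOND, BITS_PER_PIXEL)
    print(f"{bps / 1e9:.2f} Gbit/s")  # prints "8.29 Gbit/s"
```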
Kernel bypass plus zero copy are, of course, old hat. I worked on such stuff at Lightfleet, back when it did this stuff called work. InfiniBand and the RDMA Consortium had been working on it for even longer.
What sort of performance increase can you achieve?
Well, Ethernet latencies tend to run into milliseconds for just the stack, and tens, if not hundreds, of milliseconds for anything real. InfiniBand can achieve eight-microsecond latencies. SPI can get down to two microseconds.
So you can certainly achieve the sorts of latency improvements quoted. It's hard work, especially when operating purely in software, but it can actually be done. It's about bloody time, too. This stuff should have been standard in 2005, not 2015! Bloody slowpokes. Back in my day, we had to shovel our own packets! In the snow! Uphill! Both ways!
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
If people would just accept a decent MTU, none of this would matter.
The maximum IP packet size is 64 KB, but we're stuck with 1500 bytes (including IP and transport headers) because you can't be sure that every hop will support a larger MTU.
Internally, you can enable jumbo frames and shit will work, but once you need to go out over the internet all bets are off, so you limit your shit to 1500 bytes and your performance goes all to hell.
We're basically delivering UHD movies via telegram.
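To put rough numbers on the MTU complaint, this sketch counts how many packets per second the ~8.29 Gbps uncompressed 2160p50 stream needs at a 1500-byte MTU versus a 9000-byte jumbo MTU. The 40 bytes of IPv4, UDP and RTP headers per packet, and the 10-bit 4:2:2 video format, are assumptions about the encapsulation, not something stated in the thread.

```python
# Assumed payload stream: uncompressed 2160p50,
# 3840 x 2160 pixels x 50 fps x 20 bits per pixel (10-bit 4:2:2 assumed).
VIDEO_BPS = 3840 * 2160 * 50 * 20

# Assumed per-packet headers: IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes).
# The MTU covers the IP header and everything inside it, not the Ethernet header.
HEADER_BYTES = 20 + 8 + 12


def packets_per_second(mtu: int, bitrate_bps: int = VIDEO_BPS) -> int:
    """Packets needed per second to carry bitrate_bps at the given MTU (rounded up)."""
    payload_bits = (mtu - HEADER_BYTES) * 8
    # Integer ceiling division avoids floating-point rounding surprises.
    return (bitrate_bps + payload_bits - 1) // payload_bits


if __name__ == "__main__":
    print("1500-byte MTU:", packets_per_second(1500), "packets/s")  # 710137
    print("9000-byte MTU:", packets_per_second(9000), "packets/s")  # 115715
```

Under these assumptions, jumbo frames cut the packet rate, and with it the per-packet processing cost, by a factor of about six.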
https://birds-are-nice.me/musi...
I show how the concept of the public domain has been crushed by demonstrating just how little popular music exists in it.
IP packets and Ethernet frames are really quite small data structures, so the cost of processing them is vastly overshadowed by the cost of scheduling, context switching, memory management, etc. Because modern CPUs are limited by RAM access latency, they're only fast when they can either loop or stream. If there isn't enough of that between the overheads, performance tanks.
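The point about small frames can be made concrete by working out how much time a CPU has per frame at 10 Gbps line rate. This sketch adds the 20 bytes of preamble, start-of-frame delimiter and inter-frame gap that every Ethernet frame occupies on the wire; the 3 GHz clock is an assumed figure.

```python
# Per-frame time budget on a 10 Gbit/s Ethernet link at full line rate.
# frame_bytes includes the 14-byte Ethernet header and 4-byte FCS; on the wire
# each frame also costs 8 bytes of preamble/SFD and a 12-byte inter-frame gap.
LINE_RATE_BPS = 10_000_000_000
WIRE_OVERHEAD_BYTES = 8 + 12  # preamble + SFD, inter-frame gap
CPU_HZ = 3_000_000_000        # assumed 3 GHz core


def packet_budget(frame_bytes: int) -> tuple[float, float]:
    """Return (nanoseconds, CPU cycles) available per frame at line rate."""
    wire_bits = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    seconds = wire_bits / LINE_RATE_BPS
    return seconds * 1e9, seconds * CPU_HZ


if __name__ == "__main__":
    for size in (1518, 64):  # full-size frame (1500-byte MTU) and minimum frame
        ns, cycles = packet_budget(size)
        print(f"{size}-byte frames: {ns:.1f} ns, {cycles:.1f} cycles per frame")
```

A full 1518-byte frame leaves about 1.2 µs, or roughly 3,700 cycles; a minimum 64-byte frame leaves about 67 ns, or roughly 200 cycles, which is less than the ~1,000 cycles often quoted for a single context switch.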
How about the BBC stops requiring Flash for videos? That would be a better place to start.
To be able to watch EastEnders in Ultra HD...
To the Beeb's credit, though, The Sky at Night in UHD would surely be a lot more interesting. But out of thousands of mediocre shows and movies released year after year after year, is it worth buying a new TV to marvel at a dozen really good programs? Somehow this doesn't seem like a good value proposition.
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
In order to pick up UHD I need to connect that round antenna to the back of the TV, right?
Congratulations, BBC, you managed to re-invent DPDK at the licence payers' expense.
A 10x performance improvement is not only attainable; even larger gains are already being achieved.
At 10Gb/s, the amount of data shuffled around in a normal network stack is enough to push the limits of the data buses. Most network stacks copy the data something like 4 times. That acts as a multiplier, turning 10Gb/s into 40Gb/s. Context switching causes cache thrashing and can consume more cycles than processing the actual data. A single context switch can consume about 1,000 cycles on a modern CPU.
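Plugging the comment's own figures into a small model shows how these overheads stack up. The 4 copies and the 1,000 cycles per context switch come from the comment; one context switch per 1500-byte packet and a 3 GHz core are assumptions for illustration (real stacks batch packets and usually switch far less often).

```python
LINK_BPS = 10_000_000_000   # 10 Gbit/s of payload
COPIES = 4                  # "something like 4 times"
CYCLES_PER_SWITCH = 1_000   # figure quoted in the comment
CPU_HZ = 3_000_000_000      # assumed 3 GHz core
PACKET_BYTES = 1500         # assumed: one context switch per 1500-byte packet


def copied_bits_per_second(link_bps: int = LINK_BPS, copies: int = COPIES) -> int:
    """Bits the CPU must move per second when every payload is copied `copies` times."""
    return link_bps * copies


def switch_share_of_core(link_bps: int = LINK_BPS,
                         packet_bytes: int = PACKET_BYTES,
                         cycles_per_switch: int = CYCLES_PER_SWITCH,
                         cpu_hz: int = CPU_HZ) -> float:
    """Fraction of one core spent on context switches, at one switch per packet."""
    packets_per_s = link_bps / (packet_bytes * 8)
    return packets_per_s * cycles_per_switch / cpu_hz


if __name__ == "__main__":
    print(f"{copied_bits_per_second() / 1e9:.0f} Gbit/s copied")           # 40 Gbit/s copied
    print(f"{switch_share_of_core():.0%} of a core on context switches")  # 28%
```

Under these assumptions the stack pushes 40 Gbit/s through the CPU and spends roughly 28% of one core on context switches alone, before any video processing happens.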
Why not do 720p for everybody first, with no region locking or lockouts? Fuck this uber-HQ shit. DVD quality is, quite frankly, good enough for all but the most anal of viewers.
Shush! If UHD doesn't take off, we'll be stuck with 1080p computer monitors forever. Do not be the person who prevents 8-megapixel desktop monitors from becoming mainstream.
Dear Sirs,
I'm the head of the UHD Panel Manufacturers' Association. I'm sorry to say that, having read the grandparent post by an Anonymous Coward, our members have unanimously decided to cancel all further development and manufacturing of ludicrously high-definition panels, and to shut down the association. In fact, we've decided to stop bothering with 1080p panels as well and in future will just be selling 1280 x 720 displays.
The parent was correct in identifying the influence of a single post by an Anonymous Coward on a niche website; we consider Slashdot the be-all and end-all of our market research, and as a result of that post we're ending our involvement in a potential multi-billion-dollar industry.
In fact, we're even considering going back to CRT displays, you ungrateful wretches.
Yours sincerely,
Hwang Beom-seok,
President of the UHD Panel Manufacturers' Association
40Gb/s is 5GB/s, which is a fraction of what any bus on a modern x64 processor can handle. It's about a fifth of the 25.6GB/s peak bandwidth of dual-channel DDR3-1600 memory, which is the slowest a Skylake processor goes. It's roughly 5 lanes of PCIe 3.0.
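The bandwidth comparison can be checked against peak figures: DDR moves 8 bytes per channel per transfer, and a PCIe 3.0 lane runs at 8 GT/s with 128b/130b encoding. These are theoretical peaks; sustained bandwidth is lower in practice.

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


def ddr_peak_gbyte(megatransfers: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak DDR bandwidth in GB/s: transfers per second x 64-bit bus x channels."""
    return megatransfers * 1e6 * bus_bytes * channels / 1e9


def pcie3_lane_gbyte() -> float:
    """Usable PCIe 3.0 bandwidth per lane, per direction, in GB/s (8 GT/s, 128b/130b)."""
    return 8e9 * 128 / 130 / 8 / 1e9


if __name__ == "__main__":
    stack = gbit_to_gbyte(40)               # 4 copies of a 10 Gbit/s stream
    ddr = ddr_peak_gbyte(1600, channels=2)  # dual-channel DDR3-1600
    lane = pcie3_lane_gbyte()
    print(f"copy traffic: {stack:.1f} GB/s")              # 5.0 GB/s
    print(f"dual-channel DDR3-1600: {ddr:.1f} GB/s")       # 25.6 GB/s
    print(f"share of DDR3 bandwidth: {stack / ddr:.0%}")  # 20%
    print(f"PCIe 3.0 lanes needed: {stack / lane:.1f}")   # 5.1
```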
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Now I can watch reruns of Top Gear and Star Trek: TNG in UHD.
Standard Linux distros support kernel timestamping of packets on receipt. When userspace reads a packet, it can also obtain the kernel timestamp for that packet.
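On Linux this is the SO_TIMESTAMP family of socket options. The sketch below enables SO_TIMESTAMPNS on a UDP socket and reads the nanosecond kernel receive timestamp from the ancillary data returned by recvmsg. It assumes 64-bit Linux. The fallback value 35 for SO_TIMESTAMPNS (used only if Python's socket module doesn't export the constant) is the asm-generic value and differs on some architectures, and the struct timespec layout is assumed to be two native longs. The function names are illustrative.

```python
import socket
import struct
import time

# Assumption: Linux. If the socket module lacks SO_TIMESTAMPNS, fall back to 35,
# the asm-generic value; on Linux SCM_TIMESTAMPNS has the same value.
SO_TIMESTAMPNS = getattr(socket, "SO_TIMESTAMPNS", 35)
# Assumed layout of struct timespec { tv_sec; tv_nsec; } as two native longs.
TIMESPEC = struct.Struct("@ll")


def receive_with_kernel_timestamp(sock: socket.socket) -> tuple[bytes, float]:
    """Receive one datagram; return (payload, kernel receive time in seconds)."""
    data, ancdata, _flags, _addr = sock.recvmsg(65535, socket.CMSG_SPACE(TIMESPEC.size))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == SO_TIMESTAMPNS:
            sec, nsec = TIMESPEC.unpack_from(cdata)
            return data, sec + nsec / 1e9
    raise RuntimeError("no kernel timestamp in ancillary data")


def loopback_demo() -> tuple[bytes, float]:
    """Send one datagram over loopback and read it back with its kernel timestamp."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPNS, 1)
        rx.settimeout(2.0)  # don't hang forever if nothing arrives
        rx.bind(("127.0.0.1", 0))
        tx.sendto(b"frame", rx.getsockname())
        return receive_with_kernel_timestamp(rx)
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    payload, kernel_ts = loopback_demo()
    print(payload, f"waited {time.time() - kernel_ts:.6f} s before userspace read it")
```

The gap between the kernel timestamp and the time userspace actually reads the packet is a direct measure of the scheduling delay discussed elsewhere in this thread.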
You over-simplified throughput. There are latency issues that reduce the effective throughput well below the maximum. Zero-copy, along with reduced context switching, is very important at 10Gb+ rates. Netmap is one such project. It lets a single userland thread send packets 7x faster than the kernel could with the old network stack, and it does even better multi-threaded.
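The zero-copy idea, in the spirit of the BBC's description of building packets directly in a buffer shared with the NIC, can be sketched with a preallocated ring of fixed-size slots. This is a hypothetical Python illustration only: PacketRing, its 2048-byte slots and the bare UDP header are invented for the example, and no real NIC or netmap API is involved. The header is written in place with struct.pack_into, and the caller gets a writable memoryview of the payload area to fill directly, so nothing is assembled elsewhere and copied in.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")  # src port, dst port, length, checksum


class PacketRing:
    """Hypothetical ring of fixed-size packet slots in one preallocated buffer."""

    def __init__(self, slots: int, slot_size: int = 2048) -> None:
        self.slots = slots
        self.slot_size = slot_size
        self.buf = bytearray(slots * slot_size)  # allocated once, reused forever
        self.view = memoryview(self.buf)
        self.lengths = [0] * slots
        self.head = 0

    def reserve(self, src_port: int, dst_port: int,
                payload_len: int) -> tuple[int, memoryview]:
        """Write a UDP header into the next slot; return (slot, writable payload view)."""
        total = UDP_HEADER.size + payload_len
        if total > self.slot_size:
            raise ValueError("packet does not fit in a slot")
        slot = self.head
        base = slot * self.slot_size
        # Header goes straight into the shared buffer; checksum 0 means
        # "no checksum", which UDP over IPv4 permits.
        UDP_HEADER.pack_into(self.buf, base, src_port, dst_port, total, 0)
        self.lengths[slot] = total
        self.head = (slot + 1) % self.slots
        return slot, self.view[base + UDP_HEADER.size: base + total]

    def packet(self, slot: int) -> memoryview:
        """Zero-copy view of a finished packet (what a NIC would DMA from)."""
        base = slot * self.slot_size
        return self.view[base: base + self.lengths[slot]]


if __name__ == "__main__":
    ring = PacketRing(slots=256)
    slot, body = ring.reserve(5004, 5006, 1200)
    body[:] = bytes(1200)  # stand-in for video samples rendered straight into the slot
    print(slot, len(ring.packet(slot)))  # 0 1208
```

The design choice is that the buffer is allocated once and only ever written in place; a real kernel-bypass implementation would hand slot indices to the NIC rather than returning Python views.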
FreeBSD is working on a new API that lets the network stack cooperate with the network card, so that the CPU core interrupted by the NIC is also the core that processes the packet in the firewall and notifies userland. Once userland, kernel, and NIC all use the same CPU core, less inter-core data copying will occur. Right now it's up to the thread scheduler to decide where packets get processed. The NIC may interrupt Core 0, then Core 1 processes the packet, and Core 2 is where userland reads it. That's a ton of copying, and that's not even counting the 3-4x copying within the network stack.
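The userland half of keeping everything on one core can be done on Linux by pinning the process with sched_setaffinity, which Python exposes as os.sched_setaffinity. This Linux-only sketch pins to one core and returns the previous CPU set so it can be restored; pin_to_core is a hypothetical helper. It does not touch interrupt routing: steering the NIC's interrupts to the same core is a separate step (on Linux, for example, via /proc/irq/<n>/smp_affinity), and none of this is the FreeBSD API mentioned in the comment.

```python
import os


def pin_to_core(core: int) -> set[int]:
    """Restrict the calling process to `core`; return the previous CPU set.

    On Linux, pid 0 means the caller, and the kernel applies the mask to the
    calling thread.
    """
    previous = os.sched_getaffinity(0)
    if core not in previous:
        raise ValueError(f"core {core} is not available to this process")
    os.sched_setaffinity(0, {core})
    return previous


if __name__ == "__main__":
    core = min(os.sched_getaffinity(0))
    old = pin_to_core(core)
    print("now running on", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old)  # restore the original CPU set
```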
Intel's DPDK library is built specifically to bypass the OS and provide high-speed, low-latency networking. Seems like a natural fit.
How? Please explain.
I would expect that bypassing the network stack is no small feat.
- Zav - Imagine a Beowulf cluster of insensitive clods...