BBC Optimizing UHD Video Streaming Over IP (bbc.co.uk)

Posted by Soulskill on Friday October 9, 2015 @06:25PM from the go-big-or-go-home dept.

johnslater writes: A friend at the BBC has written a short description of his project to deliver UHD video over IP networks. The application bypasses the OS network stack, and constructs network packets directly in a buffer shared with the network hardware, achieving a ten-fold throughput improvement. He writes: "Using this technique, we can send or receive uncompressed UHD 2160p50 video (more than 8 Gbps) using a single CPU core, leaving all the rest of the server's cores free for video processing." This is part of a broader BBC project to develop an end-to-end IP-based studio system.
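
As a rough sanity check on the "more than 8 Gbps" figure, the raw bitrate can be worked out from the frame size and rate. This is only a sketch: the summary does not state the sampling format, so 4:2:2 chroma subsampling at 10 bits per sample (20 bits per pixel on average) is an assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope bitrate for uncompressed UHD 2160p50.
# Assumption (not stated in the summary): 4:2:2 chroma subsampling at
# 10 bits per sample, which averages 20 bits per pixel.

def uncompressed_bitrate_bps(width, height, fps, bits_per_pixel):
    """Return the raw video payload rate in bits per second."""
    return width * height * fps * bits_per_pixel

rate = uncompressed_bitrate_bps(3840, 2160, 50, 20)
print(f"{rate / 1e9:.2f} Gbps")  # payload only, no packet headers
```

Under that assumption the payload alone comes to about 8.29 Gbps, consistent with the quoted figure; packet headers would add to it.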

Interesting by jd · 2015-10-09 19:20 · Score: 5, Interesting

Kernel bypass plus zero copy are, of course, old-hat. Worked on such stuff at Lightfleet, back when it did this stuff called work. Infiniband and the RDMA Consortium had been working on it for longer yet.
What sort of performance increase can you achieve?
Well, Ethernet latencies tend to run into milliseconds for just the stack. Tens, if not hundreds, of milliseconds for anything real. Infiniband can achieve eight microsecond latencies. SPI can get down to two milliseconds.
So you can certainly achieve the sorts of latency improvements quoted. It's hard work, especially when operating purely in software, but it can actually be done. It's about bloody time, too. This stuff should have been standard in 2005, not 2015! Bloody slowpokes. Back in my day, we had to shovel our own packets! In the snow! Uphill! Both ways!

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
1. Re: Interesting by tomalpha · 2015-10-09 21:06 · Score: 3, Informative
  
  Yeah, this kind of thing has been around for a while.
  These days the added latency of going through the kernel IP stack is generally measured in microseconds rather than milliseconds, but the difference is still the same order of magnitude. Solarflare, Mellanox and others will happily sell you expensive Ethernet network cards that come bundled with drivers that let you bypass the kernel IP stack. The stack itself isn't especially slow, but the system calls and extra memcpys still add up. I've also seen an in-house user space stack built largely on top of lwIP.
  So I'd agree that none of this is particularly new, but I reckon it's still interesting that the BBC is using it. Maybe that'll help spur more widespread adoption.
2. Re:Interesting by ihtoit · 2015-10-09 21:38 · Score: 2
  
  70ns for a signal to propagate over 10m of twisted pair copper. Start there.
  I have a Gigabit connection over my LAN and I still get...lemme test... 1ms to the router, 2ms to a random other machine. Over WAN, 109ms to Slashdot, 16ms to the BBC. World of Tanks EU server goes between 40-130ms, depending on how busy it is and whether my son is video skyping....
  
  --
  Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
3. Re:Interesting by bernywork · 2015-10-09 23:31 · Score: 1
  
  I'm guessing you're using standard ping there. The problem is that the sent and received timestamps most likely come from timers in the app itself, which does the calculation. So if you ask the system for the time and it says "00:00:00:00", then ask again and it says "00:00:00:01", it'll get reported as 1ms, but the packet may have entered the system a lot faster than that; you only get 1ms because you're using a timestamp with 1ms resolution. Also, if you ask for a timestamp and the system takes a long time to respond to that request, your timestamps are going to be out again.
  There are whole sections of the networking industry built around accurately measuring all this stuff.
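
  The resolution effect described above can be sketched with two made-up timestamps. The helper names and the timestamp values are hypothetical; `time.perf_counter_ns` is Python's standard high-resolution monotonic clock, used here only to show what a finer timer reports.

```python
import time

def elapsed_coarse_ms(start_ns, end_ns):
    """Difference between two readings of a clock that only ticks once per ms."""
    return end_ns // 1_000_000 - start_ns // 1_000_000

def elapsed_fine_us(start_ns, end_ns):
    """The same interval measured with nanosecond timestamps, in microseconds."""
    return (end_ns - start_ns) / 1_000

# Two hypothetical timestamps 200 ns apart that straddle a millisecond tick.
start_ns, end_ns = 999_900, 1_000_100
print(elapsed_coarse_ms(start_ns, end_ns), "ms (coarse clock)")
print(elapsed_fine_us(start_ns, end_ns), "us (fine clock)")

# A live measurement with a high-resolution monotonic clock.
t0 = time.perf_counter_ns()
t1 = time.perf_counter_ns()
print(t1 - t0, "ns between two back-to-back clock reads")
```

  With a millisecond clock, a 200 ns interval is reported as either 0 ms or 1 ms depending on where the tick falls, which is why a tool with a finer timer gives very different numbers.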
  
  --
  Curiosity was framed; ignorance killed the cat. -- Author unknown
4. Re:Interesting by Bengie · 2015-10-10 03:14 · Score: 1
  
  My pings on home connection
  
  According to my switch a 64byte frame is 0.0023ms(2.3us) port to port
  According to a research paper, 1Gb Ethernet over 1km of fiber is 0.01476ms(14.67us) and 10Gb Ethernet is 0.0056ms(5.6us), one way, not RTT
  Desktop to Router through switch 0.12ms(120us) as measured in Windows via hrping
  Akamai CDN in ISP 1.25ms
  ISP DHCP server 1.5ms
  Chicago 6ms
  Slashdot 6ms
  Minneapolis 7ms
  New York City 30ms
  Atlanta 30ms
  Miami 40ms
  Houston 45ms
  San Jose 60ms
  San Francisco 65ms
  Seattle 70ms
  London 90ms
  France 90ms
  Frankfurt 110ms
  Stockholm 120ms
  Hawaii 140ms
  Tokyo 160ms
  Moscow 160ms
  Sydney 180ms
5. Re:Interesting by Bengie · 2015-10-10 03:22 · Score: 1
  
  Of course I just re-ran hrping against my router and got a min ping of 0.029ms(29us) with a std dev of 0.229ms(229us)
6. Re:Interesting by Bengie · 2015-10-10 04:59 · Score: 1
  
  The two are highly related in this context. Latency is caused by additional copying which is directly proportional to the amount of work being done. Additional work means increased latencies which means reduced throughput.
7. Re:Interesting by ihtoit · 2015-10-10 21:09 · Score: 1
  
  yep, I used the shell "Ping" command (Win7HP). I have no clue as to the inner mechanics.
  
  --
  Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
8. Re:Interesting by azav · 2015-10-11 03:58 · Score: 1
  
  Care to explain a little of the overall design?
  
  --
  - Zav - Imagine a Beowulf cluster of insensitive clods...
9. Re:Interesting by azav · 2015-10-11 04:03 · Score: 1
  
  Yeah, you just keep spouting off your unused testosterone.
  We'll learn from gramps and profit from his experience. Just keep mouthing off to people who have accomplished things before your time instead of learning from them. I'm sure someone will realize your greatness some day and place the crown on your head as you truly deserve.
  
  --
  - Zav - Imagine a Beowulf cluster of insensitive clods...
MTU by sexconker · 2015-10-09 19:26 · Score: 1

If people would just accept a decent MTU none of this would matter.
The max is 64 K but we're stuck with 1500 (including overhead) because you can't be sure that every hop will support your MTU.
Internally you can enable jumbo frames and shit will work, but once you need to go out over the internet all bets are off, so you limit your shit to 1500 and your performance goes to all hell.
We're basically delivering UHD movies via telegram.
1. Re:MTU by Zarhan · 2015-10-09 19:42 · Score: 1
  
  The use case here is moving uncompressed video within a studio environment. Here you have full control over the hardware, and the Internet does not come into play. I'd think that in such cases they'd have no problem going to jumbo frames.
2. Re:MTU by Anonymous Coward · 2015-10-09 20:41 · Score: 1
  
  Jumbo frames play very badly when you have other stuff going over the same link though. Each connection can't send a packet until the previous one has finished sending, and those gaps are much further apart when using jumbo frames. Yes it does improve throughput, but only when you're using it for approximately one thing.
3. Re:MTU by FireFury03 · 2015-10-09 22:18 · Score: 1
  
  If people would just accept a decent MTU none of this would matter.
  The max is 64 K but we're stuck with 1500 (including overhead) because you can't be sure that every hop will support your MTU.
  Internally you can enable jumbo frames and shit will work, but once you need to go out over the internet all bets are off, so you limit your shit to 1500 and your performance goes to all hell.
  We're basically delivering UHD movies via telegram.
  Packet size is a tradeoff - for high throughput you want big packets, for low latency you want small packets. So fine, just tailor the packet size to your application - well no, when you're sharing a network, the packet sizes used by other applications have a significant impact.
  So let's say you're doing something that requires low latency, such as VoIP. And let's say you've got QoS set up to ensure the small VoIP packets are always inserted in front of any big packets, since that's a sensible thing to do. Look at 2 scenarios:
  Scenario 1:
  Transmit queue is empty, VoIP packet goes straight to the network card.
  Scenario 2:
  Transmit queue has a bunch of packets already in it. The VoIP packet goes straight to the head of the queue, but the ethernet card has already started transmitting another packet, so we have to let that finish before the VoIP packet can actually go out onto the network.
  On a busy system, scenario 2 would be the norm, so the latency of the VoIP traffic will vary and the receiving end has to even out this latency with a jitter buffer. Let's assume an MTU of 1500: if the transmitting side has only just started transmitting a 1500 byte packet when the VoIP packet enters the queue, then on a 2Mbps connection it would take 7.5ms to send this packet before the VoIP packet can start to be transmitted, so you're looking at 7.5ms of jitter on your VoIP session. If the MTU was 64K, the jitter would be a whopping 328ms, which is verging on unusable for VoIP.
  Now, you may say that 2Mbps is a slow internet connection, and you'd be right, but it is also a very common speed of internet connection, so doing stuff that breaks it would be bad. Don't forget that you get latency introduced for each hop you go through, though: on a 100Mbps connection with a 64K MTU you add up to 6.6ms of latency per hop, so if your traffic goes through 10 100Mbps hops, you're looking at potentially 66ms of latency.
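
  The head-of-line delay in scenario 2 is just the time needed to clock the in-flight packet onto the wire. A minimal sketch of that arithmetic follows; it counts only the packet's own bytes and takes 64K as 64 × 1024 bytes, both of which are assumptions, and the function name is made up. On those assumptions it gives 6 ms and about 262 ms at 2Mbps; the figures quoted above are somewhat higher, and this sketch does not attempt to account for the difference.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000

# Packet sizes and link speeds discussed in the thread.
PACKETS = {"1500 B": 1500, "16 KiB": 16 * 1024, "64 KiB": 64 * 1024}
LINKS = {"2 Mbps": 2e6, "100 Mbps": 100e6, "10 Gbps": 10e9}

for p_name, size in PACKETS.items():
    for l_name, bps in LINKS.items():
        delay = serialization_delay_ms(size, bps)
        print(f"{p_name:>7} on {l_name:>8}: {delay:9.4f} ms")
```

  The worst-case wait for a small packet queued behind one of these is the value printed for that size and link, and it accumulates at every hop where the same thing happens.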
  Ideally you'd set the MTU of each interconnect independently of the rest of the network and base it on the jitter level you'd like to achieve (therefore it would be based on that link speed). And indeed this can be done - clients can do path MTU discovery to figure out the minimum MTU on the route between hosts, irrespective of the local MTU. Unfortunately, too many idiot sysadmins set up firewalls to block ICMP packets and that breaks PMTU discovery. Which means that if you're using a "nonstandard" MTU (i.e. not 1500) you _will_ have connectivity problems because your traffic will sometimes traverse firewalls that are set up by said idiots.
  
  --
  http://blog.nexusuk.org
4. Re:MTU by Bengie · 2015-10-10 05:04 · Score: 1
  
  A 64KiB packet on a 10Mb/s connection is about 5ms. That's a huge amount of jitter. 64KiB packets may be acceptable for 10Gb connections, but I like to keep my connection below 1ms of jitter. To give an idea of how horrible 5ms of jitter is, I get about 2ms of jitter from Midwest USA to Frankfurt Germany.
5. Re:MTU by Bengie · 2015-10-10 05:11 · Score: 1
  
  sexconker's argument was that the Internet should have jumbo frames. Grats on changing the context of the argument.
6. Re:MTU by Mandrel · 2015-10-10 05:34 · Score: 1
  
  Packet size is a tradeoff - for high throughput you want big packets, for low latency you want small packets.
  There'd be no such trade-off if routers and computers pipelined packets, starting (or queuing) to forward as soon as the destination IP address is read and an interface route determined, possibly also waiting to check the TCP/IP header checksum.
7. Re:MTU by sexconker · 2015-10-10 15:01 · Score: 1
  
  64 * 1024 * 8 / 2 / 1000 / 1000 = 262 ms worst case, not 328 ms.
  And routers should know the capability of the links and can split up the jumbo frames into multiple packets to let VOIP through ahead without wasting much bandwidth at all. Hell, my shitty D-Link does this - every boot it scans the link to determine connection speed and uses that in its QoS engine.
  Further, the use case in the article is 8 Gbps in a studio environment. They can dedicate the entire link to video. 8 Gbps down a 2 Mbps pipe is never going to happen, so your example is ridiculous on the face of it. They're claiming a ten-fold improvement. So how about an MTU of 16K instead of 1500 or 64K?
  Worst case is 66 ms additional delay on your 2 Mbps link.
8. Re:MTU by sexconker · 2015-10-10 15:10 · Score: 1
  
  And for a 10 Mbps connection you drop the max MTU and split the packets. Routers in the middle of a path can do this.
  Video Streaming Service A sends a 64 KB packet to ISP B over a 100 Mbps link, ISP B knows Customer C is on the Shit Tier package and can handle 10 Mbps, and decides to split up the 64 KB packet into 4 KB or whatever packets, Customer C gets their shit.
  4 KB / 10 Mbps < 64 KB / 100 Mbps, no additional jitter. Without even inspecting the traffic to see if it's Netflix or Skype or their own VoIP service, they can control jitter to be no more than the source or no more than some baseline acceptable level.
9. Re:MTU by Bengie · 2015-10-10 15:25 · Score: 1
  
  IPv6 does not allow packet fragmentation by the routers. You also have the issue that if a single fragment is dropped, the entire packet must be retransmitted.
I found my own way to protest. by SuricouRaven · 2015-10-09 19:33 · Score: 1

https://birds-are-nice.me/musi...
I show how the concept of the public domain has been crushed by demonstrating just how little popular music exists in it.
1. Re:I found my own way to protest. by SuricouRaven · 2015-10-09 19:46 · Score: 1
  
  Damnit, posted to the wrong story! That was supposed to go to the one about TPP.
2. Re:I found my own way to protest. by wonkey_monkey · 2015-10-09 21:26 · Score: 1
  
  I show how the concept of the public domain has been crushed by demonstrating just how little popular music exists in it.
  Are you sure it wouldn't suck anyway?
  
  --
  systemd is Roko's Basilisk.
3. Re:I found my own way to protest. by SuricouRaven · 2015-10-09 22:08 · Score: 2
  
  I also put the wrong link in.
4. Re:I found my own way to protest. by SuricouRaven · 2015-10-09 22:10 · Score: 1
  
  Musical styles change. Entire genres of music have been invented in the last seventy years, the current copyright term for music here. There's no justification for a duration so long - in what way does it promote the creation of new music? It doesn't.
Re:Credulity by Anonymous Coward · 2015-10-09 20:25 · Score: 2, Informative

IP packets and Ethernet frames are really quite small data structures, so the cost of processing is vastly overshadowed by the cost of scheduling, context switching, memory management, etc. Because modern CPUs are limited by RAM access latency, they're only fast when they can either loop or stream. If there isn't enough of that between the overhead, performance tanks.
Baby steps. by SeaFox · 2015-10-09 20:40 · Score: 2

How about the BBC stop requiring Flash for videos. That would be a better place to start.
1. Re:Baby steps. by wonkey_monkey · 2015-10-09 21:27 · Score: 2
  
  Okey dokey.
  
  --
  systemd is Roko's Basilisk.
2. Re:Baby steps. by AmiMoJo · 2015-10-10 02:14 · Score: 2
  
  4k is aiming a bit low anyway. NHK, the Japanese equivalent of the BBC, is going directly to 8k for the 2020 Olympics. Test broadcasts will begin in 2018, about 2.5 years from now. 4k is going to be short lived.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
3. Re:Baby steps. by Anonymous Coward · 2015-10-10 03:59 · Score: 1
  
  Meanwhile, 4k is going to be needed (I use the phrase in the loosest possible way) and 2020 is 5 years away. Solving the problems at 4k are steps towards solving them with 8k and faster processors / faster network infrastructure / better equipment / better compression / better decompression.
I can't wait by Rosco+P.+Coltrane · 2015-10-09 22:03 · Score: 2

to be able to watch Eastenders in Ultra HD...
To the Beeb's credit though, the Sky at Night in UHD would definitely be a lot more interesting, surely. But out of thousands of mediocre shows and movies released year after year after year, is it worth buying a new tv to marvel at a dozen really good programs? Somehow this doesn't seem to be a good value proposition.

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
1. Re:I can't wait by SuricouRaven · 2015-10-09 22:20 · Score: 4, Informative
  
  This isn't for you to watch UHD. It's for internal use in production, so they can shunt live UHD video around their studios. That way they keep full quality right up until the final stage before distribution, when it gets resized according to the end device. Your TV will get plain old 1080p as always - but they'll have UHD capability ready to go for transmitting to cinemas or sending to big public displays, and they can archive a UHD version for future use so they can zoom in tighter on the action in future highlights.
2. Re:I can't wait by dinfinity · 2015-10-10 03:13 · Score: 1
  
  Interestingly, the place to look for UHD content is YouTube (and recently Vimeo, as well). The flexibility of the 'amateur' video producer and that of the internet as a distribution platform really show in this area.
  There is some beautiful and awesome stuff out there:
  https://vimeo.com/115541651
  https://www.youtube.com/watch?...
  https://www.youtube.com/watch?...
  Given that the high-end smartphones are outputting UHD movies now as well, there is going to be an onslaught of UHD content.
UHD is that round antenna, right? by Anonymous Coward · 2015-10-09 22:27 · Score: 1

In order to pick up UHD I need to connect that round antenna to the back of the TV, right?
Intel DPDK by yoshac · 2015-10-09 22:41 · Score: 1

Congratulations BBC, you managed to re-invent DPDK at the license payers' expense.
Re:Credulity by Bengie · 2015-10-10 02:38 · Score: 2

A 10x difference in performance is not only attainable, but even faster is being done.

At 10Gb/s, the amount of data getting shuffled around in a normal network stack is enough to push the limits of the data buses. Most network stacks copy the data something like 4 times. That works as a multiplier and changes 10Gb/s into 40Gb/s. Context switching causes cache thrashing and can consume more cycles than the actual data getting processed. A single context switch can consume about 1,000 cycles on a modern CPU.
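
To put rough numbers on the copy multiplier and the context-switch cost, here is a small worked calculation. The line rate, copy count and cycles per switch are taken from the comment above; the 1500-byte packet size, the 3 GHz clock and the one-switch-per-packet rate are assumptions added purely for illustration.

```python
LINE_RATE_BPS = 10e9       # 10 Gb/s, from the comment above
COPIES = 4                 # "something like 4 times", from the comment above
SWITCH_CYCLES = 1_000      # cycles per context switch, from the comment above
PACKET_BYTES = 1500        # assumption: standard MTU-sized packets
CPU_HZ = 3e9               # assumption: a 3 GHz core
SWITCHES_PER_PACKET = 1    # assumption: one context switch per packet

# Every copy moves the full stream through memory again.
memory_traffic_bps = LINE_RATE_BPS * COPIES

# Packet rate needed to fill the link, and the cycles spent just switching.
packets_per_second = LINE_RATE_BPS / (PACKET_BYTES * 8)
switch_cycles_per_second = packets_per_second * SWITCHES_PER_PACKET * SWITCH_CYCLES
core_fraction = switch_cycles_per_second / CPU_HZ

print(f"memory traffic: {memory_traffic_bps / 1e9:.0f} Gb/s")
print(f"packet rate:    {packets_per_second:,.0f} packets/s")
print(f"switching cost: {core_fraction:.1%} of one core")
```

Under these assumptions, context switching alone would eat more than a quarter of a core before any packet data is touched; with smaller packets the share grows proportionally.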
Re:instead of investing in super ultra hq video.. by Anonymous Coward · 2015-10-10 02:42 · Score: 3, Funny

why not do 720p for everybody first, with no region locking or lockouts? fuck this uber hq shit. dvd quality is, quite frankly, good enough for all but the most anal of viewers.
Shush! If UHD doesn't take, we'll be forever stuck with 1080p computer monitors. Do not be the person who prevented 8 MPixels desktop monitors from becoming mainstream.
Dear Sirs,
I'm the head of the UHD Panel Manufacturers' Association. I'm sorry to say that, having read the grandparent post by an Anonymous Coward, our members have unanimously decided to cancel all further development and manufacturing of ludicrously high-definition panels, and to shut down the association. In fact, we've decided to stop bothering with 1080p panels as well and in future will just be selling 1280 x 720 displays.
The parent was correct in identifying the influence of a single post by an Anonymous Coward on a niche website; we consider Slashdot the be-all-and-end-all of our market research and as a result of that post we're ending our involvement in a potential multi-billion dollar industry.
In fact, we're even considering going back to CRT displays, you ungrateful wretches.
Yours sincerely,
Hwang Beom-seok,
President of the UHD Panel Manufacturers Association
Re: Credulity by afidel · 2015-10-10 03:47 · Score: 1

40Gb is ~4GBps which is a fraction of what any bus can handle on a modern x64 processor. It's 1/10th the bandwidth of dual channel DDR3-1600 memory which is the slowest a Skylake processor goes. It's 4 lanes of PCIe 3.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
awesome! by Osgeld · 2015-10-10 04:23 · Score: 1

now I can watch reruns of top gear and star trek TNG in UHD
the above is wrong by Chirs · 2015-10-10 04:51 · Score: 1

Standard linux distros support timestamping of the packet by the kernel when the packet is received. When userspace reads the packet it can also obtain the kernel timestamp of that packet.
1. Re:the above is wrong by bernywork · 2015-10-12 23:57 · Score: 1
  
  Sorry, what?
  Even the kernel isn't accurate at doing this. On heavily loaded systems I've seen a 20ms wait before a packet is stamped. Pre-emptive kernels and everything else mean that a packet might be sitting on the network card or in a buffer without being collected and stamped by the system. The only way to have accurate timestamps is to have something like a Napatech or Myricom card using a third party time source.
  
  --
  Curiosity was framed; ignorance killed the cat. -- Author unknown
Re: Credulity by Bengie · 2015-10-10 04:54 · Score: 1

You over-simplified throughput. There are latency issues that reduce the effective throughput well below the maximum. Zero-copy along with reducing context switching is very important to 10Gb+ rates. Netmap is one such project. It allows userland to send packets 7x faster single-threaded than kernel mode with the old network stack, and even better multi-threaded.

FreeBSD is working on a new API to allow the network stack to work along with the network card such that the CPU core that gets interrupted by the NIC will also be the core that processes the packet in the firewall and also notifies userland on that same core. Once userland, kernel, and NIC all use the same CPU core, less inter-core data copying will occur. Right now it's up to the thread scheduler to decide where the packets get processed. The NIC may interrupt Core 0, then Core 1 processes the packet, then Core 2 is where userland reads the packet. That's a ton of copying, and that's not even including the 3-4x copying within the network stack.
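
The zero-copy idea mentioned above can be illustrated in miniature: packets are built in place inside a pre-allocated ring of fixed-size slots and handed around as views rather than copies. This is only an analogy, not netmap's API or the BBC's code: an ordinary `bytearray` stands in for memory shared with the NIC, and the slot size, ring length and header layout are all made up.

```python
import struct

SLOT_SIZE = 2048               # hypothetical slot size, enough for one packet
NUM_SLOTS = 4                  # hypothetical ring length
HEADER = struct.Struct("!IH")  # made-up header: sequence number, payload length

ring = bytearray(SLOT_SIZE * NUM_SLOTS)  # stands in for the shared buffer
view = memoryview(ring)

def write_packet(slot, seq, payload):
    """Build a packet in place inside the given ring slot; return a view of it."""
    base = slot * SLOT_SIZE
    HEADER.pack_into(ring, base, seq, len(payload))  # header written in place
    start = base + HEADER.size
    view[start:start + len(payload)] = payload       # the only copy of the payload
    return view[base:start + len(payload)]           # slicing a memoryview does not copy

pkt = write_packet(1, 42, b"video-line-data")
seq, length = HEADER.unpack_from(pkt)
print(seq, length, bytes(pkt[HEADER.size:]))
```

The returned `pkt` is a window onto the ring itself, so whatever consumes it reads the same bytes the producer wrote, with no intermediate buffers in between.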
Why aren't they using Intel's DPDK? by Chirs · 2015-10-10 04:56 · Score: 1

Intel's DPDK library is specifically built to bypass the OS and provide high-speed low-latency networking. Seems like a natural fit.
1. Re:Why aren't they using Intel's DPDK? by JohnStock · 2015-10-20 11:17 · Score: 1
  
  What is being "bypassed" to get this extra speed and why isn't it part of the OS by default?
Re:zero copy by azav · 2015-10-11 03:56 · Score: 1

How? Please explain.
I would expect that bypassing the network stack is no small feat.

--
- Zav - Imagine a Beowulf cluster of insensitive clods...