Slashdot Mirror


Researchers Map Locations of 4,669 Servers In Netflix's Content Delivery Network (ieee.org)

Wave723 writes from a report via IEEE Spectrum: For the first time, a team of researchers has mapped the entire content delivery network that brings Netflix to the world, including the number and location of every server that the company uses to distribute its films. They also independently analyzed traffic volumes handled by each of those servers. Their work allows experts to compare Netflix's distribution approach to those of other content-rich companies such as Google, Akamai and Limelight. To do this, IEEE Spectrum reports that the group reverse-engineered Netflix's domain name system for the company's servers, and then created a crawler that used publicly available information to find every possible server name within its network through the common address nflxvideo.net. In doing so, they were able to determine the total number of servers the company uses, where those servers are located, and whether the servers were housed within internet exchange points or with internet service providers, revealing stark differences in Netflix's strategy between countries. One of their most interesting findings was that two Netflix servers appear to be deployed within Verizon's U.S. network, which one researcher speculates could indicate that the companies are pursuing an early pilot or trial.

32 of 57 comments

  1. And? by NotInHere · · Score: 3, Insightful

    Why is Netflix's server architecture so interesting? Or is science just miles behind the industry in this subject and now desperately trying to catch up?

    1. Re:And? by Anonymous Coward · · Score: 5, Insightful

      Wait, are you saying that information like the server load distribution of a real-world system like Netflix isn't useful for studying such systems and designing future ones? There aren't that many systems like this in the wild, and most of them don't release their information publicly, so getting an extra example could be quite useful. If they didn't have this, many of the people complaining here would instead be complaining about academics who ignore real-world systems in their studies. It is not unlike a lot of research work that is done to characterize products the manufacturer doesn't provide enough info for, whether that is measuring the latest CPU's performance in a real-world test, benchmarking networking gear, or detailing the performance of some new high-speed camera.

    2. Re:And? by omnichad · · Score: 1

      Reverse engineering trade secrets is fun?

  2. Interesting ratio by guruevi · · Score: 1

    Netflix has ~4,000 movies and ~1,000 TV shows, so each server seems to handle just 1 movie and about 1,000 connections (if 5% of their user base is actively streaming at any one point). This just seems like an awful lot of servers for what I find to be a relatively low load for simply streaming data.

    I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent-type approach. It would also alleviate the "problems" with the providers, since most traffic would stay within their network.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
    1. Re:Interesting ratio by Anonymous Coward · · Score: 5, Insightful

      The BitTorrent approach is the wrong one for two reasons:
      1. A lot of people have asymmetrical connections with a very slow upload speed
      2. A lot of people have monthly data caps with hefty fees for going over

    2. Re:Interesting ratio by drinkypoo · · Score: 1

      I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent type approach,

      But you're wrong. They use caching servers instead, on the premise that shows are watched in trends. They know more about watching habits than you do, and they are relatively technically competent, so if there were benefit to that they'd probably be doing it already.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:Interesting ratio by Bengie · · Score: 3, Interesting

      USA peak bandwidth was about 60Tb/s back in 2012, and average bandwidth growth has been at least 50% year over year and steady since the 80s, giving us 400% growth of bandwidth, or about 300Tb/s. Netflix represents about 1/3rd of peak bandwidth, or 100Tb/s. That gives an average of 21Gb/s per server, which sounds ballpark correct seeing that they're moving to 40Gb/s and 80Gb/s uplinks on their servers.
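
      The arithmetic above can be checked with a short sketch. The 4-year span is an assumption inferred from the quoted "400% growth" (1.5^4 is roughly 5x), and the server count comes from the story summary; every other number is taken straight from the comment.

```python
# Back-of-envelope check of the per-server bandwidth estimate.
PEAK_2012_TBPS = 60.0   # US peak bandwidth in 2012, as quoted in the comment
ANNUAL_GROWTH = 0.50    # 50% year-over-year growth, as quoted
YEARS = 4               # assumption: inferred from the quoted "400% growth"
NETFLIX_SHARE = 1 / 3   # Netflix's share of peak traffic, as quoted
SERVER_COUNT = 4669     # servers mapped by the researchers (story summary)


def per_server_gbps(peak_tbps, growth, years, share, servers):
    """Average per-server throughput at peak, in Gb/s."""
    peak_now_tbps = peak_tbps * (1 + growth) ** years
    netflix_tbps = peak_now_tbps * share
    return netflix_tbps * 1000 / servers


peak_now = PEAK_2012_TBPS * (1 + ANNUAL_GROWTH) ** YEARS
estimate = per_server_gbps(
    PEAK_2012_TBPS, ANNUAL_GROWTH, YEARS, NETFLIX_SHARE, SERVER_COUNT
)
print(f"Estimated current peak: {peak_now:.2f} Tb/s")
print(f"Estimated average per server: {estimate:.1f} Gb/s")
```

      Without the intermediate rounding to 300Tb/s and 100Tb/s, the result lands slightly above the quoted 21Gb/s but in the same ballpark.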

      Regardless of how many people are actually watching, 20Gb/s average is pretty cool. Another interesting note is that Netflix servers barely benefit from caching data in memory. Each server is handling too many requests per second from so many different customers, almost no customers are at the same point in the same show, and requests from a single customer are so far apart in time that almost all requests are just random access. It's also interesting to know that Netflix is beyond the 80/20 rule. They're in the 90/10 rule, in that 10% of their data represents 90% of their requests. Predicting which 10% is important, and they can't use normal evict-least-used algorithms because that would cause cache thrashing. They algorithmically predict what will be watched every night, upload the data to be cached and logically "pin it" so it doesn't get evicted.
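
      To illustrate the "pin it" idea, here is a minimal sketch of a least-recently-used cache where pinned entries are skipped during eviction. This is not Netflix's implementation; the class name `PinnedLRUCache` and its methods are hypothetical, and plain LRU stands in for whatever eviction policy they actually use.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """LRU cache in which pinned keys are never chosen for eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # key -> value, least recently used first
        self._pinned = set()

    def pin(self, key, value):
        """Store a predicted-popular item and protect it from eviction."""
        self._items[key] = value
        self._pinned.add(key)
        self._evict_if_needed()

    def put(self, key, value):
        """Store an ordinary item, marking it as most recently used."""
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict_if_needed()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def _evict_if_needed(self):
        while len(self._items) > self.capacity:
            # Oldest unpinned key is the victim; pinned keys are skipped.
            victim = next((k for k in self._items if k not in self._pinned), None)
            if victim is None:
                break  # everything is pinned: stay over capacity
            del self._items[victim]


cache = PinnedLRUCache(capacity=2)
cache.pin("popular_show", "predicted tonight")
cache.put("obscure_a", "rarely watched")
cache.put("obscure_b", "rarely watched")  # evicts obscure_a, not the pinned entry
print(cache.get("popular_show"), cache.get("obscure_a"), cache.get("obscure_b"))
```

      With plain LRU, a burst of one-off requests for obscure titles would push out the popular data; pinning keeps the predicted 10% resident regardless of request order.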

      Other interesting stuff that they support for syncing the servers: each server can be configured to use a different route to pull down its data, and even the amount of bandwidth can be configured. The servers within a location can then sync with each other in a kind of P2P setup. This helps load-balance routes. Their SSD servers hold quite a bit less storage than the mech-drive servers, so the SSDs are typically hit first but hold only the most requested data. Last I knew, their SSD servers did not support acting as a cache while loading, because mixed sustained heavy reads and writes created IO patterns that didn't play well with SSDs. That may have changed or may be changing in the near future. I know the biggest reason for this was that garbage collection in most SSD firmware could cause long pauses of no activity under sustained heavy writes. One of the changes was for FreeBSD to have a target latency for reads/writes and throttle the writes until latency came down.

    4. Re:Interesting ratio by Shatrat · · Score: 1

      I think you're missing the point of these appliances. Netflix has plenty of capacity to host all its content in central locations.
      These appliances are installed at the ISP offices so that the content is as close to the subscriber as possible. That way the quality of the video is not dependent on the quality of the long-haul network from the ISP back to Chicago, Dallas, Ashburn, London, Frankfurt or wherever.
      It also reduces the IP transit costs of the ISP, which they are typically paying for based on utilization if they are not a Tier 1 like AT&T. It also reduces their transport costs. If the ISP has to upgrade from two 10gbe pipes to four, that's probably going to hit their bottom line for $10k to $20k a month. If installing some caching appliances helps them delay that upgrade by a year, that's a massive benefit.
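
      As a rough worked example using only the figures above (the 12-month delay is the "year" mentioned in the comment, and the function name is made up for illustration):

```python
# Estimated monthly cost of going from two to four 10GbE pipes, as quoted above.
MONTHLY_COST_LOW = 10_000
MONTHLY_COST_HIGH = 20_000


def deferred_spend(monthly_cost, months_delayed):
    """Spending postponed by delaying the upgrade for the given number of months."""
    return monthly_cost * months_delayed


low = deferred_spend(MONTHLY_COST_LOW, 12)
high = deferred_spend(MONTHLY_COST_HIGH, 12)
print(f"Delaying the upgrade by a year defers ${low:,} to ${high:,}")
```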

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    5. Re:Interesting ratio by Shatrat · · Score: 1

      Here's a bigger reason.
      How do you play bit torrent content in a browser or app?
      How do you control access to the content based on accounts?
      How do you prevent Joe Blow Wireshark Pro from noticing that Senator Blowhard McJesus has been binge watching R rated slasher flicks?
      I'm sure they can all be overcome with some coding and customization, but they've already done that work for their existing solution.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    6. Re:Interesting ratio by guruevi · · Score: 1

      I've seen the same problem on cheap SSDs like the Samsung "Pro" lines. I've had much better results with the Intel DC line though; they seem to be able to sustain their "average" read and write speeds.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    7. Re:Interesting ratio by Bengie · · Score: 1

      My understanding of the SSD writing issue is that SSDs have a sizable amount of reserved space for wear leveling. Assuming this space is large enough, garbage collection can be postponed as long as there is some spare room. Many SSDs only do GC while the drive is "idle" by some definition, which is often the number of milliseconds since the last request. If the drive never becomes idle, the SSD will continue to postpone the GC until it absolutely has to, at which point it stops everything and does GC.

      Adding insult to injury, the device waited so long that there is more work to do, meaning it will take even longer, sometimes many seconds. The good news is that SSDs have very reliable performance characteristics, and as the available wear-leveling free space starts to get consumed by deferred GC work, writes start taking longer because it gets harder to find fresh blocks. This slowdown can be detected because both reads and writes suffer, since you can't read from the same part of the drive that is being written to until the write has finished.

      Read latency is pretty much constant, so any increase in latency means the firmware is doing work. Detecting this increase and backing off, primarily on writes, gives the GC time to do its job and minimizes GC latency, even if at the expense of throughput. FreeBSD is also getting new limiting features in its IO layer that will allow admins to limit read IOPS, write IOPS, read throughput, and write throughput all separately. This can be configured system-wide per block device and/or by jail per device.
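
      The back-off idea can be sketched as a tiny controller. This is only an illustration, not FreeBSD's code: the `WriteThrottle` class, its parameters, and the halve-on-overshoot / step-up-on-recovery policy are all assumptions.

```python
class WriteThrottle:
    """Adjusts the allowed write rate based on observed read latency."""

    def __init__(self, target_latency_ms, max_rate, min_rate=1.0,
                 backoff=0.5, step=1.0):
        self.target_latency_ms = target_latency_ms
        self.max_rate = max_rate    # writes/sec allowed when the drive is healthy
        self.min_rate = min_rate    # never throttle writes below this
        self.backoff = backoff      # multiplier applied when latency is too high
        self.step = step            # amount added back when latency is fine
        self.rate = max_rate

    def observe(self, read_latency_ms):
        """Feed in one latency sample and return the new allowed write rate."""
        if read_latency_ms > self.target_latency_ms:
            # Latency above target suggests the firmware is busy (e.g. GC):
            # cut writes sharply so it can catch up.
            self.rate = max(self.min_rate, self.rate * self.backoff)
        else:
            # Latency is back under target: restore write bandwidth gradually.
            self.rate = min(self.max_rate, self.rate + self.step)
        return self.rate


throttle = WriteThrottle(target_latency_ms=5.0, max_rate=100.0, step=10.0)
for sample_ms in (2.0, 20.0, 20.0, 3.0, 3.0):
    print(f"{sample_ms:5.1f} ms -> {throttle.observe(sample_ms):6.1f} writes/s")
```

      Reads are left alone in this sketch; only the write budget moves, which matches the "primarily on writes" point above.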

    8. Re:Interesting ratio by Bengie · · Score: 2

      I wish torrent was better at taking advantage of my fast symmetrical connection. If more users uploaded to me instead of to others, I could upload more to others and it would be overall faster for everyone. Easier said than done, I know. Simple attempts to do this would result in gaming the system or DoSing the system by wasting others' upload bandwidth.

    9. Re:Interesting ratio by allo · · Score: 1

      And yet it works for all uses of BitTorrent. BitTorrent was invented to distribute data without overloading a weak server, and it worked so well that the filesharing community instantly adopted it.
      Of course, your upload won't be enough for someone's download. But the upload of a few people may sponsor your download.
      And the tech is already there, see popcorntime (never used it myself, but a nice idea).

    10. Re: Interesting ratio by allo · · Score: 1

      You may be paying with your bandwidth for a cheaper netflix for you.

  3. Hopefully... by hcs_$reboot · · Score: 1

    these findings will also help find a way to bypass Netflix geoblocking. Content in my country is so poor, it makes me want to cry :-(

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  4. Re:Gee thanks guys by hcs_$reboot · · Score: 1

    Why would they do that? The MPAA already gets wagonloads of $ from Netflix, and also got them to implement strict geoblocking.

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  5. On the Faroe islands? by Henriok · · Score: 1

    It seems that Netflix has delivery servers on the Faroe Islands, north of the UK and south of Iceland. I guess they would need that for its 50,000 residents. Since they only have one in China and none in countries like Portugal, Turkey and Israel, that seems reasonable.

    --

    - Henrik

    - when the Shadows descend -
    1. Re:On the Faroe islands? by Reaperducer · · Score: 1

      Probably a strategic location sitting on a trans-Atlantic cable. Though, you could also get that in Portugal.

      --
      -- I'm old enough to have lived through six different meanings of the word "hacker."
    2. Re:On the Faroe islands? by drinkypoo · · Score: 3

      Netflix will send a CDN server more or less to any ISP which requests one, and is willing to pay the power bill. Do you not remember when many ISPs were loudly refusing to install these free machines even though they would save them money because they objected to "free" colocation on principle?

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:On the Faroe islands? by Shatrat · · Score: 1

      The ones complaining loudly mostly objected to competition with their own video product. ISPs that aren't also cable companies or have some other ulterior motive love these caching devices because they dramatically reduce their transport/transit costs and increase customer satisfaction.
      The colocation objection was smoke and mirrors crap. These things take up less space than some T1 muxes.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
  6. Re:Uh by Reaperducer · · Score: 1

    If you could find out how many subscribers it has in each country, it might not be odd at all.

    Also, you have to factor in things like the potential for natural disaster (Japan) and the gub'mint horking your servers in a political/ransom/whoknows move (Russia). Sweden's a good, stable location from which to serve content across the top of the world with little worry.

    --
    -- I'm old enough to have lived through six different meanings of the word "hacker."
  7. You mean to tell me by Snotnose · · Score: 1

    Netflix isn't run on a server in Reed Hastings's bathroom? I'm shocked, shocked I tell you.

  8. Isn't it common knowledge? by acoustix · · Score: 1

    Isn't it common knowledge that Netflix will provide servers/appliances to ISPs who request them in order to cut down on video traffic during peak hours? Why is the fact that Verizon has a few such a big deal?

    This program is well known: Open Connect

    --
    "A plan fiendishly clever in its intricacies"- Homer Simpson
    1. Re:Isn't it common knowledge? by Bengie · · Score: 1

      The big deal is that Verizon has a history of playing games with Netflix data and trying to charge them exorbitant transit prices for local peering data. It's quite telling how horrible Verizon has been about Netflix that they only have 2 devices in their network. With how large Verizon is, I assume Netflix would love to load test their 80Gb/s boxes in a FIOS region.

    2. Re:Isn't it common knowledge? by Coren22 · · Score: 1

      Even at the height of those games, my 75/75Mbit FiOS never had issues streaming Netflix. I think it was an entirely fabricated issue. I have not however ever streamed 4k content...one of these days I will have to flip open the laptop screen and watch a 4k show to see what the big deal is.

      --
      APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
    3. Re:Isn't it common knowledge? by Bengie · · Score: 1

      Streaming issues were regional and consistent within the region. Your area may not have ever been affected, but other areas were and had the issue for months or years at a time.

  9. Heatmap by Citizen+of+Earth · · Score: 1, Insightful

    ObXKCD: Heatmap

  10. The original paper on Arxiv by laughingskeptic · · Score: 1

    https://arxiv.org/pdf/1606.05519v1.pdf

    Related slides:

    http://eecs.qmul.ac.uk/~boettget/mapping-netflix-coseners16.pdf

  11. Research by allo · · Score: 1

    Every real scientist is embarrassed when somebody claims that reversing secret data from companies is research. There is NOTHING researched there; everything was known, just not to you. Research is about discovering something new, not about trying to get somebody else's secrets.

    Use your time and money for something useful. Stop tracing netflix servers, start inventing something on your own.

    1. Re:Research by allo · · Score: 1

      Indeed, it's not only about inventions, but also about analysis of things and discovering how stuff works.

      But discovering how stuff works means things like finding the parts of an atom, not dissecting things that were already built, where you could just ask the builder. Otherwise there is endless science: one team builds stuff and keeps it secret, the other team dissects it.

      Keep this to the businesses. Let Amazon Prime Video dissect Netflix, and leave it to science to find the best way to do the stuff, instead of just finding out the Netflix way. If the Netflix way IS the best way, you will find the same one.

    2. Re: Research by allo · · Score: 1

      Scientists AND the people who keep stuff hidden while they know it's being researched at the same time.

  12. Missing some details by darkain · · Score: 1

    They're missing a few details in their analysis. What about DNS load-balancing returning multiple potential IP addresses per name? What about anycasting IP addresses, or multiple end-points for an IP address depending on entry path into a location? I think what they really found was a total number of potential current DNS names, but I somehow doubt that is the entirety of the CDN deployed right now. Also, because Netflix is very well known to be static content with controlled client applications, there is also the possibility of transparent proxies anywhere along the chain, with Netflix partnering with ISPs or peer exchanges to internally reroute the traffic.

    Good start though!
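
    To make the DNS point concrete, here is a rough sketch that counts distinct IP addresses behind a list of hostnames rather than counting names. The helper names are hypothetical and the hostnames in the usage note are placeholders, not real Netflix server names. Anycast, or several machines behind one address, would still be invisible to this kind of check.

```python
import socket


def resolve_all(hostname, port=443):
    """Return every distinct IP address the resolver reports for a hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def count_endpoints(hostnames, resolver=resolve_all):
    """Return (names that resolved, distinct IP addresses seen across them)."""
    addresses = set()
    resolved = 0
    for name in hostnames:
        try:
            ips = resolver(name)
        except OSError:
            continue  # name did not resolve; socket.gaierror is an OSError
        if ips:
            resolved += 1
            addresses |= set(ips)
    return resolved, len(addresses)
```

    Calling `count_endpoints(["server1.example", "server2.example"])` returns the two counts; if they differ, names and addresses are not one-to-one, and a count of names alone would misstate the number of endpoints.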