Slashdot Mirror


Researchers Map Locations of 4,669 Servers In Netflix's Content Delivery Network (ieee.org)

Wave723 writes from a report via IEEE Spectrum: For the first time, a team of researchers has mapped the entire content delivery network that brings Netflix to the world, including the number and location of every server that the company uses to distribute its films. They also independently analyzed traffic volumes handled by each of those servers. Their work allows experts to compare Netflix's distribution approach to those of other content-rich companies such as Google, Akamai and Limelight. To do this, IEEE Spectrum reports that the group reverse-engineered Netflix's domain name system for the company's servers, and then created a crawler that used publicly available information to find every possible server name within its network through the common address nflxvideo.net. In doing so, they were able to determine the total number of servers the company uses, where those servers are located, and whether the servers were housed within internet exchange points or with internet service providers, revealing stark differences in Netflix's strategy between countries. One of their most interesting findings was that two Netflix servers appear to be deployed within Verizon's U.S. network, which one researcher speculates could indicate that the companies are pursuing an early pilot or trial.
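
The enumeration step described in the summary can be sketched roughly as follows. This is only an illustration, not the researchers' crawler: the hostname pattern, the `candidate_names` and `crawl` helpers, and any location codes passed in are hypothetical. Only the `nflxvideo.net` suffix and the IXP-versus-ISP distinction come from the summary.

```python
import itertools
import socket

DOMAIN = "nflxvideo.net"  # common suffix named in the summary


def candidate_names(locations, kinds=("ix", "isp"), max_index=3):
    """Yield candidate hostnames from a HYPOTHETICAL naming pattern."""
    for loc, kind, i in itertools.product(locations, kinds, range(1, max_index + 1)):
        yield f"c{i:03d}.{loc}.{kind}.{DOMAIN}"


def resolves(name):
    """Return True if the local resolver finds at least one address for name."""
    try:
        socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror:
        return False


def crawl(locations, resolver=resolves):
    """Return the candidate names that the resolver reports as existing."""
    return [name for name in candidate_names(locations) if resolver(name)]
```

Passing a different `resolver` makes it possible to try the enumeration logic without touching the network.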

57 comments

  1. And? by NotInHere · · Score: 3, Insightful

    Why is netflix's server architecture so interesting? Or is science just miles behind the industry in this subject and now they are desperately trying to catch up?

    1. Re:And? by Anonymous Coward · · Score: 5, Insightful

      Wait, are you saying that information like the server load distribution of a real world system like Netflix isn't useful to studying such systems and designing future ones? There aren't that many systems like this in the wild, and most of them don't release their information publicly, so getting an extra example could be quite useful. If they didn't have this, many of the people complaining here would then instead be complaining about academics who ignore real world systems in their studies. It is not unlike a lot of research work that is done to characterize various products that the manufacturer doesn't provide enough info for, whether that means measuring the latest CPU's performance in a real world test, benchmarking networking gear, or detailing the performance of some new high speed camera.

    2. Re:And? by Anonymous Coward · · Score: 0

      It is probably more interesting to those who don't know anything about how the internet works. Most people don't even know anything about Akamai servers yet they are in pretty much every ISP NOC and handle a large portion of internet content.

    3. Re:And? by Anonymous Coward · · Score: 0

      Because some people find it interesting. Move along now.

    4. Re:And? by omnichad · · Score: 1

      Reverse engineering trade secrets is fun?

    5. Re:And? by Anonymous Coward · · Score: 0

      Or is science just miles behind the industry in this subject and now they are desperately trying to catch up?

      Never heard of industrial case studies before?

    6. Re:And? by Anonymous Coward · · Score: 0

      Reverse engineering trade secrets is fun!

      Fixed that for you.

    7. Re:And? by Anonymous Coward · · Score: 0

      Or is science just miles behind the industry in this subject and now they are desperately trying to catch up?

      Not so much behind, but it does have important holes in the data set. Public researchers can't just build something on the scale of Netflix and play around with it to get data and experience on how well it works. It is one thing to build a system that can be simulated in a lab, or to work on data distribution for projects like the LHC, but those involve very different scales and use cases.

      Complaining about this is like complaining about people who measure traffic patterns on highways. Traffic engineering spends plenty of time modeling traffic on computers and will even spend time with a dozen or so cars testing things on a test track. But things need to be confirmed with real systems at full scale.

  2. Interesting ratio by guruevi · · Score: 1

    Netflix has ~4,000 movies and ~1,000 TV shows, any server thus seems to handle just 1 movie and about 1000 connections each (if 5% of their user base is actively streaming at any one point)? This just seems like an awful lot of servers for what I find to be a relatively low load for simply streaming data.
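
    The ratio above can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in the comment and the headline; the variable names are illustrative, and the "implied user base" is just what the comment's own numbers work out to, not a reported subscriber count.

```python
servers = 4669            # from the headline
titles = 4000 + 1000      # ~4,000 movies plus ~1,000 TV shows (comment's estimate)
conns_per_server = 1000   # comment's estimate
active_share = 0.05       # comment's assumption: 5% streaming at any one point

titles_per_server = titles / servers
concurrent_streams = servers * conns_per_server
implied_user_base = concurrent_streams / active_share

print(f"titles per server:  {titles_per_server:.2f}")
print(f"concurrent streams: {concurrent_streams:,}")
print(f"implied user base:  {implied_user_base:,.0f}")
```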

    I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent type approach, it would also alleviate the "problems" with the providers, most traffic would stay within their network.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
    1. Re:Interesting ratio by Anonymous Coward · · Score: 5, Insightful

      The BitTorrent approach is the wrong one for two reasons:
      1. A lot of people have asymmetrical connections with a very slow upload speed
      2. A lot of people have monthly data caps with hefty fees for going over

    2. Re:Interesting ratio by guruevi · · Score: 0

      Perhaps people would notice and demand at least 3rd world country level Internet from their shitty providers then.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    3. Re:Interesting ratio by Anonymous Coward · · Score: 0

      BitTorrent is optimized pretty well for that. At the peak of its popularity, BitTorrent accounted for more than a third of all traffic. Netflix is only just catching up to that now, and largely by 'converting' previous BitTorrent users. If they switched to a BitTorrent backend there would be more than enough throughput.

    4. Re:Interesting ratio by drinkypoo · · Score: 1

      I'm sure Netflix could save a lot of money and network headaches by using a BitTorrent type approach,

      But you're wrong. They use caching servers instead, on the premise that shows are watched in trends. They know more about watching habits than you do, and they are relatively technically competent, so if there were benefit to that they'd probably be doing it already.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    5. Re:Interesting ratio by Bengie · · Score: 3, Interesting

      USA peak bandwidth was about 60Tb/s back in 2012, and average bandwidth growth has been at least 50% year over year and steady since the 80s, giving us 400% growth of bandwidth, or about 300Tb/s. Netflix represents about 1/3rd of peak bandwidth, or 100Tb/s. That gives an average of 21Gb/s per server, which sounds ballpark correct seeing that they're moving to 40Gb/s and 80Gb/s uplinks on their servers.
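
      The estimate above can be reproduced as a quick calculation. This sketch takes the comment's figures at face value; the four-year span (2012 to 2016) is an assumption based on the date of the study, and the variable names are illustrative.

```python
peak_2012_tbps = 60.0   # comment's figure for US peak bandwidth in 2012
annual_growth = 0.50    # comment's "at least 50%" year-over-year growth
years = 4               # assumed: 2012 -> 2016
netflix_share = 1 / 3   # comment's share of peak bandwidth
servers = 4669          # from the headline

peak_now_tbps = peak_2012_tbps * (1 + annual_growth) ** years
netflix_tbps = peak_now_tbps * netflix_share
per_server_gbps = netflix_tbps * 1000 / servers

print(f"peak now:   {peak_now_tbps:.1f} Tb/s")
print(f"Netflix:    {netflix_tbps:.1f} Tb/s")
print(f"per server: {per_server_gbps:.1f} Gb/s")
```

      Compounding 50% for four years is a factor of about 5, i.e. roughly 400% growth, so the unrounded result lands close to the comment's 21Gb/s per server.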

      Regardless of how many people are actually watching, 20Gb/s average is pretty cool. Another interesting note is that Netflix servers barely benefit from caching data in memory. Each server is handling so many requests per second from so many different customers, almost no customers are at the same point in the same show, and requests from a single customer are temporally so far apart that almost all requests are effectively random access. It's also interesting to know that Netflix is beyond the 80/20 rule; they're in the 90/10 rule, in that 10% of their data represents 90% of their requests. Predicting which 10% is important, and they can't use normal least-used eviction algorithms because that would cause cache thrashing. They algorithmically predict what will be watched every night, upload the data to be cached and logically "pin it" so it doesn't get evicted.
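
      The "pin it" idea can be illustrated with a toy cache. This is a minimal sketch and not Netflix's implementation: an ordinary least-recently-used policy stands in for the "normal" eviction algorithm, and the `PinnedLRUCache` class and its method names are made up for the example.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """LRU cache in which pinned keys are never chosen for eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> value, least recently used first
        self.pinned = set()

    def pin(self, key, value):
        """Insert a predicted-popular item and protect it from eviction."""
        self.pinned.add(key)
        self.put(key, value)

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            # Oldest entry that is not pinned, if there is one.
            victim = next((k for k in self.items if k not in self.pinned), None)
            if victim is None:  # everything is pinned; nothing can be evicted
                break
            del self.items[victim]
```

      With a plain LRU, a burst of one-off requests would push the popular title out; here the pinned entry survives and only unpinned entries churn.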

      Other interesting stuff they support for syncing the servers: each server can be configured to use a different route to pull down its data, and even to limit the amount of bandwidth it uses, and then the servers within a location can sync with each other with a kind of P2P setup. This helps load balance routes. Their SSD servers hold quite a bit less storage than the mechanical-drive servers, so the SSDs typically are hit first, but hold only the most requested data. Last I knew, their SSD servers did not support acting as a cache while loading, because mixed sustained heavy reads and writes produced IO patterns that didn't play well with SSDs. That may have changed or may be changing in the near future. I know the biggest reason for this was that the way most SSD firmware handled garbage collection could cause long pauses of no activity under sustained heavy writes. One of the changes was for FreeBSD to have a target latency for reads/writes and throttle the writes until latency came down.

    6. Re: Interesting ratio by Anonymous Coward · · Score: 0

      I'm not a communist. You didn't pay for my bandwidth. Fuck torrent.

    7. Re:Interesting ratio by Anonymous Coward · · Score: 0

      A seedbox handles these problems quite nicely. You can even get one that can transcode and stream downloaded torrents if you like (like Plex and such).

    8. Re:Interesting ratio by Anonymous Coward · · Score: 0

      People have noticed and that hasn't happened yet...

    9. Re:Interesting ratio by Shatrat · · Score: 1

      I think you're missing the point of these appliances. Netflix has plenty of capacity to host all its content in central locations.
      These appliances are installed at the ISP offices so that the content is as close to the subscriber as possible. That way the quality of the video is not dependent on the quality of the long-haul network from the ISP back to Chicago, Dallas, Ashburn, London, Frankfurt or wherever.
      It also reduces the IP transit costs of the ISP, which they are typically paying for based on utilization if they are not a Tier 1 like AT&T. It also reduces their transport costs. If the ISP has to upgrade from two 10gbe pipes to four, that's probably going to hit their bottom line for $10k to $20k a month. If installing some caching appliances helps them delay that upgrade by a year, that's a massive benefit.
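
      As a rough worked example of that benefit, using only the figures quoted above (the variable names are illustrative, and the monthly costs are the comment's own estimates):

```python
monthly_cost_low, monthly_cost_high = 10_000, 20_000  # USD per month, from the comment
months_delayed = 12                                   # "delay that upgrade by a year"

deferred_low = monthly_cost_low * months_delayed
deferred_high = monthly_cost_high * months_delayed

print(f"deferred spend: ${deferred_low:,} to ${deferred_high:,}")
```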

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    10. Re:Interesting ratio by Shatrat · · Score: 1

      Here's a bigger reason.
      How do you play bit torrent content in a browser or app?
      How do you control access to the content based on accounts?
      How do you prevent Joe Blow Wireshark Pro from noticing that Senator Blowhard McJesus has been binge watching R rated slasher flicks?
      I'm sure they can all be overcome with some coding and customization, but they've already done that work for their existing solution.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    11. Re:Interesting ratio by guruevi · · Score: 1

      I've seen the same problem on cheap SSDs like the Samsung "Pro" lines. I've had much better results with the Intel DC line though; they seem to be able to sustain their "average" read and write speeds.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    12. Re:Interesting ratio by Bengie · · Score: 1

      My understanding of the SSD writing issue is that SSDs have a sizable amount of reserved space for wear leveling. Assuming this space is large enough, garbage collection can be postponed as long as there is some spare room. Many SSDs only do GC while the drive is "idle" by some definition, which many times is how many milliseconds since the last request. If the drive never becomes idle, the SSD will continue to postpone the GC until it absolutely has to, at which point it stops everything and does GC.

      Insult to injury is that the device waited so long that there is more work meaning it will take even longer, sometimes many seconds. The good news is that SSDs have very reliable performance characteristics, and as the available wear-leveling free-space starts to get consumed by deferred GC work, writes start taking longer because it gets harder to find fresh blocks. This "longer" can be detected because both reads and writes suffer since you can't read from the same part of the drive that is being written to until it has finished.

      Read latency is pretty much constant, so any increase in latency means the firmware is doing work. Detecting this increase and backing off, primarily on writes, gives the GC time to do its job and minimizes GC latency, even if at the expense of throughput. FreeBSD is also getting new limiting features in its IO layer that will allow admins to limit read IOPS, write IOPS, read throughput, and write throughput all separately. This can be configured system-wide per block device and/or by jail per device.
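
      The back-off idea can be sketched as a small control rule. This is not FreeBSD's actual IO scheduler: the `next_write_limit` function, its units, and the halve-or-step-up policy (a simple additive-increase, multiplicative-decrease rule) are assumptions chosen for illustration.

```python
def next_write_limit(current_limit, read_latency_ms, target_latency_ms,
                     min_limit=1, max_limit=1000, step=10):
    """Return the write-rate limit (hypothetical unit: writes/s) for the next interval.

    If observed read latency is above the target, the firmware is probably
    busy with garbage collection, so halve the write rate. Otherwise, raise
    the rate by a fixed step. The result is clamped to [min_limit, max_limit].
    """
    if read_latency_ms > target_latency_ms:
        new_limit = current_limit // 2
    else:
        new_limit = current_limit + step
    return max(min_limit, min(max_limit, new_limit))
```

      Called once per measurement interval, this trades write throughput for steadier read latency, which is the trade-off described above.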

    13. Re:Interesting ratio by Bengie · · Score: 2

      I wish torrent was better at taking advantage of my fast symmetrical connection. If more users uploaded to me instead of to others, I could upload more to others and it would be overall faster for everyone. Easier said than done, I know. Simple attempts to do this would result in gaming the system or DOSing the system by wasting other's upload bandwidth.

    14. Re:Interesting ratio by allo · · Score: 1

      And yet it works for all uses of BitTorrent. BitTorrent was invented to distribute data without overloading a weak server, and it worked so well that the filesharing community instantly adopted it.
      Of course, your upload won't be enough for someone's download. But the upload of a few people may sponsor your download.
      And the tech is already there, see popcorntime (never used it myself, but a nice idea).

    15. Re: Interesting ratio by allo · · Score: 1

      You may be paying with your bandwidth for a cheaper netflix for you.

  3. Hopefully... by hcs_$reboot · · Score: 1

    these findings will also make it possible to find a way to bypass Netflix geoblocking. Content in my country is so poor, it makes me want to cry :-(

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  4. Re:Gee thanks guys by hcs_$reboot · · Score: 1

    Why would they do that? The MPAA already gets wagons of $ from Netflix, and also got them to implement strict geoblocking.

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  5. Re:Gee thanks guys by geekboybt · · Score: 0

    whoosh

  6. On the Faroe islands? by Henriok · · Score: 1

    It seems that Netflix has delivery servers on the Faroe Islands, north of the UK and south of Iceland. I guess they would need that for its 50,000 residents. Since they only have one in China and none in countries like Portugal, Turkey and Israel, that seems reasonable.

    --

    - Henrik

    - when the Shadows descend -
    1. Re:On the Faroe islands? by Reaperducer · · Score: 1

      Probably a strategic location sitting on a trans-Atlantic cable. Though, you could also get that in Portugal.

      --
      -- I'm old enough to have lived through six different meanings of the word "hacker."
    2. Re:On the Faroe islands? by drinkypoo · · Score: 3

      Netflix will send a CDN server more or less to any ISP which requests one, and is willing to pay the power bill. Do you not remember when many ISPs were loudly refusing to install these free machines even though they would save them money because they objected to "free" colocation on principle?

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:On the Faroe islands? by Shatrat · · Score: 1

      The ones complaining loudly mostly objected to competition with their own video product. ISPs that aren't also cable companies or have some other ulterior motive love these caching devices because they dramatically reduce their transport/transit costs and increase customer satisfaction.
      The colocation objection was smoke and mirrors crap. These things take up less space than some T1 muxes.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
  7. Uh by Anonymous Coward · · Score: 0

    5 in Sweden, 1 in Japan, and none in Russia? This seems odd.

    1. Re:Uh by Reaperducer · · Score: 1

      If you could find out how many subscribers it has in each country, it might not be odd at all.

      Also, you have to factor in things like the potential for natural disaster (Japan) and the gub'mint horking your servers in a political/ransom/whoknows move (Russia). Sweden's a good, stable location from which to serve content across the top of the world with little worry.

      --
      -- I'm old enough to have lived through six different meanings of the word "hacker."
    2. Re:Uh by Anonymous Coward · · Score: 0

      And take a look at where major internet backbones run...there is a shocking correlation

  8. You mean to tell me by Snotnose · · Score: 1

    Netflix isn't run on a server in Reed Hastings's bathroom? I'm shocked, shocked I tell you.

  9. the purpose of mapping of CDNs by Anonymous Coward · · Score: 0

    Why is netflix's server architecture so interesting? Or is science just miles behind the industry in this subject and now they are desperately trying to catch up?

    This is part of the EU's ENDEAVOUR Project:

    The focus of the project is to enable added-value services to be provided thanks to Software-Defined Networking (SDN), on top of Internet Exchange Points and other network interconnection fabrics. The services would relate not only to the flexibility of the interconnection fabric, but most importantly to enable the content and data center ecosystem that is present at the interconnection fabric to collaborate. The ultimate goal is to create a service marketplace on top of the ecosystem composed of Cloud/data centers, networked applications, and the interconnection fabric.

    * https://www.h2020-endeavour.eu/objectives

    So if you're doing SDN you want to be able to analyze flows and then perhaps develop algorithms to make more optimal routing decisions.

    Doing a mapping manually at first may be necessary, because afterwards you can perhaps feed the data into a machine learning system to see if you can get your SDN controller to automatically get the same result. This way the next CDN that comes along can be detected and (say) an ISP's network can change flows to better move the bits around.

    But I'm just spit-balling off the top of my head here.

  10. Isn't it common knowledge? by acoustix · · Score: 1

    Isn't it common knowledge that Netflix will provide servers/appliances to ISPs who request them in order to cut down on video traffic during peak hours? Why is the fact that Verizon has a few such a big deal?

    This program is well known: Open Connect

    --
    "A plan fiendishly clever in its intricacies"- Homer Simpson
    1. Re:Isn't it common knowledge? by Bengie · · Score: 1

      The big deal is that Verizon has a history of playing games with Netflix data and trying to charge them exorbitant transit prices for local peering data. It's quite telling how horrible Verizon has been about Netflix that they only have 2 devices in their network. With how large Verizon is, I assume Netflix would love to load test their 80Gb/s boxes in a FIOS region.

    2. Re:Isn't it common knowledge? by Coren22 · · Score: 1

      Even at the height of those games, my 75/75Mbit FiOS never had issues streaming Netflix. I think it was an entirely fabricated issue. I have not however ever streamed 4k content...one of these days I will have to flip open the laptop screen and watch a 4k show to see what the big deal is.

      --
      APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
    3. Re:Isn't it common knowledge? by Bengie · · Score: 1

      Streaming issues were regional and consistent within the region. Your area may not have ever been affected, but other areas were and had the issue for months or years at a time.

  11. Heatmap by Citizen+of+Earth · · Score: 1, Insightful

    ObXKCD: Heatmap

  12. You forgot Alaska by Anonymous Coward · · Score: 0

    We have a relatively large Netflix CDN in Anchorage....but apparently Alaska isn't part of the North American Continent according to this "team of researchers" :)

  13. What about us DVD by mail subscribers? by Anonymous Coward · · Score: 0

    They should include the motor fleet and hard working men and women of the USPS

  14. The original paper on Arxiv by laughingskeptic · · Score: 1

    https://arxiv.org/pdf/1606.05519v1.pdf

    Related slides:

    http://eecs.qmul.ac.uk/~boettget/mapping-netflix-coseners16.pdf

  15. Research by allo · · Score: 1

    Every real scientist is embarrassed when somebody claims that reversing secret data from companies is research. There is NOTHING researched there; everything was known, just not to you. Research is about discovering something new, not about trying to get somebody else's secrets.

    Use your time and money for something useful. Stop tracing netflix servers, start inventing something on your own.

    1. Re:Research by Anonymous Coward · · Score: 0

      Research is not just about discovering something new, but disseminating the information in ways that allow future studies to leverage them. Having data on the quantitative differences between different CDNs (netflix, akamai, whatever) enables them to identify trends, differences, and the effects of both.

    2. Re:Research by allo · · Score: 1

      Indeed, it's not only about inventions, but analysis of things and discovering how stuff works.

      But discovering how stuff works means things like finding out what the parts of an atom are. Not dissecting things that were built before, where you could just ask the builder. Otherwise there is endless science: one team builds stuff and keeps it secret, the other team dissects it.

      Keep this to the businesses. Let amazon prime video dissect netflix and keep it to science to find the best way to do the stuff, instead of just finding out the netflix way. If the netflix way IS the best way, you will find the same one.

    3. Re: Research by Anonymous Coward · · Score: 0

      This happens all the time with real science. Some of the most useful papers for my work are the ones measuring the specific qualities and geometric construction of commercial, high speed light sensors. The company doesn't want to give less vague data sheets because they view it as binding, and won't share the geometry details when asked because they say it is proprietary. So there are papers doing detailed measurements and analysis of the sensors, and confirming real world performance. This gets used by teams that don't have the time and money to build their own sensors, while I know those that do such development work use the same papers to learn from and improve upon designs.

      Science doesn't distinguish between studying phenomena that are unknown to everyone vs. unknown to all but a company's team. Your idea that it allows endless busy work is BS and stupid. If someone wanted to study something completely useless, they don't need a team to build something in secret, and either way wouldn't get funding. However, if you wanted to study a proprietary product that would allow you to publish useful information that increases productivity for researchers using or redesigning that product, that is useful and encouraged work.

      You act like scientists should be ashamed of such things. I am not, because usually such research is critical to actually getting stuff done and often goes far too thankless.

    4. Re: Research by allo · · Score: 1

      Scientists AND the people who keep stuff hidden while they know it's being researched at the same time.

    5. Re:Research by Anonymous Coward · · Score: 0

      Every real scientist is embarrassed, when somebody claims that reversing secret data from companies is research

      I'm not, so no, not every scientist. And I am pretty sure neither are my colleagues on my team that depend upon work like this. And at least one other poster can be added to that list. Are you even a researcher yourself, or are you just trying to speak for a broad category of people, potentially without even knowing what their actual work entails?

      keep it to science to find the best way to do the stuff

      A huge part of finding better ways to do things is to know what has already been done and how well that has worked out. This is easily what separates most internet armchair scientists from ones who actually do science for a living. It is trivial to throw out new ideas, but it takes a lot of time and work to figure out what has been done before and what needs to be done to move things forward.

  16. Missing some details by darkain · · Score: 1

    They're missing a few details in their analysis. What about DNS load-balancing returning multiple potential IP addresses per name? What about anycasting IP addresses, or multiple end-points for an IP address depending on entry path into a location? I think what they really found was a total number of potential current DNS names, but I somehow doubt that is the entirety of the CDN deployed right now. Also, because Netflix is very well known to be static content with controlled client applications, there is also the possibility of transparent proxies anywhere along the chain with Netflix partnering with ISPs or peer exchanges to internally reroute the traffic.

    Good start though!
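
    The first question, names versus addresses, can be illustrated with a small sketch. It assumes nothing about how the researchers counted servers: `addresses_for` and `summarize` are made-up helper names, and the only library call used is the standard `socket.getaddrinfo`.

```python
import socket


def addresses_for(name, port=443):
    """Return the set of IP addresses the local resolver gives for a name."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def summarize(names, resolver=addresses_for):
    """Count distinct names and distinct addresses across a list of hostnames."""
    by_name = {name: resolver(name) for name in names}
    all_ips = set().union(*by_name.values()) if by_name else set()
    return {"names": len(by_name), "addresses": len(all_ips)}
```

    If the two counts differ, a tally of DNS names is not a tally of machines. Note that this cannot detect anycast at all, since an anycast address looks like a single IP from any one vantage point.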