Slashdot Mirror


AWS Load Balancer Sends 2 Million Netflix API Reqs To Wrong Customer

rsk writes "Amazon Web Services' Elastic Load Balancer is a dynamic load balancer managed by Amazon. Load balancers are regularly swapped around with each other, which can lead to surprising results, like getting millions of requests meant for a different AWS customer. Using ELBs can result in AWS unintentionally introducing a man-in-the-middle (attack) into your application environment. Most AWS users do not realize this can happen and have not secured against it."

58 comments

  1. TTL value by SharkLaser · · Score: 2

    It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault. You should stick to standards, and if the TTL says it's 60 seconds, then it is.

    1. Re:TTL value by Florian+Weimer · · Score: 2

      Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.

    2. Re:TTL value by Anonymous Coward · · Score: 0

      Amazon knows better than to trust Microsoft to honor the TTL. It's been broken for more than a decade and a half. Even Netware gets the minimum wrong even though it doesn't default to nearly as bad a value as Microsoft. Also, some ISPs like AOL will not honor TTLs below a certain value. 1800 seconds is the absolute minimum of a TTL you should use if you want to only have Microsoft and AOL screw you over a little.

    3. Re:TTL value by girlintraining · · Score: 2

      It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.

      "Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.

      --
      #fuckbeta #iamslashdot #dicemustdie
    4. Re:TTL value by JWSmythe · · Score: 4, Interesting

          From what I've seen, it's frequently the client's DNS servers, not the client itself.

          I've used a short TTL (5m) for quite a while. It's intentional, because I've needed to switch things rather quickly in the past, and it's better for it to "just work", rather than waiting hours for everyone to pick up the change.

          I used to work for a place that had a huge traffic load. Our slow days were still millions of unique visitors. When we took a machine out of DNS (DNS round robin between 15+ machines), we'd see the traffic drop significantly in the first 5 minutes. When AOL finally saw our change, it would drop more. There would still be lingering people for about an hour, and then it would finally be idle.

          That was a pretty regular thing for us to do. We scaled our traffic to our various datacenters this way. We'd also load test lines and individual servers with it. If it looked like we were running into a bandwidth limitation, I'd throw a few hundred Mb/s down the line, and see how it performed. If it really was, we'd then switch everything away from it to other datacenters until the provider fixed it.

          In all those circumstances, in 5 minutes most (but not all) of the traffic moved. An hour from the change, the remainder had moved.

          I've seen this with my home provider. I let them handle DNS for my home machine, rather than doing it myself. I've made changes, and they don't respect it within 30 minutes. Within about an hour, the new DNS records show properly.

          Google's public DNS servers seem to do pretty well in that respect. Our changes are reflected properly there in just a few minutes. AOL, TimeWarner/RoadRunner, and a few others are pretty bad. I know why they do it (reducing load on their DNS servers), but it becomes a pain in the ass for places that need to make changes quickly.
         

      --
      Serious? Seriousness is well above my pay grade.
    5. Re:TTL value by arglebargle_xiv · · Score: 1

      It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.

      "Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.

      Thus Elastic Load Balancer's other name, Erratic Load Balancer.

    6. Re:TTL value by hedwards · · Score: 2

      If the customer's getting the finger, wouldn't that make it more of an Erotic Load Balancer?

    7. Re:TTL value by Kattare · · Score: 2

      The problem with any of these scenarios is that, according to the AWS forum post, he's been getting rogue Netflix traffic for 4 days. No DNS server or mainstream client is going to keep a 60-second TTL record for 4 days. It's either an issue at AWS completely unrelated to DNS, or an issue in Netflix clients. With it being in TVs, Blu-ray players, Xboxes, iOS, Wiis, etc... who knows what client the issue could be in... I wonder if the forum poster could capture the browser string and help debug?
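
A minimal sketch of that debugging idea, assuming the misdirected requests are HTTP and that their headers are available as plain dictionaries. The function name, the sample hostnames, and the `HypotheticalTV/1.0` agent string are made up for illustration; nothing here comes from the AWS forum post.

```python
from collections import Counter


def tally_misdirected_agents(requests, own_hosts):
    """Count User-Agent values of requests whose Host header is not ours.

    `requests` is an iterable of header dicts (hypothetical input format).
    Header names are matched case-insensitively.
    """
    own = {h.lower() for h in own_hosts}
    counts = Counter()
    for headers in requests:
        lowered = {k.lower(): v for k, v in headers.items()}
        # Naive port stripping; IPv6 literals are not handled in this sketch.
        host = lowered.get("host", "").split(":")[0].lower()
        if host not in own:
            counts[lowered.get("user-agent", "(none)")] += 1
    return counts
```

Sorting the result with `counts.most_common()` would show which kind of client, if any, dominates the stray traffic.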

    8. Re:TTL value by mysidia · · Score: 1

      Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.

      No, they are not "forced" to do so. They have chosen an improper method to "workaround" a security issue that violates other internet standards and causes issues, because they are not implementing DNS resolution in a valid way.

      The TTL in DNS is not an "advisory" value, it is a time after which the old RR in the previous authoritative DNS response must be expunged, a TTL of 0 prohibits caching altogether.
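
The semantics described here can be sketched as a toy cache. This is an illustration under assumptions, not any real resolver's code: the class name and the injectable clock are invented for the example.

```python
import time


class TtlCache:
    """Toy DNS-style cache that drops a record once its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can fake the passage of time
        self._records = {}       # name -> (address, expiry timestamp)

    def put(self, name, address, ttl):
        if ttl <= 0:
            # A TTL of 0 means "do not cache": never store the record.
            self._records.pop(name, None)
            return
        self._records[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._records.get(name)
        if entry is None:
            return None
        address, expiry = entry
        if self._clock() >= expiry:
            # TTL elapsed: expunge the record so the caller must look it up again.
            del self._records[name]
            return None
        return address
```

A `None` result means the caller has to go back to the authoritative source instead of reusing the old answer.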

      There are other methods of preventing attacks that involve quickly changing DNS records, like, oh, fixing their trust policy, or throwing up an error requiring the user to reload their page.

    9. Re:TTL value by mysidia · · Score: 1

      Pointing the finger and screaming very loudly would be very good, especially if Amazon does it, as it will help bring attention to broken behavior in DNS and browser software.

      I will agree it hurts Amazon, but it helps the community, for large players like Amazon to help bring attention to broken software, so that it can be fixed.

    10. Re:TTL value by Anonymous Coward · · Score: 0

      Excuse me, but what browser implements its own DNS resolver?

    11. Re:TTL value by Anonymous Coward · · Score: 0

      > No dns server or mainstream client is going to keep a 60 second TTL

      Sounds like you've never dealt with Microsoft garbage. You're lucky. We've found that the Microsoft Windows NT crappy DNS server has cached 5m TTL's for more than a month. Microsoft doesn't give a damn about the Internet or Internet standards. They will cache a 5m TTL for months.

    12. Re:TTL value by Narcocide · · Score: 1

      They don't need their own resolver to cause problems. Many popular programs cache DNS responses far longer than is appropriate. Firefox, for one, caches DNS records internally (some versions on some platforms even for HOURS beyond the TTL unless you restart it) and so does Mac OS X itself.

    13. Re:TTL value by Anonymous Coward · · Score: 0

      "Browsers are sometimes forced to disregard TTL values"

      Browsers don't disregard TTLs, they never see them. Gethostbyname only returns an IP. The browsers then implement their own caches (because most system APIs are dumb, and don't maintain a cache) using a default TTL. Squid and various other software do the same thing. It's a problem that should be well known to anyone dealing with DNS-based load balancing, but the vendors tend to stay mum on it. Otherwise they'd greatly reduce the deployment scenarios for DNS load balancing.
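
A rough sketch of that behaviour, assuming a resolver callable that, like gethostbyname, hands back only an address. The class name and the one-hour default are hypothetical and are not taken from any real browser.

```python
import time

# Hypothetical application default; not a real browser or Squid setting.
DEFAULT_LIFETIME = 3600  # seconds


class AddressOnlyCache:
    """Application-level cache that never sees the record's real TTL."""

    def __init__(self, resolve, lifetime=DEFAULT_LIFETIME, clock=time.monotonic):
        self._resolve = resolve      # callable: name -> address, no TTL returned
        self._lifetime = lifetime
        self._clock = clock
        self._entries = {}           # name -> (address, expiry timestamp)

    def lookup(self, name):
        now = self._clock()
        hit = self._entries.get(name)
        if hit is not None and now < hit[1]:
            # Still "fresh" by the application's own clock, whatever DNS said.
            return hit[0]
        address = self._resolve(name)
        self._entries[name] = (address, now + self._lifetime)
        return address
```

In real use `resolve` could be `socket.gethostbyname`, which likewise returns only an address string. If the address is handed to another customer two minutes after the first lookup, this cache keeps sending traffic to it until its own lifetime runs out.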

    14. Re:TTL value by Florian+Weimer · · Score: 1

      Browsers don't disregard TTLs, they never see them.

      Good point. There are APIs that provide TTL information (such as res_query), but Firefox does not seem to use them. Interesting.

    15. Re:TTL value by DarkOx · · Score: 1

      Browser makers playing fast and loose with standards outside of HTML sucks! They all suck; try to find a browser that does PASV FTP *correctly*. Either as part of a very misguided security attempt, or on the assumption that FTP servers are behind NAT and can't be configured to send a correct address in the PASV response, they all ignore the address returned and stupidly use the control socket's remote address instead.

      That breaks all but the very most common use case and all the browsers do it. I have seen other examples of DNS FAIL out there as well.
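
For reference, a small sketch of reading the address out of a PASV reply, assuming the common `227 ... (h1,h2,h3,h4,p1,p2)` reply layout. The function name and the sample reply are illustrative only.

```python
import re


def parse_pasv(reply):
    """Extract (host, port) from an FTP 227 reply such as
    '227 Entering Passive Mode (192,0,2,10,19,137)'.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no PASV address found in reply")
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port
```

A client following the complaint above would connect its data socket to the returned `host`, not to the control connection's peer address.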

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    16. Re:TTL value by bsane · · Score: 1

      There are some clients that cache dns records until they're restarted. I've removed internet facing vips from dns and weeks later there are still 100+ clients making connections, the only thing that would stop them is a client restart.

    17. Re:TTL value by Anonymous Coward · · Score: 0

      Your servers are configured wrong. They sure do respect TTLs. I run over 100 of them.

    18. Re:TTL value by Anonymous Coward · · Score: 0

      And I have seen ISPs ignoring TTLs and setting their own to 24 hours, leaving sites in the top 10 most visited (where I live) dead because we switched datacenters one evening. A call to them didn't help much, as they were unwilling to flush those records. The next morning it was fixed, though, so I assume they had some calls or their own people noticed something.

    19. Re:TTL value by Ben+Hutchings · · Score: 1

      If browsers don't impose such a minimum, devices with embedded web servers (think printers and home routers) become vulnerable to Cross-Site Request Forgery. They can potentially defend against this by checking the Host header on requests, but since these devices are only manageable through the web there's no good way to establish what the correct value is.
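
One possible heuristic for the Host check, sketched under the assumption that a device can safely accept requests addressed to a bare IP literal or to a name the owner has configured, and reject everything else. This is an illustration, not something proposed in the thread; the function and parameter names are hypothetical.

```python
import ipaddress


def host_header_allowed(host_header, configured_names=()):
    """Return True if the Host header is an IP literal or a configured name."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[fe80::1]:8080".
        host = host[1:].split("]")[0]
    elif host.count(":") == 1:
        # "name:port" or "a.b.c.d:port": drop the port.
        host = host.rsplit(":", 1)[0]
    if host in {name.lower() for name in configured_names}:
        return True
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        # A DNS name nobody configured: possibly a rebinding attempt.
        return False
```

The weakness the comment points out remains: `configured_names` has to come from somewhere, and on a web-only device that is exactly the hard part.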

    20. Re:TTL value by JWSmythe · · Score: 1

      I think that was the primary motivation for Google setting up their public DNS servers (8.8.8.8, 8.8.4.4).

      http://code.google.com/speed/public-dns/

      --
      Serious? Seriousness is well above my pay grade.
  2. This hasn't been fixed yet because... by Anonymous Coward · · Score: 0

    Amazon still charges for the bad requests. They have no incentive to fix it.

  3. Why no proxy? by Florian+Weimer · · Score: 1

    Why doesn't Amazon use a reverse proxy which performs additional checks and routes the requests to the right customer? (With Server Name Indication, that would work for TLS, too.) Without that, it's simply not possible to switch IP addresses quickly between non-cooperating targets.
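
The routing check such a proxy would perform might look roughly like this for plain HTTP. The routing table, the backend strings, and the choice of answering with 421 Misdirected Request are assumptions made for the sketch, not a description of how ELB works.

```python
from http import HTTPStatus


def pick_backend(host_header, routes):
    """Map a request's Host header to a customer backend.

    `routes` is a hypothetical dict of hostname -> backend address.
    Returns (status, backend); backend is None when no customer owns the name.
    """
    # Naive port stripping; IPv6 literals are not handled in this sketch.
    host = host_header.split(":")[0].strip().lower()
    backend = routes.get(host)
    if backend is None:
        # Nobody behind this proxy serves that name: refuse rather than guess.
        return HTTPStatus.MISDIRECTED_REQUEST, None
    return HTTPStatus.OK, backend
```

For TLS the same lookup would be keyed on the SNI name instead of the Host header.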

    1. Re:Why no proxy? by SharkLaser · · Score: 1

      Because Elastic Load Balancer isn't just for HTTP traffic, you can use it with any kind of traffic.

    2. Re:Why no proxy? by TooMuchToDo · · Score: 1, Insightful

      On top of that, their "Elastic Load Balancer" (just another bullshit "cloud" marketing term for their cluster of F5 load balancers at each availability zone) is just, as I mentioned, an array of F5 load balancers. They either a) don't support the functionality OP is speaking about, or, more likely, b) Amazon chooses not to support handling traffic in that way to simplify operations.

    3. Re:Why no proxy? by Anonymous Coward · · Score: 0

      Why doesn't Amazon use a reverse proxy which performs additional checks and routes the requests to the right customer? (With Server Name Indication, that would work for TLS, too.) Without that, it's simply not possible to switch IP addresses quickly between non-cooperating targets.

      Better question, why not setup your own reverse proxy cluster on EC2 and have the ELB route traffic to your RPs. Then do all the WAF/URL rewriting/caching/etc you want before sending it to your app servers.

    4. Re:Why no proxy? by Florian+Weimer · · Score: 1

      Does this really help if ELB misdirects requests? Or would this setup result in stable ingress IP addresses, so that ELB worked perfectly?

    5. Re:Why no proxy? by cript2000 · · Score: 2

      F5 supports that functionality. EC2 is not built on any commercial LB vendor.

    6. Re:Why no proxy? by user32.ExitWindowsEx · · Score: 1

      Simple. You most likely still pay for misdirected traffic in that case.

      --
      "Evil will always triumph because good is dumb." -- Dark Helmet
    7. Re:Why no proxy? by Anonymous Coward · · Score: 0

      Simple. You most likely still pay for misdirected traffic in that case.

      Yeah, but there's a lot of crap traffic out there anyhow...nothing can be done about that really. At least you can head it off at the pass before it gets back to your main operations though.

    8. Re:Why no proxy? by Anonymous Coward · · Score: 0

      You're pretty good at being wrong

  4. Charge both ways! by abirdman · · Score: 1

    1. Write a load balancer
    2. Sell it to customers until it breaks
    3. Patent software anomaly
    ...
    Profit!

    --
    Everything I've ever learned the hard way was based on a statistically invalid sample.
    1. Re:Charge both ways! by TooMuchToDo · · Score: 1, Informative

      Actually, they didn't write the load balancer. They just bought F5s and integrated them with their infrastructure to change their configurations programmatically.

    2. Re:Charge both ways! by Anonymous Coward · · Score: 0

      You are 100% wrong. ELB is not built on a commercial LB platform.

      Next you'll be claiming that EC2 is "using Amazon's spare server capacity."

    3. Re:Charge both ways! by TooMuchToDo · · Score: 1

      Which exactly is what they do, using Xen instances. Duh. RedHat built out their environment for them. This is not rocket science, and is all out on the web if you know how to Google, use LinkedIn, etc.

    4. Re:Charge both ways! by Kalriath · · Score: 1

      If they were F5s, they'd actually work. We use F5 here, and from looking at the config, Amazon would have to be literally incompetent to get such basic functionality wrong.

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    5. Re:Charge both ways! by Kalriath · · Score: 1

      Just googled it - if Amazon were using F5, F5 don't know about it. And even if the original design was just using spare capacity, that simply is not the case now (after all, that would imply that if Amazon itself needed to ramp up demand it could - and would - simply annex the entire EC2 capacity to cover it. This is, obviously, not the case).

      --
      For a site about things like basic rights, Slashdot users sure do like to censor "dissent".
    6. Re:Charge both ways! by TooMuchToDo · · Score: 1

      They could've migrated away from them as part of their platform. My knowledge about it is 18-24 months old.

  5. Easy fix below by TSHTF · · Score: 1

    Use rewrite rules to do a 301 redirect to goatse.cx when the host is api.netflix.com!

    1. Re:Easy fix below by mysidia · · Score: 1

      Use rewrite rules to do a 301 redirect to goatse.cx when the host is api.netflix.com!

      Why do that when the person who erroneously receives the traffic could maybe do something profitable with it? Such as co-opting the Netflix API calls and displaying "video" or "messages" to convince the user to subscribe to a different service, netting $$ for the unintended target who received Netflix's requests.

    2. Re:Easy fix below by Anonymous Coward · · Score: 0

      Because the goatse.cx redirect is fast to code up and works for all misdirects.
      The netflix api co-opting is going to take more time than it will for all the misdirect clients to eventually pick up the new DNS entries and, obviously, only works with netflix misdirects not all misdirects.

    3. Re:Easy fix below by Anonymous Coward · · Score: 0

      Well, it would certainly help a joke that has been continuously funny for over a decade keep going!

    4. Re:Easy fix below by Anonymous Coward · · Score: 0

      You'd have to respond with a response the client was expecting and will actually interpret correctly, which is probably wrapped in an XML envelope.

      For example:

      <video>
            <pic>goatse.cx/some.jpg</pic>
            <url>goatse.cx/</url>
      </video>

  6. IPv6... by Junta · · Score: 1

    In this scenario, IPv6 would alleviate the need to reuse IP addresses so aggressively.

    Of course, one wonders given the high amount of traffic if amazon is needlessly changing addresses. They probably should make more effort to have a tendency to be more persistent even beyond the 'promise' of the ttl. Sort of how in most DHCP servers, even when your lease expires you'll still often get the last address you had because the DHCP server retained it anyway unless pool exhaustion forces a change.
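
The DHCP-like stickiness described above could be sketched as follows. The class and the owner names are hypothetical; nothing here reflects how Amazon actually assigns addresses.

```python
class StickyPool:
    """Address pool that prefers to give an owner back its previous address."""

    def __init__(self, addresses):
        self._idle = list(addresses)   # idle addresses, least recently released first
        self._last_owner = {}          # address -> owner that held it most recently
        self._active = {}              # owner -> address currently held

    def acquire(self, owner):
        if owner in self._active:
            return self._active[owner]
        if not self._idle:
            raise RuntimeError("address pool exhausted")
        mine = [a for a in self._idle if self._last_owner.get(a) == owner]
        unused = [a for a in self._idle if a not in self._last_owner]
        # Preference order: the owner's old address, then a never-used one,
        # and only under pool pressure someone else's old address.
        choice = (mine or unused or self._idle)[0]
        self._idle.remove(choice)
        self._active[owner] = choice
        self._last_owner[choice] = owner
        return choice

    def release(self, owner):
        address = self._active.pop(owner)
        self._idle.append(address)
```

With this policy an address only changes hands when the pool has nothing else to offer, which gives stale client caches more time to expire.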

    It seems every day an ugly wart of public 'cloud' hosting crops up. People with remotely interesting workloads should be wary.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  7. DNS caches for 4 days. by Kattare · · Score: 2

    No dns server (or mainstream browser) caches something for 4 days when given a low TTL. I've seen some that cache for a few hours, maybe up to a day, but 4 days? Really? Something else is going on. I kind of wonder about the Netflix clients built into all those TV's, Mobile Phones, and DVD players.

  8. AWS charges based on load by sandytaru · · Score: 1

    So if you're getting millions of requests that aren't actually meant for you, that could drive up your monthly bill as well as your traffic usage. Good thing they caught that...

    --
    Occasionally living proof of the Ballmer peak.
  9. Security is NOT an issue with The Cloud. by Anonymous Coward · · Score: 2, Funny

    Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.

    The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.

    And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.

    My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.

    1. Re:Security is NOT an issue with The Cloud. by Prosthetic_Lips · · Score: 1

      ... and here I am without any mod points.

      Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.

      PS: For those of you without an irony chip installed ... pretend I started my post with </irony>

    2. Re:Security is NOT an issue with The Cloud. by AlienIntelligence · · Score: 1

      ... and here I am without any mod points.

      Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.

      PS: For those of you without an irony chip installed ... pretend I started my post with </irony>

      Pretend you started your post with Irony, off?

      -AI

      --
      For me, it is far better to grasp the Universe as it really is than to persist in delusion
    3. Re:Security is NOT an issue with The Cloud. by Prosthetic_Lips · · Score: 1

      To end the irony of the previous post ...

    4. Re:Security is NOT an issue with The Cloud. by Galestar · · Score: 1

      me thinks you need to go back to xml school

      --
      AccountKiller
    5. Re:Security is NOT an issue with The Cloud. by ATMAvatar · · Score: 1

      A single tear rolled down my cheek as I compared this satire against the real, starry-eyed reactions of my company's management to "the cloud".

      You know, this mythical beast that solves all scalability and maintenance issues while simultaneously having absolutely zero downsides...

      --
      "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
    6. Re:Security is NOT an issue with The Cloud. by Anonymous Coward · · Score: 0

      This was funny the first time.... but I have seen this posted before.

    7. Re:Security is NOT an issue with The Cloud. by marcello_dl · · Score: 1

      cool story, bro - but maybe it was submitted once and some faulty load balancer spread it out.

      --
      ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
  10. Whodunnit? by autocracy · · Score: 1

    Does this story come with any indication that their isn't a mixup on Netflix's part?

    --
    SIG: HUP
    1. Re:Whodunnit? by autocracy · · Score: 1

      ... "there" isn't a mixup on their part. Honestly, it'd be great if the Slashdot API reacted in the same year that I clicked on preview.

      --
      SIG: HUP
    2. Re:Whodunnit? by pjt33 · · Score: 1

      The preview goes via Netflix.