You're talking about UCC certificates, and yes, they've been around for a while. The problem is browser adoption - there are still waaay too many people using IE6 out there.
Anyone know how to right-click-drag on a unibody Macbook Pro to get this to work? Double-tap doesn't seem to do the trick for me.
..."Apple is rumored to have an exclusive on this technology until 2012."
*shakes head* So much for wide support. Lots more people buy Macs than they used to, but 8 times as many people still buy PCs. Peripheral vendors aren't stupid.
I can't imagine how Macbook shipments would be affected, given the flaw only affected SATA ports beyond the first two. Presuming that SATA devices linked through Thunderbolt don't count either.
Kind of like running VMware inside of another VMware machine then?
Also, when that happens, it won't break the site completely, just cause a delay while the browser attempts to connect over IPv6, fails, then falls back to IPv4. That can take 10-30 seconds depending on the browser, however - far beyond most users' "the site is broken" thresholds.
That's the main reason Google, Facebook and Yahoo are all doing this on the same day - if only one site exhibits this issue, it's easy for a user to assume the problem is with that site. If multiple large sites are problematic, they'll call their ISP who will (hopefully) fix their IPv6 implementation, or (more likely) instruct the user on disabling IPv6 on their computers.
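For the curious, here's a rough sketch of where that delay comes from, assuming a naive client that tries every IPv6 address before any IPv4 one and waits out a full timeout on each failure. The `connect` callable, the example addresses and the 20-second timeout are all made up for illustration; real browsers differ in the details.

```python
from typing import Callable, List, Tuple

# (family, address) pairs; "v6"/"v4" stand in for AF_INET6/AF_INET.
Candidate = Tuple[str, str]


def connect_with_fallback(
    candidates: List[Candidate],
    connect: Callable[[Candidate], bool],
    timeout_s: float,
) -> Tuple[Candidate, float]:
    """Try IPv6 candidates first, then IPv4; return the winner and time wasted.

    `connect` is a hypothetical stand-in for a blocking connect() that
    returns False only after waiting out the full timeout.
    """
    wasted = 0.0
    # sorted() is stable, so the order within each family is preserved.
    ordered = sorted(candidates, key=lambda c: 0 if c[0] == "v6" else 1)
    for cand in ordered:
        if connect(cand):
            return cand, wasted
        wasted += timeout_s  # the user stares at a blank page this long
    raise ConnectionError("no candidate address was reachable")


def broken_v6_path(cand: Candidate) -> bool:
    """Simulated network where IPv6 is advertised but does not actually work."""
    return cand[0] == "v4"


if __name__ == "__main__":
    addrs = [("v4", "192.0.2.10"), ("v6", "2001:db8::10")]
    winner, delay = connect_with_fallback(addrs, broken_v6_path, timeout_s=20.0)
    print(f"connected via {winner[0]} after {delay:.0f}s of waiting")
```

The site still loads in the end, which is the point: nothing is "down", the user just sits through the timeout first.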
From what I've heard from internal Netflix IT folks, it's extremely dysfunctional in there, which may explain why it's easier for them to just cut a check and have someone else do it.
Now, if someone with IT competency were to purchase Netflix (Google? I mean, their job is to organize the world's information, and movie content *is* information), I'm sure you'd see a swift internal CDN rollout.
I've heard similar, along with some highly visible departures in their systems/network engineering groups rumored to be due to exactly those decisions.
Speaking of cutting checks, has anyone noticed that the movies.netflix.com site is actually hosted by Amazon AWS?
macbook:~$ host movies.netflix.com
movies.netflix.com is an alias for merchweb-frontend-1502974957.us-east-1.elb.amazonaws.com.
merchweb-frontend-1502974957.us-east-1.elb.amazonaws.com has address 204.236.218.69
By all accounts, Level 3 got the contract primarily on price, not features - Akamai is typically the most expensive CDN out there.
IMO Netflix has enough traffic volume that they could build their own CDN and probably pay even less than what they're paying Level 3. Even if it costs more, the internal control over the thing may be worth the delta. Several other large sites have done exactly that already.
The geographic information isn't the issue. It's the fact that there are a very large number of clients using the same pools of DNS resolvers. Akamai uses those resolvers' IP addresses to map the client to a cache pool; if there are too many requests from the same netblock, they'll all get sent to the same cache pool, overloading it.
At some point, Akamai's load feedback system will notice this and direct users to a different pool, but it's a reactive measure.
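To make the failure mode concrete, here's a toy model of resolver-based mapping. This is not Akamai's actual algorithm, which I'm not going to pretend to reproduce; the /24 grouping, the pool names and the hash-based assignment are all assumptions for illustration.

```python
import ipaddress
import zlib
from collections import Counter
from typing import Iterable, List

POOLS = ["pool-a", "pool-b", "pool-c", "pool-d"]  # hypothetical cache pools


def pool_for_resolver(resolver_ip: str, pools: List[str] = POOLS) -> str:
    """Map a resolver to a pool by its /24 netblock (IPv4 only in this toy)."""
    netblock = ipaddress.ip_network(f"{resolver_ip}/24", strict=False)
    # crc32 is used only because it is deterministic across runs.
    digest = zlib.crc32(str(netblock).encode("ascii"))
    return pools[digest % len(pools)]


def load_by_pool(resolver_ips: Iterable[str]) -> Counter:
    """Count how many client lookups land on each pool."""
    return Counter(pool_for_resolver(ip) for ip in resolver_ips)


if __name__ == "__main__":
    # 10,000 client lookups arriving via two resolvers in the same netblock.
    shared = ["198.51.100.53", "198.51.100.54"] * 5000
    print(load_by_pool(shared))  # a single pool takes every request
```

The mapper never sees the clients themselves, only the resolver, so it has no way to know that one netblock is fronting for that many users until the load numbers come back.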
That's assuming that these are the kind of people who actually need to succeed in business in order to make a pile of dough. Lots of serial entrepreneurs do exactly the opposite, living on investment capital from one venture to the next.
Powerboost will only give you a burst if there's available headroom across the data path. If your traffic is going over the flat-topped Tata link, you'll be lucky to get even the 6Mbps you're paying for, much less the burst speed.
This is explained in the post-mortem. Basically, the problem was that clients were reacting to corrupt data being served up by the origin DB cluster the same way that they reacted to bad data coming from the memcached cluster - by deleting the offending entry in memcached and re-sending the query to the origin DB. So a client queried the origin, got bad data, and then deleted the key from memcached - resulting in every other client (tens of thousands of them, most likely) then querying the cluster for the same key* at the same time. Instant meltage ensued.
Now think about what happens when you have tens of thousands of boxes all querying the same cluster for the same keys all at the same time. Some clients will get the answer, but others will get an invalid response back from a melting mysql box. And when that happens, what does the client do? Exactly what started the mess in the first place - it *deletes the key from memcached*. So if any other clients were happily using the cached copy of the key data, they aren't anymore...and back to the origin they go. Lather, rinse, repeat until someone hits the Big Red Button and restarts the whole shebang in an ordered fashion (i.e. only re-activating a few racks at a time).
* More likely, many keys were corrupted on the origin. A single key would only impact one memcached instance and most likely only one mysql server (read about consistent hashing for more detail) and not cause this level of chaos.
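The feedback loop above can be shown with a toy simulation. To be clear, this is my own sketch and not Facebook's code: the key name, the "origin answers correctly for the first N queries per round, then returns garbage" capacity model, and the round-based notion of time are all assumptions.

```python
from typing import Dict, List

KEY = "config:feature_x"  # hypothetical key name
BAD = object()            # marker for an invalid/garbled origin response


class MeltingOrigin:
    """Toy origin DB: answers correctly up to `capacity` queries per round,
    then starts returning garbage, as an overloaded box might."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_round = 0

    def new_round(self) -> None:
        self.in_round = 0

    def get(self, key: str) -> object:
        self.in_round += 1
        return f"value-of-{key}" if self.in_round <= self.capacity else BAD


def simulate(clients: int, rounds: int, capacity: int, delete_on_bad: bool) -> List[int]:
    """Return the number of origin queries issued in each round."""
    cache: Dict[str, object] = {}
    origin = MeltingOrigin(capacity)
    per_round: List[int] = []
    for _ in range(rounds):
        origin.new_round()
        # All clients look at the cache at roughly the same instant.
        missers = clients if KEY not in cache else 0
        for _ in range(missers):
            value = origin.get(KEY)
            if value is BAD:
                if delete_on_bad:
                    cache.pop(KEY, None)  # evicts everyone else's good copy
            else:
                cache[KEY] = value
        per_round.append(origin.in_round)
    return per_round


if __name__ == "__main__":
    print("delete on bad origin data:", simulate(10_000, 5, 500, True))
    print("keep cache on bad data:   ", simulate(10_000, 5, 500, False))
```

With the delete-on-bad-data policy the origin takes the full 10,000 queries every round and never recovers; if clients leave the cached copy alone when the origin hands back garbage, the herd disappears after the first round.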
It didn't fail, they turned it off. This was the easiest way to "shut off the entire site" as their post-mortem describes. The DNS errors users saw were being generated by the front-end HTTP proxies, not by client browsers, which caused most of this confusion. Once the database issue cleared, they reactivated the DNS entries for the back-end servers one cluster at a time and the site came back.
Easy. They absolutely do use reverse proxies - every large site does, because you just can't scale a web site to Facebook's size without them.
In the post-mortem, they mention the need to effectively "turn off" the entire site, and the easiest way to do that is to remove its DNS. In this case, however, it was most likely more effective to remove the DNS entries for the back-end hosts that the proxies forward queries to, rather than the entries for www.facebook.com. This is most likely what generated the DNS errors that users saw.
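Here's a minimal sketch of why the error would come from the proxy rather than the browser, assuming the front end looks up its back-end by name on each request. The hostnames, the in-memory "zone" and the 503 status are placeholders of mine, not anything from the post-mortem.

```python
import socket
from typing import Callable, Dict, Tuple

Resolver = Callable[[str], str]


def make_resolver(zone: Dict[str, str]) -> Resolver:
    """Build a resolver over an in-memory zone (stand-in for internal DNS)."""

    def resolve(name: str) -> str:
        try:
            return zone[name]
        except KeyError:
            raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")

    return resolve


def proxy_request(backend_host: str, resolve: Resolver) -> Tuple[int, str]:
    """Return (status, body) as a front-end proxy might for one request."""
    try:
        backend_ip = resolve(backend_host)
    except socket.gaierror:
        # The proxy, not the user's browser, failed the lookup.
        return 503, f"DNS failure resolving backend {backend_host}"
    return 200, f"forwarded to {backend_ip}"


if __name__ == "__main__":
    zone = {"web-cluster-1.internal.example": "10.0.1.10"}
    resolve = make_resolver(zone)
    print(proxy_request("web-cluster-1.internal.example", resolve))
    del zone["web-cluster-1.internal.example"]  # "turn off" that cluster
    print(proxy_request("web-cluster-1.internal.example", resolve))
```

Note that the public name still resolves fine in this scenario; the user's browser reaches the proxy, and it's the proxy that reports the lookup failure.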
The ReacTable...Bjork had one of these on her most recent US tour. Lots of fun to watch in action.
A Big Red Button incident knocked livejournal.com offline for 2 days back in 2003. I was working for their colo provider (the owner of said Button) at the time.
I remember a similar scam going on in the dot-com bubble years - a company flush with VC money would "invest" in a shell subsidiary, which would then pay the parent a "licensing fee" for its IP. Thus, investment capital gets transformed into "real" income, and the company can claim to be profitable (or at least to have a lower cash burn rate).
Sheet music is basically a one or two person affair; it takes a lot more people (and a lot more equipment) to make an MP3, even for "indie" bands.
Economies of scale. Far, far more people will buy the mp3 than will buy the sheet music.
I actually had the opposite problem - back in 2004 I bought a 5GHz phone system expressly to avoid interfering with my 802.11g wireless. When I went to 802.11n a few years later, I set the base station to 5GHz to avoid interference from all the other 2.4GHz wireless around...and saw the signal go to hell whenever the phone rang.
Not long after that I ditched the landline for good.
I'm guessing that AT&T's assumption here is that tethering users will use more of that 2GB than the average non-tethered user. Again, it's a hedge to protect their margins against people actually *using* all of the 2GB they're paying for.
RTSP (IETF media streaming protocol) and RTMP (Adobe's proprietary version) support redirects as well.
That's *exactly* what a CDN is, although generally they're implemented as caching proxies as opposed to true mirrors (i.e. content is pulled into a site the first time it's accessed, then served from the site from that point on). Just about every large web property uses CDNs run by Akamai, Limelight, Internap, Level3, and others, and most of the largest sites (Google, Yahoo, et al) operate their own in-house.
DNS is used because *most* of the time, the location of your DNS resolver is a good hint of the client's actual location. There are many cases where this isn't true, and there are other solutions for those. One is to redirect the client to a better-located site if the server that gets the request determines that the DNS geolocation was profoundly wrong. Another is a proposal from Neustar and Google to embed client IP information in forwarded DNS queries, so that geo-aware authoritative servers can use that information, if available, instead of the requesting resolver's IP.
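A bare-bones sketch of the "pull on first access" behaviour, assuming a single edge site with an in-memory store and no expiry or eviction (which any real CDN would have). The class and the `fetch_from_origin` callable are hypothetical names, not any vendor's API.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Caching proxy for one edge site: fetch from origin on first access only."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        if path not in self._store:
            # First request for this object at this site: pull it in.
            self.origin_fetches += 1
            self._store[path] = self._fetch(path)
        # Every later request is served from the edge copy.
        return self._store[path]


if __name__ == "__main__":
    edge = PullThroughCache(lambda path: f"contents of {path}".encode())
    for _ in range(3):
        edge.get("/img/logo.png")
    print("origin fetches:", edge.origin_fetches)
```

A true mirror would copy everything up front; the caching proxy only ever holds what somebody near that site has actually asked for.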
Disclaimer: I've worked for two of the companies I've named above. Not telling which ones. :)
Sure, if you don't mind not being able to access:
- Media from iTunes
- Windows software updates
- Netflix video on demand
- *any* digital media purchased from amazon.com (even DRM-free mp3s)
- Images from flickr
- boston.com's The Big Picture
- Any image I embed in a fark.com comment.
Or you can use DNS for a first guess to the closest site, then use a redirect at the server (which, unlike DNS, sees the real client IP) to correct egregiously bad geo-DNS decisions. This way, a redirect is only done if it's likely that the overhead of the redirect itself will be offset by the faster page load from the "correct" site.
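Roughly, the server-side decision could look like the sketch below. The RTT table, the site names, the 150ms redirect cost and the "round trips per page load" estimate are all invented for illustration; a real deployment would measure these rather than hard-code them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REDIRECT_COST_MS = 150.0  # assumed cost of the extra round trip + new connection


@dataclass
class Site:
    name: str
    rtt_ms: Dict[str, float]  # estimated RTT from each client region


def choose_redirect(
    serving: Site,
    candidates: List[Site],
    client_region: str,
    round_trips: int,
    redirect_cost_ms: float = REDIRECT_COST_MS,
) -> Optional[Site]:
    """Return a better site to redirect to, or None to serve from here."""
    best = min(candidates, key=lambda s: s.rtt_ms[client_region])
    per_trip_saving = serving.rtt_ms[client_region] - best.rtt_ms[client_region]
    # Redirect only if the whole page load is expected to come out ahead.
    saving = per_trip_saving * round_trips
    return best if saving > redirect_cost_ms else None


if __name__ == "__main__":
    us_east = Site("us-east", {"eu": 120.0, "us": 20.0})
    eu_west = Site("eu-west", {"eu": 15.0, "us": 110.0})
    sites = [us_east, eu_west]
    # DNS sent a European client to us-east; the server sees the real IP.
    target = choose_redirect(us_east, sites, client_region="eu", round_trips=10)
    print("redirect to:", target.name if target else "nobody, serve locally")
```

The threshold is what keeps this from turning into a redirect on every slightly-off DNS answer: a client that's only a few milliseconds worse off stays where it landed.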
The same Philip Kaplan that ran F*ckedcompany.com?