Amazon's Move Off Oracle Caused Prime Day Outage in One of its Biggest Warehouses, Internal Report Says (cnbc.com)
Amazon is learning how hard it can be to move off of Oracle's database software. From a report: On Prime Day, while the e-retailer was dealing with a major website glitch that slowed sales, the company was also dealing with a technical problem in Ohio at one of its biggest warehouses, leading to thousands of delayed package deliveries, according to an internal report obtained by CNBC. The problem was in large part due to Amazon's migration from Oracle's database to its own technology, the documents show. The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.
https://en.wikipedia.org/wiki/...
They're developing their own technology because of implementing RDS. IIRC, RDS was originally a customized MySQL, and then they implemented Aurora.
Look up Amazon Aurora.
They've basically created a new DBMS that runs on top of their cloud infrastructure and is optimized for their EBS (Elastic Block Store). They have Postgres and MySQL flavors of the database, both of which utilize the actual DB "engines", but Amazon has written their own storage backends and added a bunch of other optimizations to the codebase (they've made most messaging asynchronous where possible). Because of the use of the actual database engines they claim 100% compatibility for both Postgres and MySQL. We use the MySQL flavor and haven't run into any compatibility issues with SQL queries or stored procs. Because of the performance optimizations inherent in how it was designed to run in their cloud, we were able to significantly reduce the amount of CPU/RAM utilized to run our application and still retain similar throughput. In essence, we were able to use a smaller RDS instance size, thus reducing our costs.
One of the really nice things about it is virtually instant (and faultless) replication due to the way they rely on EBS itself to replicate data, rather than through a replication system sending queries (or binary data) to another remote system.
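To make the difference concrete, here is a toy sketch in Python. It is not Aurora's code or API. Every class and function name in it (LogShippingReplica, SharedStorage, SharedStorageReplica, demo) is made up for illustration, and it ignores things a real system has to deal with, such as caches, networks and failures. It only contrasts a replica that has to replay shipped changes with one that reads from a storage layer that does the replication itself.

```python
# Toy model only: contrasts two replication styles.
# All names here are hypothetical; nothing is taken from Aurora or MySQL.


class LogShippingReplica:
    """Replica with its own copy of the data that replays shipped changes."""

    def __init__(self):
        self.data = {}
        self.pending = []  # changes received but not yet applied

    def receive(self, key, value):
        # The primary ships each change; the replica queues it.
        self.pending.append((key, value))

    def apply_pending(self):
        # Until this runs, reads on the replica are stale.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def read(self, key):
        return self.data.get(key)


class SharedStorage:
    """Stand-in for a storage layer that replicates data by itself."""

    def __init__(self):
        self.blocks = {}

    def write(self, key, value):
        self.blocks[key] = value

    def read(self, key):
        return self.blocks.get(key)


class SharedStorageReplica:
    """Replica that keeps no private copy and reads the shared storage."""

    def __init__(self, storage):
        self.storage = storage

    def read(self, key):
        return self.storage.read(key)


def demo():
    # Style 1: the change is shipped to the replica and replayed later.
    replica = LogShippingReplica()
    replica.receive("order:1", "shipped")
    stale = replica.read("order:1")      # not applied yet
    replica.apply_pending()
    caught_up = replica.read("order:1")  # visible after replay

    # Style 2: the writer and the reader share one storage layer.
    storage = SharedStorage()
    reader = SharedStorageReplica(storage)
    storage.write("order:1", "shipped")
    shared = reader.read("order:1")      # visible without a replay step

    return stale, caught_up, shared


if __name__ == "__main__":
    print(demo())
```

In the first style the replica serves old data until it has replayed what it was sent, which is where replication lag comes from. In the second there is no replay step at all, so in this simplified model the reader sees the write immediately.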
I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware. Want a tomcat server that barely works? Get it from Oracle! Otherwise it'll work solid everywhere else.
They'd do a far better job of returning to customers' good graces by not being such totalitarian get-every-last-dime asshats about their licensing terms.
Ever wonder why Oracle was so slow to get any traction in/among virtual machines?
Quo usque tandem abutere, Nimbus, patientia nostra?
Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?
Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.
I have some contacts at Amazon and can shed some light on this. Normally, Amazon retail prioritizes "Prime Day prep" above all else. Every team must prove they can stand up to the spike in load, and fill out lots of paperwork demonstrating they did adequate diligence. Rumor is that Prime Day was actually started as a way to do this exercise twice a year (and thus get better at it), rather than only for Christmas shopping.
However, this year is different. Moving off Oracle has been made the first priority of every retail team (well, every one that uses Oracle in any way, which is most). No doubt that shift in priorities is what's at play here: given the thousands of teams, it's no surprise that some team somewhere dropped the ball given the conflicting priorities.
So it's less about "Oracle was a silver bullet" and more about "changing stuff you don't usually change".