Slashdot Mirror


Amazon's Move Off Oracle Caused Prime Day Outage in One of its Biggest Warehouses, Internal Report Says (cnbc.com)

Amazon is learning how hard it can be to move off of Oracle's database software. From a report: On Prime Day, while the e-retailer was dealing with a major website glitch that slowed sales, the company was also dealing with a technical problem in Ohio at one of its biggest warehouses, leading to thousands of delayed package deliveries, according to an internal report obtained by CNBC. The problem was in large part due to Amazon's migration from Oracle's database to its own technology, the documents show. The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.

25 of 130 comments

  1. Really? by willaien · · Score: 4, Insightful

    Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

    Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

    1. Re:Really? by Mr+D+from+63 · · Score: 4, Insightful

      Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

      Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

      This, and the obvious risk of issues anytime you make such a large change. You fix them and move on. "thousands of delayed packages" sounds like a blip for Amazon. Bad weather can do that.

    2. Re:Really? by GameboyRMH · · Score: 4, Funny

      Oracle is a silver bullet if your wallet is made from werewolf fur!

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    3. Re:Really? by lgw · · Score: 5, Informative

      Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

      Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

      I have some contacts at Amazon and can shed some light on this. Normally, Amazon retail prioritizes "Prime Day prep" above all else. Every team must prove they can stand up to the spike in load, and fill out lots of paperwork demonstrating they did adequate diligence. Rumor is that Prime Day was actually started as a way to do this exercise twice a year (and thus get better at it), rather than only for Christmas shopping.

      However, this year is different. Moving off Oracle has been made the first priority of every retail team (well, every one that uses Oracle in any way, which is most). No doubt that shift in priorities is what's at play here: given the thousands of teams, it's no surprise that some team somewhere dropped the ball given the conflicting priorities.

      So it's less about "Oracle was a silver bullet" and more about "changing stuff you don't usually change".

      --
      Socialism: a lie told by totalitarians and believed by fools.
  2. Bad things will happen to you! by dj245 · · Score: 5, Funny

    Oracle: Don't you dare change to a competing product. Bad things will happen to you.

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    1. Re:Bad things will happen to you! by ilsaloving · · Score: 5, Funny

      Apparently we need a +1 Ominous moderation.

    2. Re:Bad things will happen to you! by xxxJonBoyxxx · · Score: 3, Insightful

      The article proves that the short-term pain of dumping Oracle IS worth the gain.

      >> thousands of delayed package deliveries

      Leading to what...maybe $100K's of losses at a ridiculously inflated top-end? Vs. $100,000K's of savings from not having to write Oracle checks? I think that's a trade-off any smart business would take.

  3. Their own technology? by ilsaloving · · Score: 2

    That phrase confused me.

    I can absolutely understand wanting to move off Oracle. But why would they re-invent the wheel and write their own database? At least, that's what it sounds like they're doing based on the way the article was phrased.

    Wouldn't it have been better to just switch to Postgres and use the oracle compatibility layer if they needed things like PL/SQL support?

    Ilsa

    1. Re:Their own technology? by jeff4747 · · Score: 5, Informative

      https://en.wikipedia.org/wiki/...

      They're developing their own technology because they implemented RDS. IIRC, RDS was originally a customized MySQL, and then they implemented Aurora.

    2. Re:Their own technology? by Hulfs · · Score: 5, Informative

      Look up Amazon Aurora.

      They've basically created a new DBMS that runs on top of their cloud infrastructure and is optimized for their EBS (Elastic Block Store). They have Postgres and MySQL flavors of the database, both of which utilize the actual DB "engines"; Amazon has written their own storage backends and added a bunch of other optimizations to the codebase (they've made most messaging asynchronous where possible). Because of the use of the actual database engines they claim 100% compatibility for both Postgres and MySQL. We use the MySQL flavor and haven't run into any compatibility issues with SQL queries or stored procs. Because of the performance optimizations inherent in how it was designed to run in their cloud, we were able to significantly reduce the amount of CPU/RAM utilized to run our application and still retain similar throughput. In essence, we were able to use a smaller RDS instance size, thus reducing our costs.

      One of the really nice things about it is virtually instant (and faultless) replication due to the way they rely on EBS itself to replicate data, rather than through a replication system sending queries (or binary data) to another remote system.

  4. I think Oracle sees the writing on the wall... by Darlok · · Score: 5, Interesting

    Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today. Any major platform transition is going to have problems unless you're exceptionally lucky. There are just too many moving parts in Enterprise systems for humans to get everything right on the first try. Oracle won't tout all of the problems people have moving ONTO their software from a competitor, but that transition pain happens too.

    Every year that goes by, it seems like Oracle is in a more tenuous position, despite their increased revenue. They've already lost the SME space -- I don't know of a single company anywhere in our client base, or within my sphere of influence, that still uses Oracle software. Organizations are bumping up against the limits of NetSuite -- the costs to integrate 3rd-party or industry-specific components, compared with other ERPs, are turning out to be more significant than expected. So we have clients and vendors migrating ERPs over time.

    Oracle is becoming the Comcast of the software world. They treat everyone like crap, but were so deeply embedded that they were hard to dislodge. With every passing year, that is less true, and I think Oracle knows it. Unfortunately, they seem to be choosing to double-down on the "treat everyone like crap" strategy, rather than actually fixing the systemic problems that might eventually sink them...

    --
    Notice: Your mouse has been moved. Windows will now restart so this change can take effect.
    1. Re:I think Oracle sees the writing on the wall... by ctilsie242 · · Score: 5, Interesting

      The funny thing is that Oracle could get back into many people's good graces. If they offered ZFS under the GPL and allowed it to become part of the default Linux kernel, this would be one of the biggest enterprise issues that would get solved.

      The same goes for opening up a lot of their Solaris IP, instead of letting it die a slow death. Zones and LDOMs would be quite useful in Linux, even with it duplicating existing hypervisor functionality.

    2. Re:I think Oracle sees the writing on the wall... by PolygamousRanchKid+ · · Score: 2

      Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today.

      Maybe Oracle needs one of those "Codes of Conduct", that seem to be the rage these days . . . ?

      Listening to customers is for startups . . . not for established market leaders. Their market dominance leads them to believe that their customers must listen to them.

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
    3. Re:I think Oracle sees the writing on the wall... by sjames · · Score: 3, Insightful

      Oracle has simply overplayed their hand. For years, they have used the intrinsic difficulty of migrating as a tool to keep customers on-board in spite of constant abuse.

      They finally tightened the thumb screws one turn too tight and their customers have decided that the intrinsic pain of migration is less than the pain of staying with Oracle.

    4. Re:I think Oracle sees the writing on the wall... by Penguinisto · · Score: 3, Informative

      They'd do a far better job of returning to customers' good graces by not being such totalitarian get-every-last-dime asshats about their licensing terms.

      Ever wonder why Oracle was so slow to get any traction in/among virtual machines?

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
  5. Re:MongoDB is webscale by AlanBDee · · Score: 5, Insightful

    I don't understand why Oracle even exists given my experience with it.

    Because it's a damn good database. The question isn't about its capabilities, it's whether it's worth the cost. As for their other products, I agree with you; they're way too sluggish. But I believe Amazon was just using their database.

    Now Amazon moving away from Oracle is a good thing; as servers get faster and the open source alternatives get better, Oracle's database is losing its foothold. I for one won't be sad to see that happen.

  6. Comment removed by account_deleted · · Score: 5, Insightful

    Comment removed based on user account deletion

  7. I don't believe this for a second. by stevenfuzz · · Score: 2

    Oracle is a complete nightmare. I've ported several large databases off Oracle, and have spent too many years developing with Oracle. There were constant issues with Oracle. Reliable? Please. Every month we were running into open bugs and submitting issues, all while paying obscene money for the privilege of using their products.

  8. Re:MongoDB is webscale by Anonymous Coward · · Score: 2, Informative

    I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware. Want a tomcat server that barely works? Get it from Oracle! Otherwise it'll work solid everywhere else.

  9. Re:MongoDB is webscale by WaffleMonster · · Score: 2

    I think most people don't understand that the actual database product is rock solid.

    You're right, we don't understand that, because we know better.

  10. Re: Prime Day was worse by Anonymous Coward · · Score: 2, Insightful

    "What happened to Amazon was a world-class system brought to a halt simply because of too many users and the system fell over. That is something that Oracle is just better at handling (when it's administered right and has some powerful hardware at work, which Amazon has in spades for anything they stand up)."

    You seem to have not read the articles about Prime day, such as:

    https://www.cnbc.com/2018/07/19/amazon-internal-documents-what-caused-prime-day-crash-company-scramble.html

    Sable:
    - Is not an RDBMS
    - Is not AWS technology and not used by AWS directly AFAIK
    - Was apparently not scaled up (on their EC2 instances) sufficiently for new (since previous peak loads) amazon.com features that use Sable

    Oracle databases cause many outages in Amazon every year; many internal systems that rely on Oracle have either been replaced with new systems that were designed for the scalable services AWS offers (and are now much more responsive, more stable, and can offer modern features), or are being migrated off Oracle because it's impossibly expensive to scale Oracle.

    Many amazon.com teams have a lot of experience with Oracle, and there is good tooling inside Amazon (that you don't get with Oracle, btw) for monitoring Oracle. The teams in question may just not have that much experience with Aurora/PostgreSQL, and their own tools and dashboards may not have been updated sufficiently after switching to be able to mitigate as easily as before.

    This doesn't necessarily imply that Aurora is worse than Oracle in any way; it's just different.

    The article here comes across like saying Mac OS is worse than Windows because Outlook on Mac OS doesn't have Auto-Archive.

  11. Outright slow or lack of tuning? by Tablizer · · Score: 2

    It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software

    Big databases usually require careful tuning to handle big loads. Could it be the new incarnation has yet to undergo such tuning? The new incarnation may also have a different trade-off profile, such that the porting process moved operations mostly as-is instead of rebalancing the trade-offs to fit the new host. Much of the Oracle DB tuning may come from direct production experience, something the new incarnation won't have by definition.

    For a car analogy, suppose you are used to hauling big loads up the mountain in a Ford pickup truck. You switch to a Chevy truck and find your productivity drops. At first you blame the Chevy.

    After weeks of experience you find the Chevy less powerful at directly going over boulders; however, it's more maneuverable than the Ford, such that you just learn to swerve around boulders instead of trying to go over them. Once you get used to the Chevy, the haul time is roughly the same.

  12. Re:"Oracle's database is more efficient" by PincushionMan · · Score: 3, Interesting

    Don't forget their new Java licensing scheme: per physical core on the server side, and also by named user on the client side. $10 each. Yes, even if all the users use the workstation in shifts, they want to be paid 3 times or more. Combine that with the rapid deprecation of features (JavaFX, Java Web Start) and the Chrome-catching version numbering scheme, and you have a recipe for disaster if you choose Java for any projects today. In fact, if you've done any development in Java, now might be the time to investigate alternative cross-platform technologies, like .NET.

    I cannot believe I just recommended .NET over Java. What's the world coming to? So, for clarification, is there any possibility that MS could pull an Oracle with .NET?
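
    The "paid 3 times or more" point above is simple arithmetic. A minimal sketch, assuming only the "$10 each" figure from the comment (the billing period is not stated there) and using hypothetical function names; the per-device model is included purely for comparison and is not something the comment says Oracle offers:

```python
PRICE_PER_UNIT = 10  # the "$10 each" figure from the comment; billing period unknown


def named_user_cost(users_per_workstation, workstations, price=PRICE_PER_UNIT):
    # Every named person is counted, even when several share one machine in shifts.
    return users_per_workstation * workstations * price


def per_device_cost(workstations, price=PRICE_PER_UNIT):
    # Hypothetical alternative for comparison only: charge once per workstation.
    return workstations * price


# One workstation shared by three people working different shifts.
shared = named_user_cost(users_per_workstation=3, workstations=1)
print(shared, per_device_cost(1))  # 30 10
```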

  13. Re:MongoDB is webscale by Anonymous Coward · · Score: 2, Interesting

    I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware.

    The bulk of Oracle DB was made in the past, at a time when Oracle the company actually employed talented engineers, designers, and programmers.
    It really was built to be rock solid and with plenty of features to make heavy workloads a breeze.

    Sadly, that time has long since passed, and that is not Oracle the company of today.

    A large portion of their middleware was either a 3rd-party acquisition that they had their offshore code monkeys try to integrate, or was actually made by said offshore code monkeys, but in either case done so poorly and haphazardly that it's a wonder it even runs, let alone works well.

    You know how Sun Microsystems made some amazing tech, and then was bought by Oracle?
    You can almost think of Oracle DB as being a product made by an outside company such as "Old Oracle", that was purchased up by "Current Oracle" and fucked up like everything else they touch.

    Oracle the company, of the past, actually had a sizable employee base of talent and those people put it to work.
    Oracle the company of today is, last I heard, about 90% sales and lawyers in licensing, and 10% overhead. Their technical staff doesn't even round up to 1% as the vast majority is done by outside consultants and outsourced offshore code farms.

  14. Re:MongoDB is webscale by hey! · · Score: 2

    It is really easy to screw up your Oracle database server. It's practically an operating system in itself, and there are multiple resource pools that, improperly managed, can starve various back end processes your DBA has barely even heard of. That said, properly managed it should handle heavy workloads for the iron you're running it on.

    This is why Oracle *doesn't* make sense for a lot of installations. You need DBAs who either have a great deal of arcane Oracle server management knowledge, or who have the sense not to monkey with stuff they don't understand. Either way, you're talking about someone who can command a higher salary than many organizations are willing to pay for such an unglamorous position.

    and very frequently has to go down for some sort of synchronization.

    This sounds like a lame excuse to me. The one thing that justifies paying Oracle its pound of flesh is having a database server that keeps processing transactions, come hell or high water. That's because Oracle does transaction isolation better than anyone else. You never have to worry about stale reads or read locks or any of that kind of rigmarole, nor do you have to give up data consistency to get there. You never have to bring the service offline to back it up, restore it, or even restore parts of it. You can even pick and choose individual transactions or groups of transactions to roll back, all while the database chugs merrily along, accepting new updates.

    Oracle exploited, very early on, "copy-on-write" technology. Although back in the day they were pretty tight-lipped about how they isolated various data reading and writing processes from each other, with a more modern perspective it's clear they make extensive use of C-O-W snapshotting under the covers.
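
    For readers unfamiliar with the idea, here is a toy sketch of copy-on-write snapshotting in general. It is not Oracle's implementation (the comment says those internals were never spelled out); the `CowStore` class and its method names are made up for illustration, and a real engine would copy individual blocks or rows rather than the whole data set:

```python
from types import MappingProxyType


class CowStore:
    """Toy key-value store where each reader sees an immutable snapshot."""

    def __init__(self):
        # The committed version is never mutated in place, only replaced.
        self._committed = MappingProxyType({})

    def snapshot(self):
        # A reader simply keeps a reference to the current committed version.
        return self._committed

    def commit(self, changes):
        # Copy-on-write: build a new version, then swap the reference.
        new_version = dict(self._committed)
        new_version.update(changes)
        self._committed = MappingProxyType(new_version)


store = CowStore()
store.commit({"order-1": "packed"})

reader_view = store.snapshot()          # a long-running reader starts here
store.commit({"order-1": "shipped"})    # a writer commits in the meantime

print(reader_view["order-1"])           # packed
print(store.snapshot()["order-1"])      # shipped
```

    The reader never takes a lock and never sees a half-applied update, and the writer is never blocked by the reader, which is the property the comment is describing.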

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.