Amazon's Move Off Oracle Caused Prime Day Outage in One of its Biggest Warehouses, Internal Report Says (cnbc.com)
Amazon is learning how hard it can be to move off of Oracle's database software. From a report: On Prime Day, while the e-retailer was dealing with a major website glitch that slowed sales, the company was also dealing with a technical problem in Ohio at one of its biggest warehouses, leading to thousands of delayed package deliveries, according to an internal report obtained by CNBC. The problem was in large part due to Amazon's migration from Oracle's database to its own technology, the documents show. The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.
Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?
Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.
Oracle: Don't you dare change to a competing product. Bad things will happen to you.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
So the only glitch was a short delay in a single warehouse?
Sounds like a massive success story to me.
That phrase confused me.
I can absolutely understand wanting to move off Oracle. But why would they re-invent the wheel and write their own database? At least, that's what it sounds like they're doing based on the way the article was phrased.
Wouldn't it have been better to just switch to Postgres and use the oracle compatibility layer if they needed things like PL/SQL support?
Ilsa
Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today. Any major platform transition is going to have problems unless you're exceptionally lucky. There's just too many moving parts in Enterprise systems for humans to get everything right on the first try. Oracle won't tout all of the problems people have moving ONTO their software from a competitor, but that transition pain happens too.
Every year that goes by, it seems like Oracle is in a more tenuous position, despite their increased revenue. They've already lost the SME space -- I don't know of a single company anywhere in our client base, or within my sphere of influence, that still uses Oracle software. Organizations are bumping up against the limits of NetSuite -- the costs to integrate 3rd-party or industry-specific components, compared with other ERPs, are turning out to be more significant than expected. So we have clients and vendors migrating ERPs over time.
Oracle is becoming the Comcast of the software world. They treat everyone like crap, but were so deeply embedded that they were hard to dislodge. With every passing year, that is less true, and I think Oracle knows it. Unfortunately, they seem to be choosing to double-down on the "treat everyone like crap" strategy, rather than actually fixing the systemic problems that might eventually sink them...
Notice: Your mouse has been moved. Windows will now restart so this change can take effect.
I don't understand why Oracle even exists given my experience with it.
Because it's a damn good database. The question isn't about its capabilities, it's whether it's worth the cost. As for their other products I agree with you; they're way too sluggish. But I believe Amazon was just using their database.
Now Amazon moving away from Oracle is a good thing; as servers get faster and the open source alternatives get better, Oracle's database is losing its foothold. I for one won't be sad to see that happen.
The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.
Nothing in the article really supports those conclusions.
Was it due to some actual inferiority in "their own technology" (postgresql?), or was it just a migration issue?
Oracle is a complete nightmare. I've ported several large databases off Oracle, and have spent too many years developing with Oracle. There were constant issues with Oracle. Reliable? Please. Every month we were running into open bugs and submitting issues. All while paying obscene money for the privilege of using their products.
I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware. Want a tomcat server that barely works? Get it from Oracle! Otherwise it'll work solid everywhere else.
It's certainly unsurpassed in the efficient manner in which it eats all available IT funding. What licensing scheme are they using to rip off their customers this year? By CPU cores? By clock speed? Both?
Amazon could, obviously, have done a better job of testing before flipping the switch on a migration this big. It's not like the company is hurting for the money that could have been used to put together an appropriate environment to prevent a snafu like this.
CUR ALLOC 20195.....5804M
Likely as well, the $90K that this incident cost them is a rounding error next to the total budget of the project, the long-term savings that the project will provide over the years, and the additional money coming in from being able to sell this as a service on their AWS platform.
I am sure Amazon probably loses more money per year, maybe even per month, to products damaged in shipment than this little mishap cost them.
Anyone who expected otherwise has not done a major migration. But once the move off of Oracle is complete, Amazon may be in a much better place.
I think most people don't understand that the actual database product is rock solid.
You're right, we don't understand that, because we know better.
Amazon having trouble rolling out a platform migration does not mean Oracle is a reliable platform. On the contrary, my experience is that due to the high licensing costs, many businesses forgo implementing the replication and redundancy measures needed to make Oracle's db reliable. Amazon having trouble rolling out a platform migration only goes to show that scale makes such migrations difficult, and underscores how important planning is in IT.
I feel like sometimes the IT department must not be shoveling enough coal into the boiler or something, because this antiquated, inflexible interface just stalls all the time.
Ok, so imagine that, but worse. That was Prime Day. Hours on hours of not stalling, but simply not working at all.
What you are describing sounds like maybe the devs aren't as good as they could be at optimizing, or maybe the company is stingy on hardware. What happened to Amazon was a world-class system brought to a halt simply because of too many users and the system fell over. That is something that Oracle is just better at handling (when it's administered right and has some powerful hardware at work, which Amazon has in spades for anything they stand up).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Big databases usually require careful tuning to handle big loads. Could it be the new incarnation has yet to undergo such tuning? The new incarnation may also have a different trade-off profile, such that the porting process moved operations mostly as-is instead of rebalancing the trade-offs to fit the new host. Much of the Oracle DB tuning may be direct production experience, something the new incarnation won't have by definition.
For a car analogy, suppose you are used to hauling big loads up the mountain in a Ford pickup truck. You switch to a Chevy truck and find your productivity drops. At first you blame the Chevy.
After weeks of experience you find the Chevy less powerful at directly going over boulders; however, it's more maneuverable than the Ford, such that you just learn to swerve around boulders instead of trying to go over them. Once you get used to the Chevy, the haul time is roughly the same.
Table-ized A.I.
I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing which makes Microsoft look like the good guy. I don't understand how they can make a good dbms but fail so miserably on the middleware.
The bulk of Oracle DB was made in the past, at a time when Oracle the company actually employed talented engineers, designers, and programmers.
It really was built to be rock solid and with plenty of features to make heavy workloads a breeze.
Sadly, that time has long since passed, and that is not the Oracle of today.
A large portion of their middleware was either a 3rd party acquisition they purchased and had their off shore code monkeys try to integrate, or was actually made by said offshore code monkeys, but in either case done so poorly and haphazardly that it's a wonder the products even run, let alone work well.
You know how Sun Microsystems made some amazing tech, and then was bought by Oracle?
You can almost think of Oracle DB as being a product made by an outside company such as "Old Oracle", that was purchased up by "Current Oracle" and fucked up like everything else they touch.
Oracle the company, of the past, actually had a sizable employee base of talent and those people put it to work.
Oracle the company of today is, last I heard, about 90% sales and lawyers in licensing, and 10% overhead. Their technical staff doesn't even round up to 1% as the vast majority is done by outside consultants and outsourced offshore code farms.
$90K is likely similar to what the Oracle license costs them per day. If you think I'm joking, that's $30M/year - which wouldn't surprise me for a company the size of Amazon.
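A quick check of that arithmetic, as a sketch only: it assumes the $90K figure would apply on every day of a 365-day year, which is an illustrative assumption and not a number from the report.

```python
# Hypothetical daily license cost taken from the comment above (USD).
daily_cost = 90_000

# Assumption: the same cost applies on each of 365 days.
yearly_cost = daily_cost * 365

print(f"${yearly_cost:,} per year")
```

That lands a little above the rounded $30M/year quoted in the comment.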
Do you have ESP?
Larry Ellison taunts Amazon that they still use Oracle and can't do without them, thus ensuring that Amazon will stop at nothing to be rid of Oracle and him.
When all you have is a hammer, every problem starts to look like a thumb.
Ellison made it personal like an idiot. Now Amazon doesn't care about the expense any more. And obviously, if Amazon can use AWS instead of Oracle then other companies can too, so Amazon thanks Larry for providing that extra motivation to just do it.
Sure would be nice to hear from somebody who has worked with both, whether Postgres really can fill the Oracle boots. I only know about the Oracle apps, somehow popular in enterprise but universally hated. Absolute rubbish. So why am I supposed to believe that Oracle's other products are magically better?
Not for long.
Not for long.
Not for long.
No doubt, forever and ever.
Do you mean -1 shilling?
It is really easy to screw up your Oracle database server. It's practically an operating system in itself, and there are multiple resource pools that, improperly managed, can starve various back end processes your DBA has barely even heard of. That said, properly managed it should handle heavy workloads for the iron you're running it on.
This is why Oracle *doesn't* make sense for a lot of installations. You need DBAs who either have a great deal of arcane Oracle server management knowledge, or who have the sense not to monkey with stuff they don't understand. Either way, you're talking about someone who can command a higher salary than many organizations are willing to pay for such an unglamorous position.
and very frequently has to go down for some sort of synchronization.
This sounds like a lame excuse to me. The one thing that justifies paying Oracle its pound of flesh is having a database server that keeps processing transactions, come hell or high water. That's because Oracle does transaction isolation better than anyone else. You never have to worry about stale reads or read locks or any of that kind of rigmarole, nor do you have to give up data consistency to get there. You never have to bring the service off line to back it up or restore or even restore parts of it. You can even pick and choose individual transactions or groups of transactions to roll back, all while the database chugs merrily along, accepting new updates.
Oracle exploited, very early on, "copy-on-write" technology. Although back in the day they were pretty tight lipped about how they isolated various data reading and writing processes from each other, with a more modern perspective it's clear they make extensive use of C-O-W snapshotting under the covers.
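For anyone who hasn't seen the idea, here is a minimal toy sketch of snapshot-style reads over versioned data. It makes no claim about Oracle's actual internals; the class `VersionedStore` and its methods are hypothetical names made up for illustration. The point is only that a writer adds a new version instead of overwriting, so a reader holding an older snapshot never needs a read lock.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class VersionedStore:
    """Toy multi-version store (hypothetical): writers append, readers use a snapshot."""
    # key -> list of (commit_timestamp, value), oldest first
    versions: Dict[str, List[Tuple[int, Any]]] = field(default_factory=dict)
    clock: int = 0  # timestamp of the most recent commit

    def commit(self, key: str, value: Any) -> int:
        """Write a new version; older versions stay readable."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self) -> int:
        """A snapshot is just the commit timestamp a reader started at."""
        return self.clock

    def read(self, key: str, snapshot_ts: int) -> Any:
        """Return the newest value committed at or before the snapshot."""
        result = None
        for ts, value in self.versions.get(key, []):
            if ts > snapshot_ts:
                break
            result = value
        return result


store = VersionedStore()
store.commit("balance", 100)
reader_ts = store.snapshot()      # a long-running report starts here
store.commit("balance", 250)      # a writer commits in the meantime

old = store.read("balance", reader_ts)         # the report still sees its snapshot
new = store.read("balance", store.snapshot())  # a fresh reader sees the update
```

The reader that started before the second commit keeps seeing a consistent view, and the writer was never blocked. A real database also has to deal with garbage-collecting old versions, write conflicts, and durability, none of which this sketch attempts.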
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Note that the cost isn't just monetary. If you buy Oracle, you will forever have to fear their licensing antics. You never know when an audit might happen, and the licensing terms are so convoluted that you're likely in breach. Just to make it worse, the terms constantly change.
Finally! A year of moderation! Ready for 2019?
I've been working with Oracle databases for a couple of decades now.
"rock solid" is an extremely good description of them.
They're fucking expensive and some of the configuration is a royal pain in the arse but they work, they work well and they keep working.
I wouldn't recommend anybody starting a business to actually use one, but that's completely and entirely due to cost and Oracle's business practices, and fuck all to do with the underlying technology.
$30m/year could go just on Oracle Financials at their scale, let alone the database.
Wow. Why would you say something like that publicly? "Yeah our customers want to leave, but we got 'em by the balls!"
Because it's a damn good database. The question isn't about its capabilities
Actually, it is. The Oracle vs. Google lawsuit was about Oracle's wanting to use Java patents to hammer Google into cross-licensing its map-reduce patents so that Oracle could scale to the levels demanded by customers like Amazon. Cringely had a leaker years back confirming this.
Google won that one, and now Amazon has broken free of Oracle.
Personally I like it that my Subscribe-and-Save stopped taking 3 minutes to update an order. That was a scaling problem that bled through to the UI.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Oh, look, a 'news' article paid for by Oracle.
On a long enough timeline, the survival rate for everyone drops to zero.
Plus management doesn't say, "well, we're already paying Oracle, let's use their garbage product over here too."
OEM, I'm pointing at you....
Cheap storage VM.