Toyota Acceleration and Embedded System Bugs
An anonymous reader writes "David Cummings, a programmer who worked on the Mars Pathfinder project, has written an interesting editorial in the L.A. Times encouraging Toyota to drop claims of software infallibility in their recent acceleration problems. He argues that embedded systems developers must program more defensively, and that companies should stop relying on software for safety. Quoting: 'If Toyota has indeed tested its software as thoroughly as it says without finding any bugs, my response is simple: Keep trying. Find new ways to instrument the software, and come up with more creative tests. The odds are that there are still bugs in the code, which may or may not be related to unintended acceleration. Until these bugs are identified, how can you be certain they are not related to sudden acceleration?'"
Always going forward.
o hai
Most software is nearly -impossible- to test under flawless conditions. Especially embedded systems with small amounts of CPU power and memory.
Plus, all this hype around these Toyota acceleration problems is just that, hype.
Drive by wire is great and all, but I'd feel much better with a physical fail-safe than their "infallible" software. I am aware of the physical remedies for the issue, but I'd like to see the brake pedal override the accelerator.
Some more data here
Testing only confirms the absence of known bugs. Never forget that.
David Cummings does seem to know what he's talking about, but as it is written, there is some strange logic in the article.
Testing cannot prove the absence of bugs, only their presence. There are two things that do not follow from this:
It sounds to me as if Toyota is saying the former, while Cummings says the latter. Neither is a correct conclusion.
Technically, that is what part of the update does. It forces the computer to always choose the brake over the accelerator when both pedals are registering. So if the car does accelerate, a tap on the brakes should disengage it.
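A minimal sketch of the brake-over-accelerator rule described above. The function name, the percentage scale, and the boolean brake signal are all assumptions made for illustration; nothing here is taken from Toyota's actual update, which may well apply extra conditions.

```python
def throttle_command(accel_pct: float, brake_pressed: bool) -> float:
    """Return the throttle request (0-100) after a brake override.

    Hypothetical illustration only: the inputs and their scaling are assumed.
    """
    # Clamp the raw pedal reading so a bad sensor value cannot exceed 100%.
    accel = max(0.0, min(100.0, accel_pct))
    # When both pedals register, the brake always wins.
    if brake_pressed:
        return 0.0
    return accel
```

For example, `throttle_command(80.0, True)` returns `0.0`, while `throttle_command(80.0, False)` returns `80.0`. As the reply further down points out, a rule like this only helps if the computer is still reading its inputs at all.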
I've said time and time again, "Never replace hardware with software" because
something dedicated to the task will always work better, or be less failure
prone (more often than not).
Would Toyota be having these problems with an accelerator cable vs electronic?
99% sure the answer is "no"...heck the solution is add some grease, make sure
it isn't pinched/looped too tightly and/or add tension to the pedal side.
Or, replace the damn cable with a new one...a 20 to 30 minute task.
(less than 10min on a motorcycle)
Oh, well, what do I know? I'm just a CS major with real world experience, pay
no attention to the man behind the keyboard!!!
The type of people that purposely hide bugs that will likely kill several people are the same type of people you can't really "appease" no matter what you do.
I'm loving this conversation because I've been crucified on Slashdot before for making comments similar to this whole thread. I grew up in a family of top managers of Boeing systems engineers. They hated computers. My dad never even learned how to turn one on. He hired other monkeys to use the computers. As a child I was regaled with wonderful stories of every hard lesson in safety my dad had learned over his lifetime. He loved World War II because they got to use cutting-edge designs for balls-out performance, yet at the same time learned how to make things reliable by dissecting the accidents. He would tell me about the accident that taught them that the engine pumps need to be at full speed but flow stalled on takeoff, so that there's no lag when you hot swap after a pump fails. He told me of the accident where they learned not to route 100% of the control system wiring through any one junction box. Etc.
Probably because of all these hard-won lessons, Boeing for years insisted on fully mechanical or hydraulic flight surface controls, whereas Airbus and others jumped on the fly-by-wire concept early. My dad would spit after hearing some young person tout all the advantages of fly-by-wire. He knew them perfectly well. He was big on accepting new innovations to reduce fuel costs and increase performance. He was not a Luddite. But he had a safety background that told him these electronic systems were hard as hell to validate and hard as hell to make truly independent from each other.
For example, they often used triple-redundant computers, and if one of them disagreed, the other two would vote it off the island and stop listening to it. From what I've read, it's now suspected that the latest Airbus crash in the Pacific had one of its root problems in the voting nexus, where a superior computer overruled a more primitive safety system.
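The "vote it off the island" scheme is usually called two-out-of-three voting. Here is a rough sketch of the idea, assuming three numeric channels and a fixed agreement tolerance; the function name, return shape, and tolerance are invented for the example and are not taken from any real avionics design.

```python
from typing import Optional, Sequence, Tuple


def vote_2oo3(readings: Sequence[float],
              tolerance: float) -> Tuple[Optional[float], Optional[int]]:
    """Return (agreed_value, outvoted_channel_index).

    Hypothetical sketch: real voters also handle timing, latching and
    channel health, none of which is modeled here.
    """
    if len(readings) != 3:
        raise ValueError("exactly three channels are required")
    a, b, c = readings
    # Key = index of the channel that is NOT part of the agreeing pair.
    pair_agrees = {
        2: abs(a - b) <= tolerance,
        1: abs(a - c) <= tolerance,
        0: abs(b - c) <= tolerance,
    }
    agreeing = [left_out for left_out, ok in pair_agrees.items() if ok]
    if not agreeing:
        return None, None  # no majority at all: caller must fail safe
    if len(agreeing) == 1:
        outvoted = agreeing[0]
        kept = [r for i, r in enumerate(readings) if i != outvoted]
        return sum(kept) / 2, outvoted
    # Two or three pairs agree: no clear outlier, so use the median.
    return sorted(readings)[1], None
```

Note that the voter is itself a single piece of logic that every channel passes through, which is exactly the kind of "voting nexus" the comment above is worried about.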
We all know that computer software validation is hard, if not impossible, but it's not something we readily admit here on Slashdot. That's because for years people like my dad would throttle the innovations the computer engineers wanted to implement. I think that as a result, a culture grew up among computer engineers of presenting embedded computing as safer than it really could be, to offset that.
So now we come full circle and have to admit there is a middle ground. Just because a computer can improve performance does not mean it's reliable and safe. The old guys had a point after all when it came to safety.
Next week I'll tell you about how the ancient shocking lesson of the British Comet aluminum aircraft wings falling off led to the unanticipated discovery of metal fatigue and probably was the reason Boeing was slow to move to composite materials in commercial aircraft (but not in military aircraft). In hindsight we have heard many tales of the composite tails of planes falling off as the reason for the loss of control before a crash. Conversely, composite wings on UAVs allow them to absorb a lot of bullet holes with no loss of control and to operate under higher performance conditions.
The point is that safety and performance are trade-offs when both are pushed to the limit. The old guys know a lot more about safety than you might expect. The young guys are all about performance.
Isn't this like proving God doesn't exist?
They can test and test and not get a result that said this is the bug, so they assume that it doesn't exist.
They should just implement a huge red panic button to shut everything off :)
They could even buy them at Staples. Need to stop your out of control Toyota? That's easy.
Unfortunately, the update assumes the computer will actually respond to the brake being pressed, or to any input for that matter. Toyota doesn't know for certain what is causing all of these sudden acceleration problems in which fiddling with the gas pedal, the brake, and even putting the vehicle in neutral won't stop the vehicle. The software update, while a sensible modification that should've been in the software all along, is sort of a Hail Mary toward preventing any new cases in updated vehicles.
> And I know how to hit the brakes...
With the engine past the redline there is very little vacuum to operate the power brakes. Without power assist the brakes may not be able to overcome the engine (this is, IMHO, a fundamental design defect).
> ...shift into neutral...
The computer may not let you do that with the car moving and the engine at high rpm. After all, the engine and/or transmission might be damaged (another design defect).
> ...and/or turn off the key...
Some of these vehicles don't have keys: just a radio remote. The emergency shutdown procedure is to hold a button down for three seconds (another design defect).
I used to have a car where the engine would suddenly turn off for no reason while driving, often at exciting moments like getting onto the freeway. It was pretty easy to put it into neutral (it was an automatic), turn the key to "acc" and try to restart the engine (usually with success) without accidentally locking the steering wheel.
It went on for some time until I convinced the repair guys to clean all the electrical connections from the computer to the fuel pump. The car had lived most of its life in cold places with salty roads, and then the problem appeared in California, where mechanics don't think of the effects of salt water. Once the connections were clean it behaved fine.
I worked on an embedded flight system there, and deeply respected people like your dad.
Boeing works under the eye of a certification authority who has to approve the safety of a design including, at least in the system I worked on, human factors. If there's anything comparable for cars, I haven't heard of it.
Boeing would not have made a pilot have to guess at how to turn an engine off (people with older cars, it's no longer a matter of turning a key).
Inputs were checked for consistency and validity. The specs would have anticipated what to do if the accelerator and brake were both full on at the same time.
There was a culture of worst-case planning and redundancy.
Also, if Boeing built a car, it would have a flight data recorder which investigators could examine and say for example "Looks like both(*) potentiometers on the accelerator went hard over at the same time, so we go look on the branches of the fault tree where there's a common-mode failure in the potentiometers or the pedal is down due to mechanical or pilot error".
(*) If I remember correctly from my obsessive pre-purchase research on Priuses, there are two separate sensors for accelerator position.
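The consistency and validity checking described above could look roughly like this for a pedal with two position sensors. This is a sketch under assumed names, ranges, and thresholds; it is not how any particular car or aircraft does it.

```python
from dataclasses import dataclass


@dataclass
class PedalReading:
    position_pct: float  # value the rest of the system may use
    valid: bool          # False means "fall back to a safe default"
    reason: str


def check_pedal(sensor_a: float, sensor_b: float,
                max_diff: float = 5.0) -> PedalReading:
    """Cross-check two redundant pedal sensors (hypothetical 0-100 scale)."""
    # Validity: each sensor must be inside its physical range.
    # NaN fails this comparison too, so it is rejected as out of range.
    for name, value in (("A", sensor_a), ("B", sensor_b)):
        if not 0.0 <= value <= 100.0:
            return PedalReading(0.0, False, f"sensor {name} out of range")
    # Consistency: the two sensors must roughly agree with each other.
    if abs(sensor_a - sensor_b) > max_diff:
        return PedalReading(0.0, False, "sensors disagree")
    # Use the lower of the two agreeing readings.
    return PedalReading(min(sensor_a, sensor_b), True, "ok")
```

A check like this catches one sensor failing on its own. It does not catch the case raised above where both sensors go hard over together, since two identical wrong readings agree perfectly. That is why the common-mode branch of the fault tree still needs separate attention.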
... why does everyone assume it is a software bug? I agree that it very well could be an undiscovered software bug. But there are so many more sources of erroneous behavior in an embedded system that *even* *if* the software were flawless (ummm... just go with me a minute... :) an automotive environment can cause all manner of strange glitches. I work with robots: lots of DC motors causing commutation noise on the power supply, and long (several inch) distances between units that must talk to each other and therefore may have a different opinion as to ground reference voltage... many things can get wacky. Even flawless code needs a watchdog timer to get you out of the weird states that power glitches put you into. Power supply spikes can cause the program counter to jump to very odd places, with odd, corrupted stuff in RAM. Ground level shifting can cause communication glitches. CAN bus is *extremely* robust, so bad data should not get through... but what does get through? Does the system as a whole get into a weird state if packets drop?
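For anyone who hasn't met one, here is a toy software model of the watchdog idea. On a real microcontroller the watchdog is normally a hardware peripheral that resets the CPU by itself; the class, method names, and injected clock below are assumptions made so the behavior can be shown in a few lines.

```python
class Watchdog:
    """Toy model of a watchdog timer (hypothetical API, not a real driver)."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self._now = now              # injected clock: callable returning seconds
        self._last_kick = now()

    def kick(self):
        """Called by healthy code on every pass through its main loop."""
        self._last_kick = self._now()

    def expired(self):
        """True once the main loop has gone silent for longer than the timeout."""
        return self._now() - self._last_kick > self.timeout_s


# Usage with a fake clock so the timing is deterministic.
clock = [0.0]
wd = Watchdog(timeout_s=0.1, now=lambda: clock[0])
clock[0] = 0.05
wd.kick()                 # main loop is alive
clock[0] = 0.20           # main loop hung, e.g. after a power glitch
needs_reset = wd.expired()
```

In this run `needs_reset` ends up `True`, which is the point where real hardware would force a reset back to a known state.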
Any vehicle built in the last thirty or forty years will not allow the steering column to lock unless the transmission is in park. If you're in drive (or neutral) you can only turn it to "off", not all the way to "lock". This was to prevent an errant knee from locking the steering while you're doing 70 on the freeway. Happened to me once, except I was only doing 45 on a bumpy ass gravel road when my knee smacked into my keychain. It was startling, but not particularly dangerous.
> With the engine past the redline there is very little vacuum to operate the power brakes. Without power assist the brakes may not be able to overcome the engine
apparently not true
Several years ago, I designed the software for a real-time automotive test system called HP ECUTEST (I think the official name was HP Design Span DS5470, but let's not waste time on HP's cold dead fish naming conventions). It simulated a car from an electric point of view. You connected an electronic control unit (ECU), and it had basically no way to tell it was not in a real car. Think of it as The Matrix for car electronics.
One of our first customers wanted us to test it with a reliable, proven, tested, tried and true ECU, something that was on the road in cars for several years already. So we did. And I noticed something odd. The ECU worked fine when we "drove" a car normally, but at idle, it would basically slow down, one RPM at a time, until it stopped. However, if I changed the value of the input corresponding to the accelerator pedal, it would reset the idle speed to the default, something like 800rpm.
Finally, after eliminating the possible bugs on our side, we told the customer. Their first reaction was "no way". But after a week and a demo of the problem, they finally made a connection. They had this elusive bug of some car customers complaining that their car would sometimes stop when idle. It turns out that in a real car, chassis vibrations generally caused minute changes in the input value for the accelerator. So the ECU would correctly recompute its idle speed. However, if there was no change, like if the pedal was more rigid than usual, the bug would trigger.
The root cause was a routine that wanted to optimize idle speed to be as low as possible, but for some reason kept cached data if the accelerator had not changed, so it thought the engine was still running smoothly.
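To make that failure mode concrete, here is a speculative reconstruction in a few lines. It is not the actual ECU code, which I cannot reproduce here. The class, the smoothness test, and the 650 rpm threshold are all invented; only the 800 rpm default and the "cached unless the pedal input changes" behavior come from the description above.

```python
DEFAULT_IDLE_RPM = 800  # default idle speed mentioned above


class IdleController:
    """Hypothetical idle-speed optimizer with a stale-cache bug."""

    def __init__(self, engine_is_smooth):
        self.engine_is_smooth = engine_is_smooth  # callable: rpm -> bool
        self.target_rpm = DEFAULT_IDLE_RPM
        self._last_pedal = None
        self._cached_smooth = False

    def step(self, pedal_raw):
        if pedal_raw != self._last_pedal:
            # Pedal input changed: reset the idle target and re-check the engine.
            self._last_pedal = pedal_raw
            self.target_rpm = DEFAULT_IDLE_RPM
            self._cached_smooth = self.engine_is_smooth(self.target_rpm)
        elif self._cached_smooth:
            # BUG: trusts the cached "smooth" flag and never re-checks it,
            # so the target keeps dropping by one rpm per step.
            self.target_rpm -= 1
        return self.target_rpm


def simulate(pedal_inputs):
    """Run the controller over raw pedal samples; return the idle targets."""
    ctrl = IdleController(lambda rpm: rpm >= 650)  # assumed roughness limit
    return [ctrl.step(p) for p in pedal_inputs]
```

With a perfectly steady pedal, `simulate([100] * 301)` walks the target from 800 down to 500, far below the point where the assumed engine would still run smoothly. With a little jitter in the input, the target keeps getting reset and the bug stays hidden, which matches why it only showed up on a simulator with noise-free inputs.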
We found such bugs in practically all ECUs we tested for the first time. The most impressive one was in a V8 ECU that was basically a V8 until 1200 rpm, then a V7, then a V6, and basically a V2 above 4000 rpm. The customer had hoped we'd find something, because they didn't get all the power they expected from the engine. Obviously. It was hard to find without our system, because the injectors that fired were different from cycle to cycle, so simpler instrumentation saw all cylinders running. The root cause here was that the software badly exceeded its real-time envelope... Ouch.
I do industrial automation for a living, and machine guarding/safety is a major component of the job. In the last few years, software-based safety products have appeared that are provably just as safe as hardware-only safety products. The key is that it's not just about rigorous testing, it's about correct design. If you want category 4 protection, you need to be sure that:
Software becomes another component. Therefore you need to have redundancy in your software. Government regulators that certify these safety systems as compliant want to see you prove that a single component (i.e. unit of software) can't malfunction and leave the system in an unsafe state. What a lot of companies do is they have two independent processors each monitoring the inputs to the system in parallel, and each generating the required outputs. The processors are typically sourced from different companies, and the circuit boards are designed by different teams. The software running on each processor is written by a different team. If both processors agree on the outputs, the system drives those outputs, and if not, all power is dropped to everything and the system can't be restarted (may need to be replaced, etc.).
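The two-processor arrangement described above boils down to something like the following. In a real product the two channels run on separate hardware and are written by separate teams; here they are just two differently written functions, and every name is made up for the sketch.

```python
SAFE_OUTPUT = False  # motor disabled


def channel_a(guard_closed, estop_pressed):
    """First team's logic: run only with the guard closed and no e-stop."""
    return guard_closed and not estop_pressed


def channel_b(guard_closed, estop_pressed):
    """Second team's logic: same requirement, written independently."""
    return not (estop_pressed or not guard_closed)


class DualChannelMonitor:
    """Drives the output only while both channels agree (hypothetical API)."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        self.tripped = False  # latched: stays set until the unit is serviced

    def motor_enable(self, guard_closed, estop_pressed):
        if self.tripped:
            return SAFE_OUTPUT
        a = self.first(guard_closed, estop_pressed)
        b = self.second(guard_closed, estop_pressed)
        if a != b:
            # Disagreement: drop the output and refuse to restart.
            self.tripped = True
            return SAFE_OUTPUT
        return a
```

The important design choice is the latch. After a single disagreement the monitor stays in the safe state no matter what the inputs do next, mirroring the "can't be restarted" behavior described above.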
Those of us in the industry were skeptical of software based safety at first, but given the above facts and a decent amount of regulatory oversight, I'm satisfied that it will live up to the design criteria. That doesn't mean an error can't happen, but it makes the probability low enough that we can live with it.
The latest thing is safety systems running their I/O across networks like DeviceNet and even Ethernet/IP (the IP stands for Industrial Protocol, not Internet Protocol). Again, I was skeptical at first, but they layer a protocol on top of the network, using timestamps and redundant processors on both ends with reasonable failure modes, so that the system is provably safe, within reasonable limits.
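To give a feel for what a safety layer on top of an ordinary network adds, here is a rough sketch with a sequence number, a timestamp, and a CRC wrapped around each message. The frame layout and function names are invented for illustration and do not reproduce the real wire format of any safety protocol.

```python
import struct
import zlib

HEADER = ">IQ"                        # 4-byte sequence number, 8-byte timestamp (ms)
HEADER_LEN = struct.calcsize(HEADER)  # 12 bytes


def pack_frame(seq, timestamp_ms, payload):
    """Wrap a payload with sequence number, timestamp and CRC-32."""
    body = struct.pack(HEADER, seq, timestamp_ms) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def unpack_frame(frame, now_ms, last_seq, max_age_ms):
    """Return (seq, payload) if the frame is acceptable, otherwise None.

    On None the receiver is expected to drop to its safe state.
    """
    if len(frame) < HEADER_LEN + 4:
        return None                   # truncated frame
    body, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != crc:
        return None                   # corrupted in transit
    seq, timestamp_ms = struct.unpack(HEADER, body[:HEADER_LEN])
    if seq <= last_seq:
        return None                   # repeated or out-of-order message
    if now_ms - timestamp_ms > max_age_ms:
        return None                   # delayed too long to be trusted
    return seq, body[HEADER_LEN:]
```

The network itself is allowed to lose, delay, repeat, or corrupt messages. The layer on top only has to detect that this happened, so that the receiving end can go to a safe state instead of acting on bad data.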
So you can make safe embedded systems, but without being able to inspect the design and see that it lives up to these guidelines, Toyota can't ever *prove* that the system is safe.
I'm a former professional "software tester from hell" (currently unemployed) -- and I drive a 2007 Camry Hybrid.
Officially, my car does not have a problem -- other than the floor mats -- which are now in my trunk and were never a real problem anyways.
Months before all of this publicity, I complained to my dealer about what seemed like a "sticky throttle" during routine maintenance.
The engine continues to run fast for 3-5 seconds after letting up on the gas.
The dealer actually charged me for the extra inspection, but did not find a problem.
So obviously -- I have some concerns.
I doubt that it is a software bug. It didn't start happening until the car was 2 years old.
But who knows ?
Some of those embedded computers have probably been running since the day the Hybrid battery pack was connected in the factory.
Any software running that long could have become unstable.
But without source code, I am powerless to do anything about it.
I have to rely on the word of Toyota's software QA people -- even though I know the current state of the art of software testing is a JOKE !
If Toyota open sourced the code -- I'd have a lot more confidence -- that with a lot more eyes on it -- the software really was OK.
(and if they offered a reward for finding a problem -- I'd be even more confident)
Now for a quick rant as to WHY the current state of software testing is a joke, and why I have little confidence in ANY corporate software QA.
I write this as a former CSTE -- the QAI's "Certified Software Tester".
I also should say that I love software testing because it is the one part of software development where creativity and intuition still play significant role.
And -- it is one area where techniques and standards are still being developed at a significant pace.
Most software development today is 98% boilerplate and copies of stuff somebody else did.
Engineers translate functional specifications into code based on established design patterns.
There are some basic calculations to ensure good response times and scalability.
Software testers typically create test plans from the same set of functional specs that the engineers use.
They simply validate that everything that is supposed to happen, happens.
Then they might run some performance tests -- but only if management budgeted for a test environment suitable for performance testing.
Then they stop.
Inevitably -- bugs appear in areas that no one ever expected.
Those are fixed later -- and regression tests are added to the test plan.
But -- almost NO ONE EVER LOOKS FOR THOSE "UNEXPECTED" BUGS -- before software is put into production.
Why ?
Because engineers hate the unexpected and don't typically know how to deal with it.
Micro-managed companies following strict Six Sigma processes (like Toyota) don't know how to create a time and resource budget for a "hunt for the unexpected".
The QAI (Quality Assurance Institute) doesn't help either.
They are run by a bunch of engineers obsessed with a desire to precisely measure and quantify every aspect of software testing.
Their techniques are useful, and largely valid -- but if they don't know HOW to quantify something -- they IGNORE it.
Just ask any CSTE -- "How do you test for race conditions" ?
There is no established technique for this, so the QAI simply IGNORES the issue.
There is no mention of race conditions in the CSTE's CBOK (Common Body of Knowledge).
I used to work for one of the world's top software QA "gurus" and I once asked him how we test for race conditions -- the answer was --
"we don't, because we don't know how to do it".
Despite this -- intermittent race condition bugs account for a huge portion of real-world bugs !
As programmers make more use of multi-core CPUs and GPUs -- race condition bugs are getting to be more and more common.
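As one illustration of how a race can be hunted deliberately rather than waited for, the sketch below enumerates every interleaving of two "threads" that each do an unprotected read-then-write increment of a shared counter. The model and the function names are assumptions for the example, and this is only one possible technique, not necessarily the one this comment has in mind.

```python
from itertools import combinations


def run_schedule(schedule):
    """Replay one interleaving and return the final counter value.

    `schedule` is a sequence of thread ids (0 or 1), each appearing twice:
    a thread's first slot is its read, its second slot is its write.
    """
    shared = 0
    local = {}
    for tid in schedule:
        if tid not in local:
            local[tid] = shared       # read step
        else:
            shared = local[tid] + 1   # write step (uses the stale local copy)
    return shared


def all_schedules():
    """Yield every way to interleave two threads of two steps each."""
    for slots in combinations(range(4), 2):   # which slots thread 0 gets
        yield tuple(0 if i in slots else 1 for i in range(4))


def find_lost_updates():
    """Return the schedules where an increment is lost (final value != 2)."""
    return [s for s in all_schedules() if run_schedule(s) != 2]
```

There are only six possible schedules here, and four of them lose an update. The two that work are the ones where one thread finishes completely before the other starts, which is also the ordering a casual test run is most likely to produce.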
And yet -- testing for race conditions and testing for "the unexpected" IS actually possible -- it just
The Times also helpfully provides a list of all the people who have died in "sudden acceleration" accidents involving Toyotas:
Toyotas, deaths and sudden acceleration
If you look through the list at the ages mentioned, you begin to notice a rather odd pattern: 18, 21, 32, 34, 44, 45, 47, 56, 57, 58, 60, 61, 63, 66, 68, 72, 72, 77, 79, 83, 85, 89
This is a most peculiar bug indeed, in that it seems to occur primarily when the driver is elderly. Or perhaps, as with previous "sudden acceleration" scares, this will ultimately turn out to be the result of people slamming on the gas when they meant to slam on the brake and then trying to blame the car for their error.