AI Hears Your Anger in 1.2 Seconds (venturebeat.com)
MIT Media Lab spinoff Affectiva's neural network, SoundNet, can classify anger from audio data in as little as 1.2 seconds regardless of the speaker's language -- just over the time it takes for humans to perceive anger. From a report: Affectiva's researchers describe it ("Transfer Learning From Sound Representations For Anger Detection in Speech") in a newly published paper [PDF] on the preprint server Arxiv.org. It builds on the company's wide-ranging efforts to establish emotional profiles from both speech and facial data, which this year spawned an AI in-car system codeveloped with Nuance that detects signs of driver fatigue from camera feeds. In December 2017, it launched the Speech API, which uses voice to recognize things like laughing, anger, and other emotions, along with voice volume, tone, speed, and pauses.
SoundNet consists of a convolutional neural network -- a type of neural network commonly applied to analyzing visual imagery -- trained on a video dataset. To get it to recognize anger in speech, the team first sourced a large amount of general audio data -- two million videos, or just over a year's worth -- with ground truth produced by another model. Then, they fine-tuned it with a smaller dataset, IEMOCAP, containing 12 hours of annotated audiovisual emotion data including video, speech, and text transcriptions.
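The two-stage recipe described above (reuse a model trained on lots of general audio, then fine-tune on a small labeled emotion set) can be sketched roughly as follows. This is a toy illustration, not Affectiva's code: `frozen_features` is a hypothetical stand-in for the pretrained network, the 8 kHz sample rate and the sine-wave "clips" are made-up assumptions, and only a small logistic-regression head is trained on top.

```python
import math

SAMPLE_RATE = 8000      # assumed sample rate, not taken from the paper
CLIP_SECONDS = 1.2      # clip length mentioned in the summary


def make_clip(amplitude, freq=220.0):
    """Synthetic stand-in for a 1.2 s audio clip (a plain sine tone)."""
    n = round(SAMPLE_RATE * CLIP_SECONDS)
    return [amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def frozen_features(samples):
    """Hypothetical 'pretrained' extractor; its parameters are never updated.
    The real SoundNet is a deep CNN, here two hand-written features play its role."""
    energy = sum(s * s for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / len(samples)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(clips, labels, epochs=2000, lr=0.5):
    """Train only a small logistic-regression head on the frozen features,
    using full-batch gradient descent on the labeled clips."""
    feats = [frozen_features(c) for c in clips]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def is_angry(head, clip):
    w, b = head
    x = frozen_features(clip)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b) > 0.5


# Tiny labeled set: loud tones stand in for "angry" (1), quiet tones for "calm" (0).
train_clips = [make_clip(a) for a in (0.7, 0.8, 0.9, 0.05, 0.1, 0.15)]
train_labels = [1, 1, 1, 0, 0, 0]
head = fine_tune_head(train_clips, train_labels)
```

Calling `is_angry(head, make_clip(0.85))` then classifies a clip the head has not seen. The point of the split is that the expensive part (the feature extractor) is learned once on plentiful unlabeled-ish data, while the scarce emotion labels only have to fit a small head.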
Anytime Cortana or Siri pops up and gets in the way, there will be anger!
If I'm on a call with an automated phone tree (and I'm sufficiently alone), I often let loose a string of angry "old man" profanity while it's listening, just to see if I get auto-routed to an agent. It hasn't happened too often, but it happens (most often with airlines and credit card companies).
Stop calling everything a computer does "AI". 15 years ago, in 2004, I was an IT Director at a large call center that did both inbound (skills-based routing) and outbound (predictive dialing). One of the features of our telephone switch back then was real-time monitoring that could detect when someone got agitated or used a "bad" word (like swearing). When pre-specified thresholds were reached or certain words were used, the system would call a supervisor and allow the supervisor to "ghost" (listen but not be heard), "whisper" (coach the agent without being heard by the caller), or take over the call. The terminology 15 years ago was real-time monitoring with language recognition heuristics. It worked great then and it was commercially available, and it wasn't "AI"...
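For what it's worth, the kind of rule described in the comment above (alert a supervisor when a threshold is reached or a flagged word is used) fits in a few lines. This is only a guess at the shape of such a heuristic, not the actual switch's logic: the word list, the 0-to-1 agitation score, and the threshold value are all hypothetical.

```python
# Hypothetical watch list and threshold; a real deployment would configure both.
FLAGGED_WORDS = {"damn", "hell"}
AGITATION_THRESHOLD = 0.8  # assumed scale: 0.0 (calm) to 1.0 (very agitated)


def should_alert_supervisor(words, agitation_score):
    """Return True if any flagged word was recognized or the agitation
    score has reached the pre-specified threshold."""
    flagged = any(w.lower().strip(".,!?") in FLAGGED_WORDS for w in words)
    return flagged or agitation_score >= AGITATION_THRESHOLD
```

No learning is involved: the rule is fixed in advance, which is the distinction the comment is drawing.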