Google Works Out a Fascinating, Slightly Scary Way For AI To Isolate Voices In a Crowd (arstechnica.com)

Posted by BeauHD on Friday April 13, 2018 @10:30AM from the cocktail-party-effect dept.

An anonymous reader quotes a report from Ars Technica: Google researchers have developed a deep-learning system designed to help computers better identify and isolate individual voices within a noisy environment. As noted in a post on the company's Google Research Blog this week, a team within the tech giant attempted to replicate the cocktail party effect, or the human brain's ability to focus on one source of audio while filtering out others -- just as you would while talking to a friend at a party. Google's method uses an audio-visual model, so it is primarily focused on isolating voices in videos. The company posted a number of YouTube videos showing the tech in action.

The company says this tech works on videos with a single audio track and can isolate voices in a video algorithmically, depending on who's talking, or by having a user manually select the face of the person whose voice they want to hear. Google says the visual component here is key, as the tech watches for when a person's mouth is moving to better identify which voices to focus on at a given point and to create more accurate individual speech tracks for the length of a video. According to the blog post, the researchers developed this model by gathering 100,000 videos of "lectures and talks" on YouTube, extracting nearly 2,000 hours worth of segments from those videos featuring unobstructed speech, then mixing that audio to create a "synthetic cocktail party" with artificial background noise added. Google then trained the tech to split that mixed audio by reading the "face thumbnails" of people speaking in each video frame and a spectrogram of that video's soundtrack. The system is able to sort out which audio source belongs to which face at a given time and create separate speech tracks for each speaker. Whew.
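
The "synthetic cocktail party" step and the spectrogram-splitting idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Google's model: the function names (`mix_sources`, `stft_magnitude`, `ideal_ratio_mask`), the sine-wave "speakers", and the frame sizes are all hypothetical, and the mask is computed from the known clean source (an "ideal ratio mask") where the real system would have a trained network predict it from the face thumbnails and the mixture spectrogram.

```python
import numpy as np

def mix_sources(clean_a, clean_b, noise, noise_gain=0.1):
    """Sum two clean tracks plus scaled background noise into one track."""
    n = min(len(clean_a), len(clean_b), len(noise))
    return clean_a[:n] + clean_b[:n] + noise_gain * noise[:n]

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, freq bins)

def ideal_ratio_mask(target_mag, other_mag, eps=1e-8):
    """Per-bin share of energy belonging to the target (values in [0, 1])."""
    return target_mag / (target_mag + other_mag + eps)

# Stand-ins for two speakers: pure tones instead of real speech (assumption).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
speaker_a = np.sin(2 * np.pi * 440 * t)
speaker_b = np.sin(2 * np.pi * 1200 * t)
noise = rng.standard_normal(sample_rate)

# Build the single-track mixture, then split it in the spectrogram domain.
mixture = mix_sources(speaker_a, speaker_b, noise)
mix_mag = stft_magnitude(mixture)
mag_a = stft_magnitude(speaker_a)
mag_rest = stft_magnitude(mixture - speaker_a)  # everything that is not speaker A
mask_a = ideal_ratio_mask(mag_a, mag_rest)
estimate_a = mask_a * mix_mag  # masked spectrogram attributed to speaker A
```

Here the mask is close to 1 in the frequency bins dominated by speaker A and close to 0 where speaker B dominates. A learned system has no access to the clean tracks at inference time, which is where the visual cue (whose mouth is moving) would come in.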

45 comments

Dueling pundits? by ClickOnThis · 2018-04-13 10:43 · Score: 2

Might be useful for sorting out what political pundits are saying when they try to overspeak each other.

--
If it weren't for deadlines, nothing would be late.
1. Re: Dueling pundits? by TheMeuge · 2018-04-13 11:36 · Score: 2
  
  I think they should actually duel.
2. Re:Dueling pundits? by gnick · 2018-04-13 11:55 · Score: 1
  
  sorting out what political pundits
  That's a terrific idea! I can just mute the voices of people I disagree with! One more brick on my echo chamber.
  
  --
  He's getting rather old, but he's a good mouse.
3. Re:Dueling pundits? by Anonymous Coward · 2018-04-15 22:23 · Score: 0
  
  This will be quite useful for the government, I expect, in conjunction with all the new street cameras that are rolling out with built-in microphones, such as the Mobotix ones.
Extra! Extra! Read all about it! by Anonymous Coward · 2018-04-13 10:45 · Score: 0

Helium and Xenon sales skyrocket for voiceprint masking!
Voice prints for ads by AHuxley · 2018-04-13 10:45 · Score: 1

For ads that track everyone back to their own accounts.
Get a mic, webcam placed in a person's home as a must-have trendy free network service.
Track who exactly was talking about a dog, cat in a conversation at a friend's home.
All their friends are now on file as separate people to track and create ads for.
The pet food brands now have the resulting digital product lists sold to them.
The resulting ads start and the reaction of the users is tracked.

--
Domestic spying is now "Benign Information Gathering"
Wow by Anonymous Coward · 2018-04-13 10:50 · Score: 0

So everything google develops is useful for mass surveillance.
1. Re:Wow by rogoshen1 · 2018-04-13 11:04 · Score: 1
  
  And the award for the most insightful, succinct post of the day goes to AC. (what a waste)
2. Re:Wow by Highdude702 · 2018-04-13 11:23 · Score: 1
  
  He has a very good point. How long until we live in the United States of Google? Or the European Google, or the Siberian Google? They are trying to take over and brainwash the world. It's like the Futurama Brain Slug episode.
3. Re: Wow by brunes69 · 2018-04-13 11:36 · Score: 1
  
  It would also be incredibly useful for transcribing videos for the visually impaired, or for indexing the text.
4. Re: Wow by TheMeuge · 2018-04-13 11:40 · Score: 1
  
  These days that's not a bug, that's a feature. We're done with the digital age. We're into the surveillance age. At least Winston knew the telescreen would likely not be watching everyone all the time. Nowadays, all the screens are always watching... and listening... to everyone... all the time.
Simplest way which still involves AI by 93+Escort+Wagon · 2018-04-13 10:51 · Score: 2

1) Employ a robot.
2) Instruct the robot to kill the people in the room, one by one, until the target voice is no longer heard.

--
#DeleteChrome
1. Re:Simplest way which still involves AI by grep+-v+'.*'+* · 2018-04-13 20:01 · Score: 1
  
  2) Instruct the robot to kill the people in the room, one by one, until the target voice is no longer heard.
  Dunno -- they might be mute with fear. And they might have been originally using an accent to fool you.. Better kill 'em all to make sure you got the right one.
  
  --
  If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
2. Re:Simplest way which still involves AI by 93+Escort+Wagon · 2018-04-14 07:40 · Score: 1
  
  Good call - I hadn’t thought of that.
  
  --
  #DeleteChrome
So now all those "security" cams by mark_reh · 2018-04-13 10:57 · Score: 1

in public spaces will be able to lip-read conversations.
What harm can come from that?
1. Re:So now all those "security" cams by Anonymous Coward · 2018-04-13 15:40 · Score: 0
  
  Low Earth orbit satellites that can film a million people at once and decode all of their conversations? Beware what you say outside. (And inside, your phone isn't just probably listening, it IS listening.)
Found the speaker by PPH · 2018-04-13 10:58 · Score: 1

It's Charlie McCarthy.

--
Have gnu, will travel.
Dinosaurs will appreciate it by mnemotronic · 2018-04-13 11:27 · Score: 3, Interesting

Not that I need one yet (actually I do), but a hearing aid (not that I need one) that can pull voices out of the background noise of life (not that I can't do that myself) would be really handy. For the people that really need hearing aids. Not me of course.

--
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
What ever do you think you're doing, Dave? by Anonymous Coward · 2018-04-13 11:30 · Score: 0

If he were proven to be malfunctioning, I don't see any other choice but disconnection.
This is not good by Anonymous Coward · 2018-04-13 11:33 · Score: 0

If they package this in a real-time phone app, people in bars might actually be able to hear each other using a bluetooth earpiece. That would be the end of pickups.
First, they take all of the dating and social apps away, then this?
Next thing they'll have is a pill that makes it so that you can see clearly while drunk. It would be bad enough to listen to them, but to actually see them clearly before morning?
1. Re: This is not good by Anonymous Coward · 2018-04-13 16:53 · Score: 0
  
  Real-time reverse airbrush, running on your Google glasses, coming 2020.
  "The system was trained on 2 million images showing the subject's face at night, and the morning after.."
Keep in mind that it's still in the alpha stage by elrous0 · 2018-04-13 11:47 · Score: 2

So far it's only able to isolate Fran Drescher's voice in a crowd of Amish people. But they're improving it every day.

--
SJW: Someone who has run out of real oppression, and has to fake it.
1. Re:Keep in mind that it's still in the alpha stage by Tony+Isaac · 2018-04-13 13:20 · Score: 1
  
  Once it reaches "beta," it will stay there for the next 10 years!
2. Re:Keep in mind that it's still in the alpha stage by Anonymous Coward · 2018-04-13 15:08 · Score: 0
  
  Once it reaches "beta," it will stay there for the next 10 years!
  Publicly. The one we don't know about will do far more.
  If this exists, it is truly evil and straight out of Cyberpunk Dystopia ... combine this shit with facial recognition, and in theory you could transcribe a room full of people.
  This is that full-on box over your face AI tracking and monitoring you see in some movie, and you roll your eyes at and think "as if we have that kind of technology".
  If this shit goes out of beta, and even if this specific one doesn't, we're pretty much on the threshold of the complete surveillance state. This is kinda tinfoil hat material from the 90s.
  Think I'm joking?
  
  Facial recognition tech picks a suspect out of a crowd of 50,000 in China
  Fuck ...
Independent Component Analysis by nsaspook · 2018-04-13 11:50 · Score: 1

Old news with at least two audio tracks and no video clues.
http://cnl.salk.edu/~tewon/Bli...
Single-channel separation of multiple sources
https://youtu.be/LuBer-0WmpQ

--
In GOD we trust, all others we monitor.
Not just for Muslims anymore by Anonymous Coward · 2018-04-13 11:50 · Score: 1

I can see niqabs becoming a useful garment not just for Muslim women, but also for anyone of either gender who doesn't want Google or our spying corporate overlords to see.
OK GOOGLE. by Anonymous Coward · 2018-04-13 11:51 · Score: 0

Makes you want to stop saying that phrase before it's too late.
Stenography (not steganography) by minstrelmike · 2018-04-13 12:35 · Score: 3, Interesting

The reason we still use court reporters at hearings, depositions and trials is because they can distinguish between voices when folks talk over the top of each other. That's why tape/digital recorders haven't replaced live stenographers. Not yet. That might be the real market for this google product: automated production of transcripts from depositions.
1. Re:Stenography (not steganography) by Anonymous Coward · 2018-04-16 05:30 · Score: 0
  
  Yet another job lost to automation! What a time to be alive!
Boo Boooo! by thinkwaitfast · 2018-04-13 12:46 · Score: 1

Why is this scary? Would a machine that could add one thousand numbers in one second be scary to someone in 1965?
1. Re:Boo Boooo! by Zaiff+Urgulbunger · 2018-04-14 05:22 · Score: 1
  
  Why is this scary? Would a machine that could add one thousand numbers in one second be scary to someone in 1965?
  When I first read the headline, my money was on "Google works out a fascinating, slightly scary way for AI to isolate voices in a crowd BY KILLING ALL THE OTHERS".
  
  Disappointingly, that wasn't the explanation, and I feel comedic potential has once again been wasted!
Next addition to Google vans by Anonymous Coward · 2018-04-13 12:51 · Score: 0

will be sets of directional microphones that cover 360 degrees around the van, recording and transcribing every conversation nearby.
Remember all the Google fanbois claiming it was ok for Google to record all wifi traffic received in the vans because the signal was "sent out to public space"? Now, apply the same thinking to the sound waves coming out of people's mouths.
Big brother just got even bigger. Enjoy your life with no privacy.
Hooray for ventriloquism and voice acting by Anonymous Coward · 2018-04-13 12:58 · Score: 0

Hooray for ventriloquism and voice acting/impersonation.
I just do it for fun but I guess it might be time for everybody to start learning.
The elusive Cocktail Party Effect by Anonymous Coward · 2018-04-13 13:21 · Score: 1

I remember clearly the first time I read about the "Cocktail Party Effect", thinking "oh, that must be the headache I get when I go to cocktail parties and try to talk to people, the intense feeling of frustration that makes me hate going to parties."
Imagine my shock when I found out that other people claim to be able to track a conversation partner's voice and understand what they're saying, even in an environment filled with the voices of other people! I refused to believe it. But then, talking to friends and family that I trust, I learned that yes, most people ARE able to do that. I can't. I never have. I'm over 40, and never understood until recently that I completely lack this core, innate ability that most people take for granted.
This has helped tremendously to understand myself, the kind of person I am, the kinds of friendships I maintain, and the kinds of activities I enjoy. I understand, now, that listening to people talk requires my full attention. Which means:
- when people talk to me, they sense that they have my undivided attention, and they like it. My friendships are deep.
- people who talk a lot about nothing or are full of hot air, I learn to ignore (and I mean totally ignore, as if they don't exist) -- they're too much effort
- people who are uncomfortable with silence don't like being around me. That's OK.
- when salespeople talk, I let them do their thing, then ask for the brochure so I can think about it in written form. That's how I avoid being talked into things.
- I avoid parties, and when I go, I plan for a period of recuperation afterward -- kind of like going to the gym
I am guessing that among the tech crowd, there are probably others like me.
I strongly suspect that this "voice tracking" ability is a part of the brain that works well in some people and not in others. Or maybe, got switched off when I fell off a landing onto a concrete patio when I was two years old (yes, really happened)
1. Re: The elusive Cocktail Party Effect by Anonymous Coward · 2018-04-13 18:13 · Score: 0
  
  I can relate, so you're not alone my fellow slashdotter.
2. Re:The elusive Cocktail Party Effect by DontBeAMoran · 2018-04-14 04:15 · Score: 1
  
  Exact same thing here.
  
  --
  #DeleteFacebook
3. Re: The elusive Cocktail Party Effect by Anonymous Coward · 2018-04-14 04:31 · Score: 0
  
  Same here
4. Re:The elusive Cocktail Party Effect by Zaiff+Urgulbunger · 2018-04-14 06:07 · Score: 1
  
  #metoo
What about 70s movies by Anonymous Coward · 2018-04-13 16:35 · Score: 0

Can it distinguish the dialogue from the ear-damaging explosions and music? I love that era of film but dear god everyone talks like they are whispering.
beyond panoptic by Reverend+Green · 2018-04-13 17:03 · Score: 1

Big Brother Google is always watching.
Big Brother Google is always listening.
Re: Keep in mind that it's still in the alpha stag by Reverend+Green · 2018-04-13 17:07 · Score: 1

The future has arrived - and it's totalitarian. Hurrah!
hum this could be used to aid the deaf too by Anonymous Coward · 2018-04-14 05:35 · Score: 0

I'm deaf, and the biggest problem I have is a noisy room. This AI could be linked to a Bluetooth phone and help the deaf hear only the person they're speaking to.
As it is, I have to pick one voice out of an ocean of sound, a very taxing task, while lip-reading the person at the same time. Conversations are mentally taxing and drain you very much.
On the negative side: alexis (aka wiretap), what did you tell the government about me today?
~N~
Useful for hearing-aid users by Zaiff+Urgulbunger · 2018-04-14 06:04 · Score: 1

I believe hearing aids are not so great in crowded, noisy environments like parties. So this could be really useful for them... except for hearing-aid users now also needing some cameras attached too!
Can't help thinking about .. by RespekMyAthorati · 2018-04-15 11:17 · Score: 1

HAL-9000 watching David Bowman and Frank Poole plotting to take him out.

Didn't end well for them.
Won't end well for us.