Google Works Out a Fascinating, Slightly Scary Way For AI To Isolate Voices In a Crowd (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: Google researchers have developed a deep-learning system designed to help computers better identify and isolate individual voices within a noisy environment. As noted in a post on the company's Google Research Blog this week, a team within the tech giant attempted to replicate the cocktail party effect, or the human brain's ability to focus on one source of audio while filtering out others -- just as you would while talking to a friend at a party. Google's method uses an audio-visual model, so it is primarily focused on isolating voices in videos. The company posted a number of YouTube videos showing the tech in action.
The company says this tech works on videos with a single audio track and can isolate voices in a video algorithmically, depending on who's talking, or by having a user manually select the face of the person whose voice they want to hear. Google says the visual component here is key, as the tech watches for when a person's mouth is moving to better identify which voices to focus on at a given point and to create more accurate individual speech tracks for the length of a video. According to the blog post, the researchers developed this model by gathering 100,000 videos of "lectures and talks" on YouTube, extracting nearly 2,000 hours' worth of segments from those videos featuring unobstructed speech, then mixing that audio to create a "synthetic cocktail party" with artificial background noise added. Google then trained the tech to split that mixed audio by reading the "face thumbnails" of people speaking in each video frame and a spectrogram of that video's soundtrack. The system is able to sort out which audio source belongs to which face at a given time and create separate speech tracks for each speaker. Whew.
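The summary describes two audio steps: summing clean speech clips with artificial noise to make a "synthetic cocktail party," then splitting a mixture back into one track per speaker. A minimal Python sketch of those two ideas follows, assuming audio is a plain list of samples and spectrograms are plain frames-by-bins grids of magnitudes. The helper names `make_mixture` and `ratio_masks`, the white Gaussian noise, and the SNR parameter are hypothetical illustrations, not Google's pipeline. The per-bin ratio mask is a common speech-separation target swapped in here for illustration; the article does not say it is Google's exact output, and the sketch ignores the visual (face thumbnail) input entirely.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def make_mixture(clips, noise_snr_db, seed=0):
    """Build a hypothetical 'synthetic cocktail party' training example.

    Sums equal-length prefixes of the clean clips sample by sample, then adds
    white Gaussian noise (an assumption; the article only says 'artificial
    background noise') scaled so the speech-to-noise ratio equals noise_snr_db.
    Returns (mixture, summed_speech, noise).
    """
    n = min(len(c) for c in clips)
    speech = [sum(c[i] for c in clips) for i in range(n)]
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Target noise RMS so that 20*log10(rms(speech) / rms(noise)) == noise_snr_db.
    target_rms = rms(speech) / (10 ** (noise_snr_db / 20))
    scale = target_rms / rms(noise)
    noise = [scale * v for v in noise]
    mixture = [s + v for s, v in zip(speech, noise)]
    return mixture, speech, noise


def ratio_masks(source_mags):
    """Per-source ratio masks over magnitude grids (frames x frequency bins).

    source_mags[k][t][f] is the magnitude of clean source k at frame t, bin f.
    Each mask value is that source's share of the total magnitude in the bin,
    so multiplying a mixture spectrogram by mask k keeps roughly speaker k.
    Bins where every source is silent are split evenly.
    """
    k = len(source_mags)
    frames = len(source_mags[0])
    bins = len(source_mags[0][0])
    masks = [[[0.0] * bins for _ in range(frames)] for _ in range(k)]
    for t in range(frames):
        for f in range(bins):
            total = sum(src[t][f] for src in source_mags)
            for j in range(k):
                masks[j][t][f] = source_mags[j][t][f] / total if total > 0 else 1.0 / k
    return masks
```

For example, `ratio_masks([[[1.0, 0.0]], [[3.0, 0.0]]])` gives the first source a mask of 0.25 in the first bin and the second source 0.75, while the silent second bin is split 0.5/0.5. A trained model has to predict masks like these from the mixture (plus, in Google's case, the faces), since the clean sources are only available when the mixture was built synthetically.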
Might be useful for sorting out what political pundits are saying when they try to talk over each other.
If it weren't for deadlines, nothing would be late.
Helium and xenon sales skyrocket for voiceprint masking!
For ads that track everyone back to their own accounts.
Get a mic and webcam placed in a person's home as a must-have, trendy, free network service.
Track exactly who was talking about a dog or cat in a conversation at a friend's home.
All their friends are now on file as separate people to track and create ads for.
The pet food brands now have the resulting digital product lists sold to them.
The resulting ads start, and the users' reactions are tracked.
Domestic spying is now "Benign Information Gathering."
So everything Google develops is useful for mass surveillance.
1) Employ a robot.
2) Instruct the robot to kill the people in the room, one by one, until the target voice is no longer heard.
#DeleteChrome
in public spaces will be able to lip-read conversations.
What harm can come from that?
It's Charlie McCarthy.
Have gnu, will travel.
Not that I need one yet (actually I do), but a hearing aid (not that I need one) that can pull voices out of the background noise of life (not that I can't do that myself) would be really handy. For the people that really need hearing aids. Not me of course.
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
If he were proven to be malfunctioning, I don't see any other choice but disconnection.
If they package this in a real-time phone app, people in bars might actually be able to hear each other using a Bluetooth earpiece. That would be the end of pickups.
First, they take all of the dating and social apps away, then this?
Next thing they'll have is a pill that makes it so that you can see clearly while drunk. It would be bad enough to listen to them, but to actually see them clearly before morning?
So far it's only able to isolate Fran Drescher's voice in a crowd of Amish people. But they're improving it every day.
SJW: Someone who has run out of real oppression, and has to fake it.
Old news with at least two audio tracks and no video clues.
http://cnl.salk.edu/~tewon/Bli...
Single-channel separation of multiple sources
https://youtu.be/LuBer-0WmpQ
In GOD we trust, all others we monitor.
I can see niqabs becoming a useful garment not just for Muslim women, but also for anyone of either gender who doesn't want Google or our spying corporate overlords to see.
Makes you want to stop saying that phrase before it's too late.
The reason we still use court reporters at hearings, depositions and trials is because they can distinguish between voices when folks talk over the top of each other. That's why tape/digital recorders haven't replaced live stenographers. Not yet. That might be the real market for this google product: automated production of transcripts from depositions.
Why is this scary? Would a machine that could add one thousand numbers in one second be scary to someone in 1965?
will be sets of directional microphones that cover 360 degrees around the van, recording and transcribing every conversation nearby.
Remember all the Google fanbois claiming it was ok for Google to record all wifi traffic received in the vans because the signal was "sent out to public space"? Now, apply the same thinking to the sound waves coming out of people's mouths.
Big brother just got even bigger. Enjoy your life with no privacy.
Hooray for ventriloquism and voice acting/impersonation.
I just do it for fun but I guess it might be time for everybody to start learning.
I remember clearly the first time I read about the "Cocktail Party Effect", thinking "oh, that must be the headache I get when I go to cocktail parties and try to talk to people, the intense feeling of frustration that makes me hate going to parties."
Imagine my shock when I found out that other people claim to be able to track a conversation partner's voice and understand what they're saying, even in an environment filled with the voices of other people! I refused to believe it. But then, talking to friends and family that I trust, I learned that yes, most people ARE able to do that. I can't. I never have. I'm over 40, and never understood until recently that I completely lack this core, innate ability that most people take for granted.
This has helped tremendously to understand myself, the kind of person I am, the kinds of friendships I maintain, and the kinds of activities I enjoy. I understand, now, that listening to people talk requires my full attention. Which means:
- when people talk to me, they sense that they have my undivided attention, and they like it. My friendships are deep.
- people who talk a lot about nothing or are full of hot air, I learn to ignore (and I mean totally ignore, as if they don't exist) -- they're too much effort
- people who are uncomfortable with silence don't like being around me. That's OK.
- when salespeople talk, I let them do their thing, then ask for the brochure so I can think about it in written form. That's how I avoid being talked into things.
- I avoid parties, and when I go, I plan for a period of recuperation afterward, kind of like going to the gym.
I am guessing that among the tech crowd, there are probably others like me.
I strongly suspect that this "voice tracking" ability is a part of the brain that works well in some people and not in others. Or maybe it got switched off when I fell off a landing onto a concrete patio when I was two years old (yes, that really happened).
Can it distinguish the dialogue from the ear-damaging explosions and music? I love that era of film, but dear god, everyone talks like they are whispering.
Big Brother Google is always watching.
Big Brother Google is always listening.
The future has arrived - and it's totalitarian. Hurrah!
I'm deaf, and the biggest problem I have is a noisy room. This AI could be linked to a Bluetooth phone and help deaf people hear only the person they're speaking to.
Otherwise, I have to pick a voice out of an ocean of sound and try to do this very taxing task in my head while lip-reading the person at the same time. Conversations are mentally taxing and drain you very much.
On the negative side: Alexa (aka wiretap), what did you tell the government about me today?
~N~
I believe hearing aids are not so great in crowded, noisy environments like parties. So this could be really useful for their users... except that hearing-aid users would now also need cameras attached!
HAL-9000 watching David Bowman and Frank Poole plotting to take him out.
Didn't end well for them.
Won't end well for us.