Microsoft's Acoustic Caller ID Patent
theodp writes "A new patent granted to Microsoft Tuesday for automatic identification of telephone callers based on voice characteristics covers constructing acoustic models for telephone callers by identifying words or subject matter commonly used by callers and capturing the acoustic properties of any utterance. Not only that, it's done 'without alerting the caller during the call that the caller is being identified,' boasts Microsoft in the patent claims."
The only difference here (aside from what agencies have been doing since the 1960's) is that this analysis seems to be done in real time, rather than offline? I mean, haven't monitoring people been able to tell who is speaking based on sound synthesis since forever?
Anecdotally I feel like some companies answer the phone quicker if you talk to their automated system in an irate and condescending manner. Could just be me though :)
What's the purpose of caller ID after I've picked up the phone? I'm not going to talk to some challenge-response bot if I'm someone who needs to be ID'd and screened anyway.
We've upped our standards. Up yours.
that when someone calls me and says "Hi, this is John Smith," I will not be able to use that info to figure out that he's John Smith without violating Microsoft's patent? (Ditto when someone I know well says "Hi, it's me.")
"National Security is the chief cause of national insecurity." - Celine's First Law
You must have been doing that for longer than that, but you NEVER TELL the OTHER PARTY you are doing it?
Or do we have to assume that long before we make the call?
Brilliant!
Quo usque tandem abutere, Nimbus, patientia nostra?
I read the patent, but I guess I don't get it. How is what Microsoft claiming to do different from existing voice recognition systems?
You have to train current voice systems so they recognize your voice pattern (or, acoustic ID) and translate it to text or action. Take that and add a system that keeps profiles for a more advanced version of caller ID. It seems like a natural evolution of the technology.
Bearded Dragon
If someone had acquired some of your personal information, and then tried to impersonate you, an automated voice recognition system could be useful by raising an alarm, or at least giving a percentage of how much their voice is like yours.
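To make the "percentage of how much their voice is like yours" idea concrete, here is a minimal sketch. It assumes each voice has already been boiled down to a fixed-length list of numbers; the function names, the feature vectors, and the 80% threshold are all made up for illustration and say nothing about how the patent or any real product does it. The score is plain cosine similarity scaled to 0-100.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no information
    return dot / (norm_a * norm_b)


def voice_match_percent(enrolled, sample):
    """Map similarity to a 0..100 score; negative similarity counts as 0."""
    return max(0.0, cosine_similarity(enrolled, sample)) * 100.0


def should_raise_alarm(enrolled, sample, threshold_percent=80.0):
    """Flag the call when the sample is too unlike the enrolled voice.

    The 80% default is an arbitrary illustrative threshold.
    """
    return voice_match_percent(enrolled, sample) < threshold_percent


if __name__ == "__main__":
    # Hypothetical feature vectors, invented for this example.
    account_owner = [0.9, 0.1, 0.4]
    caller = [0.2, 0.8, 0.1]
    print(round(voice_match_percent(account_owner, caller), 1))
    print(should_raise_alarm(account_owner, caller))
```

A real system would have to pick the threshold by trading off false alarms against missed impostors; the number above is only a placeholder.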
Won't this most likely violate wiretapping laws in two-party states?
Your hair look like poop, Bob! - Wanker.
I had no idea someone I might call might be able to identify me.
The sort of processing this patent covers is something that hasn't been possible until recently, but I think, in principle, is something absolutely necessary for robust AI, and that is doing recognition simultaneously on both low level features and high level features of data and on intersections of the two.
By "high level" I mean things like word choice, language etc. By low level I imagine they mean things like the specific resonance characteristics of a voice. In voice there are intermediate levels of features too, such as the characteristics of phonemes.
The upshot of this is that just as algorithms and hardware begin to reach a level of power necessary to show intelligence, it will be impossible to do so without stepping on patents.
We will have patents on a machine not being stupid.
Does an ear count? Seems like human beings have been doing this for ages. Wait, I will patent the act of refreshing oneself with one's arm whilst bent between hmmmm 0-90% - that should cover most beer drinkers. I want a tax from all pubs ......
The current version can only identify Stephen Hawking.
/. should just put an RSS feed to newly issued patents on the front page. Would cut down on the number of stories per day though.
It is my understanding that recording a telephone conversation is against the law in most states, without notifying the other parties on the line.
Thus, a practical device for this patent would most likely be illegal.
Yes, but their system will come pre-programmed with the important voice signatures.
Bill Gates calling...
Caller ID displays: God
But, if there is ever an open source implementation of this, it will change to the following...
Bill Gates calling...
Caller ID displays: Don't even THINK about installing Windows(TM) on this caller ID
Microsoft, Sun, Apple and General Motors announced today they've also patented a talking TV-like device known as a "telescreen" that not only shows entertaining DRM'd media, but also reminds the user when they are behaving badly, eating poorly or being potentially offensive politically to others.
Anti-Globalism
I know that by the way the article is written we're supposed to think it's an evil invasion of our privacy but honestly this sounds kind of cool.
Username taken, please choose another one.
I assume you mean "does the human brain count" as the ear doesn't identify sounds. There is a lot of research into the human brain, and how it does what it does so well, but I doubt MS's latest innovation would match the intelligence methodology of the human brain.
Remember, patents require more than an idea, otherwise every Sci-Fi movie in history that has an AI identify the main character when they use a phone would be prior art. You must also explain how it's done.
Microsoft will get the law changed. Business as usual.
... but it works as well as their speech recognition. Between this, face recognition and kill drones OBL will be found and exterminated early and often. I hope it's not me next.
The programmers put in an Easter egg, just for you. Whenever Twitter says "shit" into his cell phone, the official Microsoft transcript has "M$".
Friends don't help friends install M$ junk.
Considering that we have not much better than "not a clue" how the brain actually associates the sound you hear to memory, I am skeptical that this is how their approach works.
But "not a clue" is exactly what executives, patent lawyers and patent judges know about how software and, say, mathematics work, so how is this any different? They wrote a patent on something they don't understand and will approve it without understanding it. They might as well be patenting life - oh wait, they do that too.
To patent anything, follow these steps:
1. Choose something already being done in the real world, anything really
2. describe it with maximum verbosity
3. add "on the Internet" at the end
Tada! PATENT!
The keywords being:
'without alerting the caller during the call that the caller is being identified'
Don't we have laws against doing stuff with voices without informing people first? And since when is sampling audio, and then converting part or all of the audio to a format based on, and unique to the original, not an act of recording?
Hasn't this technology been explained over and over again in big-screen depictions of the NSA's technical capabilities?
/. community should just patent 'patent trolling' and put an end to all this FUD.
Maybe someone from the
They do! Some systems are in fact set up so they just tell you to hit 1, 2, 3, etc., but if you say "operator" or something it'll take you to a person, and with some, if you start swearing at it, it'll take you to a person too. I wouldn't have believed it but I saw one of those "here's how to get real tech support" articles, and for several companies it says to get pissed at the recording.
That's why I come to Slashdot, for the comedians.
"Do you have the box?" 5+ geek creds to anyone who also immediately thought of the same movie :-)
Remember, kids. They're the US government. They don't DO that sort of thing. But they'll try.
Please help metamoderate.
that every time I recognise the person on the other end of the phone by recognising their "voice characteristics" I have to pay Microsoft tax?
"Hi mom! oh damn..., I mean, hi stranger whose voice I don't recognise but I am wildly guessing is probably my mother..."
Thus, a practical device for this patent would most likely be illegal.
Do you have to notify a caller that you are using caller ID? Do you have the right to make an anonymous phone call?
This guide for journalists may be helpful: "Can We Tape?" But I am not sure that any existing law is a good fit for this new tech.
I'm getting VC interest for a novel system that recognises your speech, converts it to text, and streams that through a voice synthesizer outputting in the language and accent of your choice, thereby spoofing the MS sneaky ID system, all in realtime.
Obviously the pre-alpha version accepts English input and outputs Klingon with a Judge Judy accent, but more language and accent packs will come. Version B will have gender and age variations to overcome discrimination, and the lab guys are working on species translation, this is difficult, so far we can only talk turkey. Minimum investment 1mwahaha but that excludes the domain, website and brand name: Really Awesome Speech Convertrix, Absolutely Lovely - RASCAL tm. Terms and Conditions Apply.
Ballmers gotta hire me now.
What this amounts to is the ability of MS to tell people they have to pay a royalty if they identify who they are talking to upon receiving a phone call.
Ring Ring
joe: hello
Hello joe.
joe: Who is this?
You know who this is, so how's it going joe?
Joe: Who is this?
Stop fooling around Joe, Are you going to visit soon?
Joe: Who is this?
Well if you don't want to talk then good bye.
click
From the other end: My own son doesn't recognize his own mother's voice...
From Joe's end: Must have been some crazy lady with MS stock....
not to forget...
how many times do you get sales calls from the same person at a telemarketing company?
Or a bill collector?
But just imagine what can be accomplished should all the identifications people make on their system be then collected up by the government spy agencies..... without your knowledge.
Of course if you are running a business where those answering the phone can vary but you want to give personalized reception of the call....
Come to think of it.... this technology was already in use around 1993 at some computer distributor in California who used it to identify customers, regardless of what phone they were calling from (nixing caller ID).... I became aware of just such an incident and asked them about it. I was told they developed it in-house. I'm sure I could probably find the store name, as it was where I bought an Amiga Toaster 4000 from.
Inventors: Arthur C. Clarke and Stanley Kubrick
First publication: 2001: A Space Odyssey (Released 1968). Heywood Floyd checks in to the space station:
Female voice: "Thank you. You are cleared through Voiceprint Identification."
http://www.imdb.com/title/tt0062622/quotes
and what they think about this UNANNOUNCED AUDIO RECORDING and processing? I mean, they complain about the public (the people who pay their paychecks) keeping tabs on them courtesy of video cameras.
NSA has had real-time voice ID since before '96 and possibly longer. How MS got this patent is beyond me. Our system is soooooooo broken
I prefer the "u" in honour as it seems to be missing these days.
According to this:
;)
Not only that, it's done 'without alerting the caller during the call that the caller is being identified,'
They are describing a means to RECORD callers without their knowledge, and hence without their consent. So would this software be illegal in some jurisdictions? You bet yer ass it would be.
Wonder how it handles people who say "uhm" or "uhh" a lot.
My Suburban burns less gasoline than your Prius.
My name is Werner Brandes, my voice is my passport. Verify me.
"How is what Microsoft claiming to do different from existing voice recognition systems?"
Existing voice recognition systems might be more accurately called speech recognition. They don't recognize the voice (who is speaking); they recognize the speech (what is being said). They can be categorized as speaker dependent or speaker independent.
Speaker dependent speech recognition (type 1) requires complex training by each user. It needs to know all the ways a person pronounces every possible phoneme. During use, it must be given the name of the speaker and a sound sample. It gives back the name of the phoneme. 2 inputs, 1 output.
Speaker independent speech recognition (type 2) is able to identify individual phonemes as spoken by a wide variety of speakers. 1 input, 1 output. That's what I would imagine is the important first step of what MS is claiming to do. Once a phoneme or two has been identified, the name of the phoneme and the captured sound sample can be fed to the type 1 algorithm and it would be able to output the name of the speaker.
Functionally it's different from existing "voice recognition" systems, but I seriously doubt it is worthy of a patent.
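Here is a toy sketch of the two-step idea described above: a speaker-independent pass names the phoneme, then the phoneme plus the captured sample is matched against per-speaker templates to name the speaker. Everything in it is an assumption for illustration: the 2-D "acoustic features", the phoneme labels, the speaker names, and the nearest-template matching. It is not taken from the patent text and is not how any particular product works.

```python
import math

# Hypothetical data: generic (speaker-independent) phoneme templates.
GENERIC_PHONEMES = {
    "AA": (1.0, 5.0),
    "IY": (5.0, 1.0),
}

# Hypothetical data: how each enrolled speaker pronounces each phoneme.
SPEAKER_PHONEMES = {
    "alice": {"AA": (0.8, 5.3), "IY": (5.2, 0.7)},
    "bob": {"AA": (1.4, 4.6), "IY": (4.6, 1.5)},
}


def distance(a, b):
    """Euclidean distance between two feature tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_phoneme(sample):
    """Step 1 (speaker independent): 1 input, 1 output."""
    return min(GENERIC_PHONEMES,
               key=lambda p: distance(sample, GENERIC_PHONEMES[p]))


def identify_speaker(sample):
    """Step 2: use the phoneme name plus the sample to pick a speaker."""
    phoneme = recognize_phoneme(sample)
    # Only consider speakers who have a template for this phoneme.
    candidates = {
        name: templates[phoneme]
        for name, templates in SPEAKER_PHONEMES.items()
        if phoneme in templates
    }
    speaker = min(candidates, key=lambda n: distance(sample, candidates[n]))
    return phoneme, speaker


if __name__ == "__main__":
    print(identify_speaker((0.9, 5.2)))
    print(identify_speaker((4.7, 1.4)))
```

Note that this always returns the closest enrolled speaker, even for a stranger; a usable version would also need a "none of the above" cutoff on the distance.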
That makes no sense. Just because you can train a system to work better at converting speech to text if it knows your voice pattern, doesn't mean that it can uniquely identify someone from the voice pattern. Those are two different things.....you can't just tell it to run the algorithm in reverse and expect there to be enough information. In fact, you aren't even running it in reverse if you don't have the text version of what they said.
N,IDNRTFA.
This issue is a bit more complicated than you think.
...it's done 'without alerting the caller during the call that the caller is being identified.'
...Sometimes...when the phone rings...
...I answer it...and just listen...
...I hear the caller's voice and identify them by their voice...
...Then hang up without saying anything.
How insidious!
What. Is. The. Difference.
Do they have a working implementation? Or is this just an IP land grab?
IANAL, but I do have 2 patents.
You don't need a working implementation to have a valid patent. However, the information in the patent must be sufficient for someone of ordinary skill in the art to implement the patent.
If the information in the patent isn't detailed enough to implement the patent, it isn't a valid patent.
That's weird, I just read today about a bank in Brazil adopting a voice recognition system to bill their clients.
"The software, created by VoxAge, reads some of the client's data, call him and then asks his name and other stuff. Then, depending on his answers, the client is forwarded to a call center."
I'd say both Microsoft and that software do the same thing, transform the voice into data, then analyze it with some other data previously stored in a database.
Years ago, I put up a sign in the lunch room where I worked, it said "wash your own dishes. Even if no one is looking."
Seems to me the same principle applies here... Eh, what do I know? I hardly say anything to anyone, and when I do, I say what I mean.
On the other hand, in today's world of digital recordings, cut-n-paste, out-of-context quotes, etc. I think "I never said that" should have the same legal weight as a "recording" of me saying it. After all, I can produce a "recording" saying the opposite, and yeah, that is a photo with me, Elvis, and the alien mother ship.
This issue is a bit more complicated than you think.
I have Caller ID so I know who's calling BEFORE I pick up the phone, not afterwards.
"We need to get over this notion, that, for Apple to win... Microsoft must lose." - Steve Jobs, 1997
Should I even ask? Does the 4th Amendment mean anything anymore?
Cops bust a guy for video taping them and charge him with wiretapping and Microsoft is going to be recording my voice and compiling a profile of me and that's okay?
Words I'm guessing it will be looking for by default: bomb, liberal, weed, nuke, bush, 1st Amendment.
My tinfoil hat is starting to look stylish.
Patch Tuesday gets a patent granted? w00t!
I guess the NSA will come after them with prior art :-)
Insert
I do it every goddamned time I pick up the phone.
Hello? Oh, hi Jack. Not much, just violating another idiotic patent. How about you?
"Hello, this is Bill Gates. I know who you are."
...in order to IDENTIFY you correctly after a few calls it's going to have to record how you say your words and have it in a quickly-accessible database. Otherwise what's there for it to rely upon in identifying the person, a magic pixie?
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
I know this is off topic, but I have always wondered why they do this? I once asked the operator and they said I have no number...
Is this a mystery like the missing sock in the laundry?
"You can't make a race horse of a pig"
"No," said Samuel, "but you can make very fast pig"
Is it just me, or is a caller ID more useful when you know who's calling before you pick up and speak to them?
Speaker identification has been researched for decades. Microsoft isn't offering a breakthrough solution to the problem, they are instead trying to patent the whole field.
This is roughly the equivalent of trying to patent "3D graphics acceleration" or "data compression".
Hey, if Audible Magic can fingerprint and identify a copyrighted song regardless of compression or transcoding, why not this?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
There are existing voice recognition systems as well. Some are used to choose which speech recognition profile to use, while some are used for other applications, similar to the one in the patent.
...every time we recognize someone's voice on the phone.
These posts express my own personal views, not those of my employer
Why not call them and play any copyrighted Music-Track. They record it and you can hand 'em over to the RIAA =)
me: hello?
caller: Hello, I'm Suzi Cheatem from Dewey, Cheatem, and Howe. I was wondering if you'd like to answer a few questions about your behaviour while using the Internet.
I think hrm, this sounds like one of those annoying telemarketers
me: Sorry, I'm not interested in speaking to telemarketers
caller: It seems like you have identified me from a previously identified acoustic model. I'm afraid I'm going to have to tell Microsoft that you have stolen their idea. You can expect a bill from them within two weeks.
<hangs up>
Gosh, those telemarketers get stranger every time they call me.
Ask me about repetitive DNA
You're talking out of your ass.
This is not speech recognition it's speaker recognition, and is nothing new.
Nope. I don't think so!
You must be reading out of your ass because that's exactly the distinction I made.
Ring. Ring. Hello? Whatzzzzzzzaaaaaaaaaaaahhhhhhhhhhhhhppppp (Computer thinking) "Microsoft Windows has encountered a problem and needs to close." Hello? Helllllloooooooo?
Prior art.. definitely!! but are the NSA going to reveal "their" technology to the patent office? Or will MS's claim be quashed in the name of national security?
Particularly if you don't inform the caller at the beginning of the call. According to the Telecommunications Act, it is illegal to record a telephone conversation without the knowledge of all parties involved in the call, which is why you get prerecorded messages at the start of your calls to call centres saying "This call may be recorded, for training and security purposes" or somesuch. I would think that realtime (or near realtime) voice analysis still requires the data to be cached (i.e. saved/recorded) during the analysis process and so is covered under the provisions above. Otherwise anyone can run a phone tap, effectively.
Ceci n'est pas une
You described it as if speaker recognition was something new, if only a minor advance on speech recognition. In fact speaker recognition is an independent field with a long history of its own - you just don't typically see it used commercially.
this section
Not only that, it's done 'without alerting the caller during the call that the caller is being identified,' boasts Microsoft in the patent claims.
would probably run afoul of wiretapping laws...I know that if I had money, I would probably be willing to push a test case...
of course if the caller blocks caller id then voice recognition would be the next step.
Anecdotally I can tell you about someone who a few years ago called a major department store's service center (Sears, I think) wanting to schedule some repair. However, without him ever talking to anyone, the automated system told him there would be a wait time of xx length. He felt he didn't want to hang on the phone that long, so he hung up (never explicitly leaving any information). 20 minutes later a call comes from the service center saying they recognized he called (apparently from the caller information left implicitly) and were now available to assist him. Poor guy was so totally spooked that he hung up and never called the service center again.
Now this guy is no Luddite; he was at the time in his 70s and still a working president (and owner) of the company for which I work. His take was that it was rude and presumptuous of the company to call him like that.
Well, they managed to copy humans, like all other robotic companies are thinking.
I'm sure most programmers have had this idea, I have.
...they're Microsoft, and therefore the law doesn't apply to them.
Ah, soon Drew Barrymore and friends will be pulled from their movies and put on the case. Cool!
I drank what? -- Socrates
This reminds me of a song by Dweezil Zappa titled "Return Of The Son Of Shoogagoogagunga" from the Confessions Album
ring... ring...
"Duuuuude! heh - check one two"
"Hey dude"
"Who's this?"
"It's me dude!"
"oh yea, Shoogagoogagunga"
That would be like asking a child porn site to not post children having sex.
Honest, I was just wondering. No, really.
Have gnu, will travel.
The only flaw found so far is that it can't identify Steve Ballmer because voice recognition software isn't able to make sense of the sound of chairs smashing into things.
Arguing about vi versus Emacs is like arguing whether it's better to make fire by rubbing sticks or banging rocks.
Once again a patent has been issued for 40 year old technology. Proof once again that if you insert the word COMPUTER in the application, you can get a patent grant for ANYTHING.
Our patent system is seriously broken, and needs fixing.
Everybody knows 3 people with my name.
#1. I have many starkly different voices
b) I don't use just one speech pattern
It all depends on who I'm talking to and my mood.
Way to kill 3 birds with 1 stone MS.
Wasting everyone's time
Raising privacy issues
AND failing to note the existence of CALLER ID.
Considering that a company must notify a consumer when they are being recorded, having tech like this out in the wild raises serious privacy concerns.
Have you checked out Google's 411 yet?
http://labs.google.com/goog411/
If you read the privacy policy: http://labs.google.com/goog411/privacy.html
It says Google stores your voice commands. I assume they could "voice print" the caller even with caller ID blocked.
I guess they won't be able to now, unless they license the patent.
In case they're a fox?
Twinstiq, game news
The thing about technology is that it can be used both ways. Maybe we [users] can set up a web site called www.whereishenow.com . Hence, we can type in a name, e.g. George Bush, and have it show us where he is right now, using input from a variety of sources, video cameras, travel documents, etc. I read an article last month about the "Outer Limits" episode called OBIT about this futuristic device used to monitor everyone on the planet. Outer Limits didn't get the details, but was very close on the idea. My suggestion is that we turn the technology on the top 2000 most influential people or Masters of the Universe (re 1950's book, "The Organizational Man"). Maybe Gates might get pissed off if instead of monitoring us, we are monitoring them.
It's called the "EAR". Will deaf people have to pay for it?
ozialien
This new device took 6 million years to develop; it's called the "EAR". I hereby counterclaim that my ear can do far more than this technology, which is thus really a subset of my innate capabilities. The inventor of this device isn't available to comment, but has asked his counterpart to schedule a meeting for all eternity. Several intelligence agencies have marveled at this new technology, saying it will make their jobs a lot easier than before. It used to take 300 million Chinese and Indian consultants to sift through all that voice data, a spokesman said.
They probably don't, but VeSecure already do! Wonder how this patent will affect all those already using voice biometric products?