Voice Authentication for Classrooms?
USSJoin asks: "I am teaching a summer camp for gifted/talented children this summer, and one of my courses is an introduction to forensic science. One idea I had was to demonstrate voice printing and voice authentication. Using the magic Google, I was able to find software to get a visual representation of a voice print, but I didn't find anything that would allow me to demonstrate voice authentication. Ideally, I would like to be able to have students record their voices onto a cassette player, then speak into the computer, then try to fake out the computer using the tape recording. Does Slashdot have any ideas on how to demo this to brilliant young kids?"
Does Slashdot have any ideas on how to demo this to brilliant young kids?
Use it to protect the computer containing their final exam. They'll understand the technology backwards and forwards and have it broken by lunchtime.
This is probably not suitable for your purposes, but it may be interesting anyway: on Mac OS 9, it was possible to use a voiceprint to login. You'd repeat the same phrase four times, and then at login you would be asked to repeat it. The computer did show your voiceprint as you spoke.
I remember it being fairly good for a while, but I had to re-record my passphrase as my pronunciation changed over a couple of months or so. Nonetheless, it was popular with me and my family simply because it was so freakin' cool to log in via your voice.
Put a midget in a box, and call it an advanced computer that responds with voice. If they question this, ask them if they've seen Knight Rider and KITT. Have the students play their voice recordings to the "midgetputer." He'll hear the tape recording clicks and tell them they're not authorized.
Supposedly they use voice authentication to secure "the button" on our Nukular arsenal.
I found this with Google: a Linux PAM module to log in with a spoken password. It may even be doable from a live CD. http://cscience.org/~lucasvr/projects/voiceauth.php
Well I can tell you now, it was Volidin. Ask Lev Rubin to match the voice prints. If they say it will take a month, give them a week. He'll need Gleb Nerzhin's help, but swiftly, because he's being transferred to one of the camps.
..a voiceprint (like any biometrics) is nowhere near as secure as a strong passphrase.
Because voice is a form of biometrics, you can't change it. You are always going to say a phrase a certain way - once somebody has enough recordings of your voice, it's easy to reconstruct whatever phrase they want you to "say" - all they have to do is know what the phrase is, then the phrase itself becomes the password.
So voice authentication is, at its best, a very weak credential check and shouldn't really be used for anything beyond the preferences of a system to which access is already granted (in other words, not to open the door, but to determine what default lighting pattern to use when you say "lights on" into a darkened room).
Plus there's the issue of false positives and false negatives - at the end of the day, you're better sticking with a strong password / passphrase.
Maybe you shouldn't teach what you don't know? How about that?
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
moran.
There, they will be able to play with all kinds of state-of-the-science gadgets and devices that can teach them things like this.
I'm sorry, but I think it's a little bit out of the "Summer camp" league.
But please do post your findings, there have to be some "beginner" resources out there for people interested in this tech.
Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
This may be a tad off, but one way of doing it would be to record their voice in Audacity, then on the tape recorder, then record from the tape recorder into Audacity, and use diff on the output files to compare...
though any kind of static or background noise will automatically show. The idea still holds though, you're looking for a significant noise signature between the voice and the tape player. Now supposing your tape recorders are really good, the computer might get confused.
Oh, and make sure you've got decent microphones, the cheapass ones won't do it, you can't get a fart to record the same way twice.
Another way is to use an oscilloscope with the mike plugged in directly, and decent oscilloscopes have some kind of memory. That's fairly easy to understand, and you could overlap recordings on the screen.
---- I am certain of only one thing : I know nothing else.
I think that you need to be very careful what you tell those kids. Most of what you see on TV about voice identification is nonsense. The images that they call "voiceprints" are spectrograms: that is, they're 3D plots of the spectrum over time, with frequency on the y axis, time on the x axis, and energy represented by darkness. Phoneticians like myself use them all the time.
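For anyone who wants to show the kids what is actually behind such a picture, here is a minimal sketch of how a spectrogram is computed: chop the signal into short overlapping frames, window each frame, and take the magnitude of its Fourier transform. Everything in it is an assumption made for illustration (the function names, the 8000 Hz sample rate, the frame length, and the synthetic two-tone "utterance"); real tools use an FFT and much longer recordings, and this is not the method of any particular program.

```python
import cmath
import math

RATE = 8000  # assumed sample rate in Hz (illustrative)

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of magnitudes per frame: rows are time, columns are frequency bins."""
    # Hann window to reduce leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        # Plain DFT over the non-negative frequencies (slow but dependency-free).
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_hz(frame, frame_len=64):
    """Frequency of the strongest bin in one frame."""
    k = max(range(len(frame)), key=frame.__getitem__)
    return k * RATE / frame_len

# Synthetic stand-in for speech: a 500 Hz tone followed by a 1500 Hz tone.
signal = ([math.sin(2 * math.pi * 500 * t / RATE) for t in range(512)] +
          [math.sin(2 * math.pi * 1500 * t / RATE) for t in range(512)])

spec = spectrogram(signal)
for i in (0, len(spec) - 1):
    print("frame", i, "strongest frequency:", peak_hz(spec[i]), "Hz")
```

Plotting `spec` with time across and frequency up, and darker ink for larger magnitudes, gives the familiar image.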
In one sense every utterance, and therefore every spectrogram, is unique. The central problem in acoustic phonetics is the enormous variation in the physical signal for what in linguistic terms is the "same" utterance. The details of the signal depend on the speaker, the speaker's mood and state of health, the weather, rate of speech, choice of register (formal, casual, etc.), as well as on what other sounds the speaker is producing in the vicinity. There is a lot of contextual influence. If you compare, for example, the vowel /u/ in "tune" with that in "moose", you'll find a large difference. This one is so large you can see it just looking at the spectrogram.
Once spectrograms became available, in the late 1940s (using a machine called the sonagraph with analog filters), people started looking for the acoustic correlates of linguistic features. They thought that it would be simple. What they discovered was the tremendous amount of variation and the great difficulty of finding acoustic correlates of linguistic features that are invariant under changes in phonetic context and the various other factors I mentioned.
One result of this is that almost all of the research has been on abstracting away sources of variation such as speaker identity. As a result, not very much is known about the properties of the voice that are unique to individual speakers. In fact, we do not know whether voices are unique. It's clear, of course, that to some extent we can distinguish people by their voices, but we don't know that voices are truly unique, or how close they are to it.
The upshot of this is that there is no scientific basis for determining whether two recordings, or two "voiceprints", are of the same speaker. (If they're different enough we may be able to say that they are NOT from the same speaker.) Anybody who claims to be able to look at a couple of spectrograms and testify with confidence that the same person produced both utterances is a quack. I know people who've spent substantial time debunking this stuff in court. You won't find it supported by published research.
So, why can you login to your computer by voice? Systems like that rely on statistical "ignorance modelling". We don't know very much about what the relevant acoustic properties are, but we can make statistical models that are good enough at distinguishing one speaker from another for some applications. Even the better speaker identification systems don't work too well if they can't make a comparison between two instances of the same utterance, and as another poster mentioned from his own experience, changes in his own voice over a few months would throw off his voice login system.
The other relevant factor here is that for some purposes it's okay to have systems that make a lot of mistakes as long as they are in the right direction. If you want to limit access to a lab, let's say, it will very likely be okay to have a system that produces a lot of false negatives, that is, that incorrectly denies that the person trying to enter is authorized to. So long as you have a very low rate of false positives, the system may be acceptable.
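The trade-off between the two kinds of mistake can be sketched in a few lines. The scores below are invented purely for illustration (they are not measurements from any real system), and `error_rates` is a hypothetical helper: it assumes the system produces a similarity score per attempt and accepts anything at or above a threshold.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) for a given threshold.

    An attempt is accepted when its score is at or above the threshold.
    """
    false_negatives = sum(1 for s in genuine_scores if s < threshold)
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    return (false_negatives / len(genuine_scores),
            false_positives / len(impostor_scores))

# Made-up scores: attempts by the authorized speaker and by other people.
genuine = [0.91, 0.85, 0.62, 0.78, 0.95, 0.55, 0.88, 0.70]
impostor = [0.20, 0.35, 0.58, 0.41, 0.66, 0.15, 0.30, 0.48]

for threshold in (0.5, 0.7, 0.9):
    fnr, fpr = error_rates(genuine, impostor, threshold)
    print("threshold", threshold, "false negatives", fnr, "false positives", fpr)
```

With these toy numbers, raising the threshold from 0.5 to 0.7 removes the false positives at the cost of turning away the authorized speaker a quarter of the time, which is the kind of bargain a lab door might accept.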
So, the real situation is that for some applications statistical voice recognition works well enough, but that such systems do not work well enough to be acceptable for such purposes as identifying a unique individual as a criminal. Speaker identification by visual comparison of spectrograms is junk science.
As for software for looking at speech, there are a number of free (as in beer and as in speech) programs available. This page has some links that you might find useful.
You can buy fingerprint readers pretty easily and they usually come with some sort of authentication software. Buy a purely optical one (I think that Digital Persona makes one) and then try to fool it with gummi fingers.
Lasers Controlled Games!
For a graduate class at Georgia Tech. Voice identification/verification is not a trivial problem. I don't ever recall hearing our professor talk about "voice prints."
Most modern voice identification systems use linear predictive coding (LPC) cepstra and hidden Markov models (HMMs) to evaluate how close a given speaker is to a known user.
Having said that, I don't think it makes a very cool demo as the result is simply a number. In the case of speaker verification this number represents the probability that the speaker is who he claims to be.
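To make "the result is simply a number" concrete, here is a rough sketch of the feature side only: LPC coefficients from a frame's autocorrelation (Levinson-Durbin recursion), converted to cepstral coefficients with the standard recursion. Loud assumptions: the HMM scoring is not implemented at all, and a plain Euclidean distance between cepstra is swapped in as a stand-in, so the number printed is a distance, not a probability. The function names, the order of 4, and the synthetic two-tone "voices" are all made up for illustration.

```python
import math

def hamming(n_samples):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def autocorrelation(frame, max_lag):
    return [sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
            for lag in range(max_lag + 1)]

def lpc_from_autocorr(r, order):
    """Levinson-Durbin recursion; returns predictor coefficients a_1..a_p."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_cepstrum(a):
    """Cepstral coefficients c_1..c_p of the all-pole model 1 / (1 - sum a_i z^-i)."""
    c = []
    for n in range(1, len(a) + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def frame_features(frame, order=4):
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return lpc_cepstrum(lpc_from_autocorr(autocorrelation(windowed, order), order))

def two_tone(f1, f2, phase, n_samples=240, rate=8000):
    """Synthetic stand-in for a voiced frame: two resonance-like tones."""
    return [math.sin(2 * math.pi * f1 * t / rate + phase) +
            0.5 * math.sin(2 * math.pi * f2 * t / rate + phase)
            for t in range(n_samples)]

enrolled = frame_features(two_tone(300, 1200, 0.0))
same_again = frame_features(two_tone(300, 1200, 0.7))    # same "voice", new take
someone_else = frame_features(two_tone(500, 2000, 0.0))  # different "voice"

score_same = math.dist(enrolled, same_again)
score_other = math.dist(enrolled, someone_else)
print("distance to same voice:", score_same)
print("distance to other voice:", score_other)
```

A real verifier would model whole sequences of such frames statistically rather than compare two of them directly.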
Good idea, but I don't think this is what you want.
Brett
Diff? Audio recordings are binary data. Diff only works on line-oriented textual data. Also, it's only useful with text documents that are mostly identical. Two recordings of nearly identical sounds will actually end up with VERY different data. Granted, plotting the data will look very similar, but all of the raw sample numbers will be slightly different between the 2 recordings.
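A quick way to see this for yourself, as a sketch with made-up signals: two "takes" of the same 440 Hz tone, the second slightly quieter and slightly shifted in phase, packed as 16-bit PCM. The helper names and parameters are illustrative assumptions. A byte-for-byte comparison (what diff would effectively be doing) sees almost nothing in common, while a normalized correlation of the waveforms says they are nearly the same signal.

```python
import math
import struct

def tone(freq_hz, n_samples, rate=8000, phase=0.0, gain=1.0):
    return [gain * math.sin(2 * math.pi * freq_hz * t / rate + phase)
            for t in range(n_samples)]

def to_pcm16(samples):
    """Pack floats in [-1, 1] as little-endian signed 16-bit samples."""
    ints = [int(round(s * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

def fraction_of_bytes_differing(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

def normalized_correlation(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))

take_a = tone(440, 800)
take_b = tone(440, 800, phase=0.05, gain=0.97)  # "same sound", recorded again

pcm_a, pcm_b = to_pcm16(take_a), to_pcm16(take_b)
print("bytes that differ:", fraction_of_bytes_differing(pcm_a, pcm_b))
print("waveform correlation:", normalized_correlation(take_a, take_b))
```

Real recordings also differ in timing and noise, so even correlation on raw samples breaks down quickly; this only illustrates why byte-level comparison is the wrong tool.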
To do what you suggest, you'd need to graph the 2 recordings and then use some sort of visual comparison program to determine "how different" the 2 are. There are a few different ways to graph audio data, and there's no simple method to compare large quantities of data like that. So the problem is actually a very difficult one to solve.
Software sucks. Open Source sucks less.