Voice Authentication for Classrooms?
USSJoin asks: "I am teaching a summer camp for gifted/talented children this summer, and one of my courses is an introduction to forensic science. One idea I had was to demonstrate voice printing and voice authentication. Using the magic Google, I was able to find software to get a visual representation of a voice print, but I didn't find anything that would allow me to demonstrate voice authentication. Ideally, I would like to be able to have students record their voices onto a cassette player, then speak into the computer, then try to fake out the computer using the tape recording. Does Slashdot have any ideas on how to demo this to brilliant young kids?"
This is probably not suitable for your purposes, but it may be interesting anyway: on Mac OS 9, it was possible to use a voiceprint to log in. You'd repeat the same phrase four times, and then at login you would be asked to repeat it. The computer showed your voiceprint as you spoke.
I remember it being fairly good for a while, but I had to re-record my passphrase as my pronunciation changed over a couple of months or so. Nonetheless, it was popular with me and my family simply because it was so freakin' cool to log in with your voice.
I think that you need to be very careful what you tell those kids. Most of what you see on TV about voice identification is nonsense. The images that they call "voiceprints" are spectrograms: that is, they're 3D plots of the spectrum over time, with frequency on the y axis, time on the x axis, and energy represented by darkness. Phoneticians like myself use them all the time.
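For anyone who wants to show the kids what goes into one of these pictures, here is a minimal sketch in plain Python of how a spectrogram can be computed. Everything in it is an assumption made for illustration: the `spectrogram` and `peak_hz` helpers are hypothetical names, the frame length and hop are arbitrary, and the "speech" is a synthetic signal of two pure tones. A real tool would use an FFT library and real recordings.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of per-frequency energies for each short time frame."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        energies = []
        # Naive discrete Fourier transform: slow, but fine for a demo.
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            energies.append(abs(acc) ** 2)
        frames.append(energies)
    return frames

def peak_hz(frame, sample_rate, frame_len=64):
    """Frequency (Hz) of the bin holding the most energy in one frame."""
    k = max(range(len(frame)), key=frame.__getitem__)
    return k * sample_rate / frame_len

# Synthetic test signal (an assumption, not real speech):
# 400 samples of a 500 Hz tone followed by 400 samples of a 1500 Hz tone.
sample_rate = 8000
signal = ([math.sin(2 * math.pi * 500 * t / sample_rate) for t in range(400)] +
          [math.sin(2 * math.pi * 1500 * t / sample_rate) for t in range(400)])

spec = spectrogram(signal)
print(len(spec), "frames")
print("first frame peak:", peak_hz(spec[0], sample_rate), "Hz")
print("last frame peak:", peak_hz(spec[-1], sample_rate), "Hz")
```

Each inner list is one vertical slice of the picture: the position in the list is the frequency axis, the position of the frame is the time axis, and the value is the energy that would be drawn as darkness.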
In one sense every utterance, and therefore every spectrogram, is unique. The central problem in acoustic phonetics is the enormous variation in the physical signal for what in linguistic terms is the "same" utterance. The details of the signal depend on the speaker, the speaker's mood and state of health, the weather, rate of speech, choice of register (formal, casual, etc.), as well as on what other sounds the speaker is producing in the vicinity. There is a lot of contextual influence. If you compare, for example, the vowel /u/ in "tune" with that in "moose", you'll find a large difference. This one is so large you can see it just looking at the spectrogram.
Once spectrograms became available, in the late 1940s (using a machine called the sonagraph with analog filters), people started looking for the acoustic correlates of linguistic features. They thought that it would be simple. What they discovered was the tremendous amount of variation and the great difficulty of finding acoustic correlates of linguistic features that are invariant under changes in phonetic context and the various other factors I mentioned.
One result of this is that almost all of the research has been on abstracting away sources of variation such as speaker identity. As a result, not very much is known about the properties of the voice that are unique to individual speakers. In fact, we do not know whether voices are unique. It's clear, of course, that to some extent we can distinguish people by their voices, but we don't know that voices are truly unique, or how close they are to it.
The upshot of this is that there is no scientific basis for determining whether two recordings, or two "voiceprints", are of the same speaker. (If they're different enough we may be able to say that they are NOT from the same speaker.) Anybody who claims to be able to look at a couple of spectrograms and testify with confidence that the same person produced both utterances is a quack. I know people who've spent substantial time debunking this stuff in court. You won't find it supported by published research.
So, why can you log in to your computer by voice? Systems like that rely on statistical "ignorance modelling". We don't know very much about what the relevant acoustic properties are, but we can make statistical models that are good enough at distinguishing one speaker from another for some applications. Even the better speaker identification systems don't work too well if they can't compare two instances of the same utterance, and as another poster mentioned from his own experience, changes in his own voice over a few months would throw off his voice login system.
The other relevant factor here is that for some purposes it's okay to have systems that make a lot of mistakes as long as they are in the right direction. If you want to limit access to a lab, let's say, it will very likely be okay to have a system that produces a lot of false negatives, that is, one that often wrongly rejects a person who is in fact authorized to enter. So long as you have a very low rate of false positives, the system may be acceptable.
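The trade-off in the paragraph above can be made concrete with a small sketch. It assumes a hypothetical system that gives each login attempt a similarity score between 0 and 1 and accepts the attempt when the score reaches a threshold. The scores below are invented purely for illustration and do not come from any real system.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false negative rate, false positive rate) at a given threshold."""
    # False negative: the right person scores below the threshold and is rejected.
    false_negatives = sum(1 for s in genuine_scores if s < threshold)
    # False positive: an impostor scores at or above the threshold and is accepted.
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    return (false_negatives / len(genuine_scores),
            false_positives / len(impostor_scores))

# Made-up similarity scores (assumptions for the demo only).
genuine = [0.91, 0.62, 0.85, 0.48, 0.77, 0.95, 0.58, 0.88]
impostor = [0.12, 0.35, 0.51, 0.22, 0.66, 0.09, 0.41, 0.30]

for threshold in (0.4, 0.6, 0.8):
    fnr, fpr = error_rates(genuine, impostor, threshold)
    print(f"threshold {threshold}: false negatives {fnr:.3f}, false positives {fpr:.3f}")
```

With these invented numbers, raising the threshold to 0.8 keeps every impostor out but turns away half of the legitimate attempts. That is annoying at a lab door, yet it is exactly the direction of error the lab can live with.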
So, the real situation is that for some applications statistical voice recognition works well enough, but that such systems do not work well enough to be acceptable for such purposes as identifying a unique individual as a criminal. Speaker identification by visual comparison of spectrograms is junk science.
As for software for looking at speech, there are a number of free (as in beer and as in speech) programs available. This page has some links that you might find useful.