Hardware Suggestions for Linux IVR?
Lester Hightower writes "I am the CTO of a vertical market application service provider, and we have a couple of applications which could benefit from an interactive voice response (IVR) system. We are an almost-all Linux shop, and most of our production systems are CGI in Perl. I would like to get some feedback and/or recommendations from the Slashdot community on what hardware and software works well, is reliable, easy to maintain, and so forth." Recommendations against hardware that does not work well for this type of application are also welcome.
I did this once for a non-profit research institute. We used Lucent's online text-to-speech page to produce a library of vocabulary, with a patched copy of sox to convert them to the appropriate type (gmr?). We used USR voice modems, and about 300 lines of Perl code to handle everything from pickup, to producing the menu, building phrases out of the vocabulary, reading responses, and spitting out the resulting data. It was a simple menu system for reading off meteorological data, so at least the vocabulary was fixed and controlled.
I'd say it worked pretty well, and making changes was easy enough. For our purposes, text-to-speech voices did the job and saved us from needing a proper studio to record usable samples.
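For anyone wondering what "building phrases out of the vocabulary" amounts to, here is a rough sketch of the idea. It is in Python rather than the Perl we actually used, and every name in it (the word lists, `number_words`, `build_phrase`, the one-`.wav`-per-word layout) is made up for illustration, not taken from our code. It only turns a reading into an ordered list of clip files; playing them through the modem is a separate step.

```python
# Hypothetical vocabulary: one pre-recorded clip per word, e.g. "seven.wav".
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
    60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety",
}


def number_words(n):
    """Spell out an integer from 0 to 99 as a list of vocabulary words."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n < 20:
        return [ONES[n]]
    tens, ones = divmod(n, 10)
    words = [TENS[tens * 10]]
    if ones:
        words.append(ONES[ones])
    return words


def build_phrase(label, value, unit):
    """Return the clip files to play, in order, for one reading."""
    words = [label]
    if value < 0:
        words.append("minus")
    words.extend(number_words(abs(value)))
    words.append(unit)
    return [word + ".wav" for word in words]


if __name__ == "__main__":
    print(build_phrase("temperature", -7, "degrees"))
    print(build_phrase("humidity", 42, "percent"))
```

With a fixed vocabulary like this, adding a new measurement just means recording (or synthesizing) a couple of extra words.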
Speech recognition is a pain in the ass. They almost certainly just want to use telephone tones.
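To give a feel for why tones are the easy route: each key on the phone sends one "row" frequency plus one "column" frequency, so detection is just finding the strongest of each. Voice modems and telephony cards normally do this for you; the sketch below is only for the case where you have raw audio samples. It uses the Goertzel algorithm in Python, and the function names, the 8000 Hz sample rate, and the 50 ms tone length are my assumptions. It also assumes a tone is actually present (there is no silence or noise threshold).

```python
import math

# Standard DTMF frequency pairs (Hz): one row tone plus one column tone per key.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power near `freq` using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row and column tone and map them to a key.

    Assumes the samples contain a DTMF tone; silence is not rejected.
    """
    row = max(range(4), key=lambda i: goertzel_power(samples, ROW_FREQS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COL_FREQS[i], sample_rate))
    return KEYPAD[row][col]


def make_tone(key, sample_rate=8000, duration=0.05):
    """Synthesize a clean DTMF tone for `key` (used here only for testing)."""
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col >= 0:
            break
    else:
        raise ValueError("not a DTMF key: %r" % key)
    count = int(sample_rate * duration)
    return [
        math.sin(2.0 * math.pi * ROW_FREQS[row] * t / sample_rate)
        + math.sin(2.0 * math.pi * COL_FREQS[col] * t / sample_rate)
        for t in range(count)
    ]


if __name__ == "__main__":
    print(detect_key(make_tone("5")))
```

A real line adds noise, echo, and speech, so a production detector would also check absolute levels and tone duration before accepting a key.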
Very-large-dictionary VR is still a little unpredictable, even when it's trained to one specific person. For a limited dictionary (say, a few dozen words), speaker-independent(*) VR is an awful lot more successful. One such project was on Slashdot a long time ago and has been in development since: Sphinx.
As for picking out spoken numbers: back in the days when I was figuring out how to do simple VR, I trained myself to the point where I could recognise a number (between 0 and 30 or so) just by looking at a graph of its waveform. A Markov model built for that purpose would get results as good, if not better.
(* This depends, in Sphinx's case, on the quality of the language model. A language model well suited for recognising mid-west American accents would work poorly on Icelandic... for obvious reasons.)
Ian Woods