Slashdot Mirror


Ask Slashdot: Linux and Telephony

This one is a doozy. I've received various submissions from people who were looking for information on how to make their Linux box into an answering machine. I've also received submissions asking about Voice Synthesis and Speech-To-Text. I have to admit I haven't found much information on either while browsing on the net, so I'm turning the question over to you folks. However, I wonder if there isn't an issue hidden here: can Linux be used as an Interactive Voice Response (IVR) platform? If not, why not? First off, let's NOT forget the actual questions:

Metiu and Sri both want to know if a Linux box with a voice modem can be used as an answering machine.

Gextyr is looking for information on Voice Synthesis packages that are available for Linux.

This Clan AC Member wants to know if there are any applications or APIs for Linux that deal with Speech-To-Text or Text-To-Speech.

Lastly, there have been quite a few submissions asking whether or not Linux can be used as a demand fax server. Can it?

If Linux can be used for all of the things above, what's stopping it from performing as an IVR system? IVR systems are simply systems designed to use a telephone as the computer interface (using both touch tones and voice). IVR systems are used everywhere, from your voice mail, to ordering systems, and corporations are adopting more and more IVR systems for various tasks.

I've seen IVR implemented on DOS systems but most of these have moved to NT. What's preventing Linux from operating in this market? Are there existing IVR projects in progress, or is this another area where Linux falls behind?

17 of 153 comments

  1. ARGH! by whoop · · Score: 2

    ringconnectd will allow a simple setup but not, as far as I know, the sort of dual ring counts you suggest (though a little tweaking could allow it).

    I used it for a long time with vgetty to act as an answering machine. If the phone rang once, it dialed up ppp. If it went for 4 or 5, I forget now, vgetty would pick up and record a message.
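    The ring-count behaviour described above can be sketched roughly as follows. This only illustrates the decision logic and is not how ringconnectd or vgetty are actually configured: the function name `action_for_rings` and the threshold of 4 rings are assumptions (the post only recalls "4 or 5").

```python
def action_for_rings(ring_count: int, answer_after: int = 4) -> str:
    """Pick an action from the number of rings heard on the line.

    answer_after=4 is an assumed value; the post says "4 or 5".
    """
    if ring_count <= 0:
        return "ignore"
    if ring_count >= answer_after:
        # Enough rings: pick up and record a message.
        return "answer_and_record"
    if ring_count == 1:
        # A single ring followed by silence is the "dial up PPP" signal.
        return "dial_ppp"
    # Two or three rings and then a hang-up: nothing to do.
    return "ignore"
```

    With these assumed values, one ring maps to `"dial_ppp"` and four or more rings map to `"answer_and_record"`.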

    My beef with vgetty was that it would not play any message to greet callers. So only family/friends knew that when it beeped (it was quite a loud beep too), just start talking. The many times I tried, it either left the phone on hook until I went back home to reset it, or would just play an empty hiss for the length of the sound file.

  2. Because voice modems suck. by heroine · · Score: 2

    The sound quality coming out of voice modems sucks. At least on my modem, using some voice modem package that was around sunsite in 1997, the modem playback was unintelligible. The modem requires some horrible variation of ulaw compression. If the sound quality on modems was usable, voice modems would make a great touch tone interface.

    To get really intelligible sound, you need some kind of dedicated, expensive, phone hardware.

  3. KVoice by Matts · · Score: 2

    KVoice handles my voice mail, although it's a bit unstable, and the pickup feature is crap (there's no automatic pickup - you have to load kvoice and click on pickup - which is impossible to do in time if you're not logged on!).

    You can do demand fax serving with HylaFax.

    Other than that, I don't know of any text->speech or speech->text projects. Unfortunately it's not something that can be done very easily for free - it requires a huge investment of time, hence why these speech->text systems were originally hugely expensive.

    Matt.

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  4. Some speech apps which are out there. by tgd · · Score: 2

    I was very impressed with the Festival software. Anyone looking for speech synthesis should definitely take a peek at it. Text-to-speech isn't quite as nice in it as the speech synthesis itself, but it's not bad.

    It's a system hog though. I tried to use it to read e-mails to me through my voice system (see my other posting in here about it), but I found it took several minutes per message to put the audio together... Hardly worth it. Hell, my system is so slow, even using say to generate timestamps is too slow. :)

  5. ARGH! by tgd · · Score: 2

    Yes, it's possible. Pretty easy to set up too, once you've got vgetty working with your voicemodem. You need a voicemodem that works with Linux and vgetty though (most voicemodems these days seem to be winmodems...)

    I shied away from dynamic DNS and just e-mail the number to my PCS phone.

    One tip -- make sure you have an activity timeout on it, so that if you dial it up accidentally, or the dynamic DNS update or the e-mail fails for whatever reason, you can still get it to disconnect.
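    A minimal sketch of such an activity timeout, assuming you poll it from whatever script keeps the link up. The class name `IdleTimer` and the 600-second limit in the usage lines are made up for illustration; actually hanging up the modem is left to the caller.

```python
import time


class IdleTimer:
    """Remembers the last activity and reports when the link has idled too long."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested without waiting
        self._last_activity = clock()

    def touch(self) -> None:
        """Call this whenever traffic or a user request is seen."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """True once timeout_seconds have passed with no touch()."""
        return self._clock() - self._last_activity >= self.timeout_seconds


# Usage sketch: 600 seconds is an assumed limit, not a value from the post.
timer = IdleTimer(600)
if timer.expired():
    pass  # here you would run your own disconnect script
```

    Because the timer only depends on the clock, an accidental dial-up still ends on its own even if the e-mail or DNS update never arrives.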

    Throw a secure webserver on there, and just make some simple CGIs to trigger a delay to bring the machine back off the network.

    On my system I've got an X10 automation setup too, so I can remotely turn on other systems in my apartment. (Useful if I'm a bonehead and leave a file I need at home...)

  6. Speech recognition? by cfulmer · · Score: 2

    Having worked on this for a while...

    The main problem with speech recognition over the telephone is that the digital standard currently used by the PSTN samples voice at 8 kHz, with each sample being 8 bits wide. As a result, the speech recognition engine just doesn't have a whole lot of data to play with -- speech recognition algorithms typically use a lot of statistics to determine how well a given chunk of speech matches a word stored in its vocabulary. The less data in the incoming speech, the harder it is to be accurate with a match. In fact, it actually gets harder, as many cell phones use various encoders to further reduce the data rate. Add that to interference and background noise, and ASR over the phone is decidedly not easy.

    Many of the shrink-wrapped ASR applications that you see are designed to work through the microphone jack on a computer, which provides much more data than is available over the phone network. IBM, L&H and Dragon are the vendors I'm aware of.
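    To put rough numbers on the difference, here is the arithmetic as a small sketch. The 8 kHz / 8-bit figures are the PSTN values given above; the 44.1 kHz / 16-bit mono figures for a sound card microphone input are an assumption chosen for comparison, not something stated in this thread.

```python
def bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) audio data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# PSTN digital voice channel, per the figures above.
pstn_bps = bit_rate(8_000, 8)

# Assumed sound card capture settings (44.1 kHz, 16-bit, mono).
mic_bps = bit_rate(44_100, 16)

# How many times more raw data the assumed microphone path carries.
ratio = mic_bps / pstn_bps

print(pstn_bps, mic_bps, ratio)
```

    Under those assumptions the phone line carries 64,000 bit/s against 705,600 bit/s from the microphone jack, roughly an eleven-fold difference before any cell phone codec shrinks it further.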

    Now, there are various vendors out there who do ASR for phone applications. Nortel (my employer, but not my project) has one, as does VCS, Nuance Communications and several others. These, however, are not generally priced for the consumer market. In addition, many of these solutions run on Digital Signal Processors, which require additional cards....

    OSS speech rec would be a good thing, but I'm afraid that it's going to be a while before it comes to pass, just because of the complexity of the statistics and the specific knowledge required. Those reasons also mean that it'll probably be a while before a PDA has the juice for it.

    (There's the urban legend of the guy presenting ASR control of his computer at a voice conference, when a voice from the back of the room shouts "Format c: Return" and somebody else chimes in "Yes Return")

  7. Linux Telephony - some good answers by jra · · Score: 2

    There are two major sites corralling telephony projects for Linux:

    linuxtelephony.com is an omnibus site, which seems not to have been updated recently, and

    opentelecom.org which, well, has. :-) These folks are supported by Natural Microsystems, who have released a bunch of their code as open source under some license or another. I mean internal switching and driver code and like that.

    On a lower level front, it's possible to use mgetty+sendfax and Gert Doering's vgetty to build answering machine type stuff and also, possibly, 2-call fax response. I'm not sure about 1 call; switching modes can be messy.

    This stuff works with the old Zyxel 1496+ modems, among others, and _maybe_ with the Rockwell voice chips, but I'm not sure; the Zyxels ought to be, roughly, free, by now.


    Cheers,

  8. mgetty and voice mail by pimp · · Score: 2

    Ohio State has a FAQ on using [mv]getty for voice mail.

  9. CTI for Linux by squistle · · Score: 2

    Natural Microsystems has better boards, especially for industrial applications. They have released Linux drivers as well as source to their API at http://www.opentelecom.org.

    --
    There are 10 kinds of people in the world: those who understand binary and those who don't.
  10. Linux now does phone spam? by afniv · · Score: 2

    Just what I need. A reliable system to increase the number of unsolicited calls I get every evening when I'm eating dinner.

    I wonder how long it will be before that happens? I'm not sure what systems are used now, but they can't be cheap.

    Maybe I can set up my box to call them back? Or at least filter out the unsolicited calls or maybe even have preprogrammed answers to use up their time. Now there are some ideas. :)

    --
    ~afniv
    "Man könnte froh sein, wenn die Luft so rein wäre wie das Bier"
    Richard von Weizs
  11. Reveal's Serial-and-soundcard interface by SEWilco · · Score: 2

    Reveal's VM100 Telesound ($59 list) plugs into a serial port, phone line, and sound card. It is basically just a ring detector, on/off relay, and interface between phone line and sound card. I sometimes see them at electronics sales.

    Some VM100 FreeBSD code here.

    A press mention of the VM100 in Byte

  12. Dialogic support? OK, but too late for me. by SEWilco · · Score: 2

    That's nice. Wish they had not said no two years ago when I could have used it. Too late now for that project.

  13. Yes, I'm doing this now by smart2000 · · Score: 2

    I use some source I ported over from NeXTSTEP called am. It drives a Zyxel modem, and allows callers to either page me, or leave a message, or receive a fax. When a fax or voice mail arrives the caller ID number is sent to my pager via an email to pager gateway. I then forward the voice and fax mails to myself via email, so that I can get them and store them on my notebook on the road.

    I'm also in the middle of using this technology to provide a replacement for an old VRU (Voice Response Unit) from IBM. It grabs data from an AS/400 and provides information to customers on current shipments etc.

    Very easy to write. My next project with this is to use ears, or something like it, to convert the voice to text (and then send it to my pager).

    --
    To purchase it is not like spending money but rather it is an investment in the future in a blow against the empire
  14. CTI for Linux by sam+i+am · · Score: 2

    At PIKA we already have the API in beta. See www.pika.ca.

  15. Voice systems -- lots of proprietary hardware by sam+i+am · · Score: 2

    At PIKA we have a beta version of our API running on Linux. Supports all basic telephony and fax.
    No text to speech or voice recognition.

    For more on Linux telephony see:
    http://www.linuxtelephony.org/

  16. It's just AT commands for the most part by schwantz · · Score: 3

    There are AT commands to do all this stuff, if you want to roll your own software. You'd have to do the system side (sound, etc) yourself. Rockwell (now Conexant) supports this through the use of what they call "business audio," which uses half-duplex digital PCM audio data from your computer (over the serial port/ISA slot). They also have an analog path to and from the chip, but that would be trickier, as unless you have a speakerphone version, the mic from your PC is probably not hooked up to your modem. Here are a few Rockwell (they're the MOST common modem chipset manufacturer) AT commands (including fax and CLID) to get you started:

    7.5 CALLER ID COMMANDS
    #CID=0 Disable Caller ID.
    #CID=1 Enable Caller ID with formatted presentation.
    #CID=2 Enable Caller ID with unformatted presentation.
    7.6 FAX CLASS 1 COMMANDS
    +FCLASS=n Service class.
    +FAE=n Data/fax auto answer
    +FRH=n Receive data with HDLC framing.
    +FRM=n Receive data.
    +FRS=n Receive silence.
    +FTH=n Transmit data with HDLC framing.
    +FTM=n Transmit data.
    +FTS=n Stop transmission and wait.
    7.7 FAX CLASS 2 COMMANDS
    +FCLASS=n Service class.
    +FAA=n Adaptive answer.
    +FAXERR Fax error value.
    +FBOR Phase C data bit order.
    +FBUF? Buffer size (read only).
    +FCFR Indicate confirmation to receive.
    +FCLASS= Service class.
    +FCON Facsimile connection response.
    +FCIG Set the polled station identification.
    +FCIG: Report the polled station identification.
    +FCR Capability to receive.
    +FCR= Capability to receive.
    +FCSI: Report the called station ID.
    +FDCC= DCE capabilities parameters.
    +FDCS: Report current session.
    +FDCS= Current session results.
    +FDIS: Report remote capabilities.
    +FDIS= Current sessions parameters.
    +FDR Begin or continue phase C receive data.
    +FDT= Data transmission.
    +FDTC: Report the polled station capabilities.
    +FET: Post page message response.
    +FET=N Transmit page punctuation.
    +FHNG Call termination with status.
    +FK Session termination.
    +FLID= Local ID string.
    +FLPL Document for polling.
    +FMDL? Identify model.
    +FMFR? Identify manufacturer.
    +FPHCTO Phase C time out.
    +FPOLL Indicates polling request.
    +FPTS: Page transfer status.
    +FPTS= Page transfer status.
    +FREV? Identify revision.
    +FSPL Enable polling
    +FTSI: Report the transmit station ID.
    7.8 VOICE COMMANDS
    #BDR Select baud rate (turn off autobaud).
    #CLS Select data, fax, or voice.
    #MDL? Identify model.
    #MFR? Identify manufacturer.
    #REV? Identify revision level.
    #TL Audio output transmit level.
    #VBQ? Query buffer size.
    #VBS Bits per sample.
    #VBT Beep tone timer.
    #VCI? Identify compression method.
    #VGT Set playback volume in the command state.
    #VLS Voice line select.
    #VRA Ringback goes away timer (originate).
    #VRN Ringback never came timer (originate).
    #VRX Voice receive mode.
    #VSD Enable silence deletion (no function, command response only).
    #VSK Buffer skid setting.
    #VSP Silence detection period (voice receive).
    #VSR Sampling rate selection.
    #VSS Silence detection tuner (voice receive).
    #VTD DTMF/tone reporting.
    #VTM Enable timing mark placement.
    #VTS Generate tone signals.
    #VTX Voice transmit mode.
    7.9 VOICEVIEW COMMANDS
    +FCLASS=n Service class
    -SVV Originate VoiceView data mode
    -SAC Accept data mode request
    -SIP Initialize VoiceView parameters
    -SIC Reset capabilities data to default setting
    -SSQ Initiate capabilities query
    -SDA Originate modem data mode
    -SFX Originate FAX data mode
    -SMT Mute telephone
    -SDS Disable switchhook status monitoring
    -SQR Capabilities query response control
    -SCD Capabilities data
    -SER? Error status (read only)
    -DTP VoiceView transmission speed
    -SSR Start sequence response control
    +FLO Flow control select
    +FPR Serial port rate control
    -SSV VoiceView data mode start sequence event
    -SFA Facsimile data mode start sequence event
    -SMD Modem data mode start sequence event
    -SRA Receive ADSI response event
    -SRQ Receive capabilities query event
    -SRC: Receive capabilities information event
    -STO Talk-off event
    7.10 DSVD COMMANDS
    -SSE=1 Enable DSVD
    -SSE=0 Disable DSVD
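
    If you do roll your own, the command strings above are sent with the usual "AT" prefix and a carriage return. Here is a minimal sketch of that framing only. The helper names `at_line` and `send_commands` are made up, an in-memory buffer stands in for the serial port, and picking n=1 for +FCLASS (Class 1 fax) is my assumption; #CID=1 is taken from the list above.

```python
import io


def at_line(body: str) -> bytes:
    """Frame one command: 'AT' prefix, the command body, then a carriage return."""
    return ("AT" + body + "\r").encode("ascii")


def send_commands(port, bodies):
    """Write each framed command to a file-like port; return the lines written."""
    sent = []
    for body in bodies:
        line = at_line(body)
        port.write(line)
        sent.append(line)
    return sent


# Stand-in for a real serial device, so the sketch runs without a modem.
fake_port = io.BytesIO()
sent_lines = send_commands(fake_port, ["#CID=1", "+FCLASS=1"])
print(fake_port.getvalue())
```

    A real program would open the modem's serial device instead of the buffer and wait for the modem's response after each command before sending the next one.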

  17. Linux IVR by tgd · · Score: 4

    It's very possible.

    I've currently got an old 486/50 DX running Linux 2.2.5 at home that handles voicemail for me using mgetty and some custom shell scripts. (Unfortunately I was never able to get the vgetty perl module working... it's very old and there's almost no docs for it...)

    It's pretty slick. People calling can leave voice messages or faxes. I've got it set up so either one gets packaged up in a mime attachment to my e-mail and queued to send to me. Next time the system is online it sends them off. If they sit there more than two hours it'll dial itself up and send them and get back offline. Also archives them so I can get them through a web browser on any systems in my apartment, or I can just hit the reset switch on the front of the system (which is plugged into the parallel port) and it plays any new messages for me. The turbo light blinks when I've got new messages.
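
    The real thing is shell scripts, but the packaging and the two-hour rule can be sketched like this. It is only an illustration: the function names, the example.com addresses and the generic application/octet-stream attachment type are assumptions, and actually sending the mail is left out.

```python
from email.message import EmailMessage

# "More than two hours" in the queue triggers a dial-out, per the description above.
MAX_QUEUE_AGE_SECONDS = 2 * 60 * 60


def package_message(recording: bytes, filename: str,
                    sender: str, recipient: str) -> EmailMessage:
    """Wrap a recorded voice message or fax file in a MIME e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"New message: {filename}"
    msg.set_content("A new voice message or fax is attached.")
    # Generic binary attachment; a real setup might use a more specific type.
    msg.add_attachment(recording, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


def should_dial_out(oldest_queued_age_seconds: float) -> bool:
    """True when the oldest queued message has waited longer than the limit."""
    return oldest_queued_age_seconds > MAX_QUEUE_AGE_SECONDS


# Hypothetical addresses and file name, for illustration only.
mail = package_message(b"\x00\x01\x02", "msg0001.rmd",
                       "voicebox@example.com", "me@example.com")
```

    Queued messages built this way can sit on disk until the link comes up, with `should_dial_out` deciding when to stop waiting.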

    I can also control all the X10 stuff in my apartment (mostly useful for options #1 -- turn off all the halogen lights, and #2 -- turn off coffee pot, both reducing the chances that my spacing out one morning will result in my apartment burning down) ;)

    Last thing I can do is use it to cause my network to dial up. The system handles my masquerading and internet access as well as voicemail, so when it dials up my entire network is online, then it e-mails the IP address it got to my PCS phone. Secure SSL webpage on that IP address lets me control all those devices directly (especially turning on other PCs), check my messages, or disconnect the network...

    The real limiting factor I'd see in using it as an IVR system is more limited support of multi-line voice products, and the poor documentation and difficult programming for vgetty. I'm not sure there are any options other than vgetty.

    Using vgetty in combination with packages like HylaFAX gives you easy ability to do fax-on-demand and other services like that.

    I also used a system with three 14.4k voicemodems and vgetty as a way of validating information on a system that required the user give their true phone number. The user's claimed phone number and a code were stored in a database, and the user was e-mailed the code to punch in. The voice system would use caller ID and compare the code they entered with the code matching that number in the database. Match? Voila! Flag is set, account is activated.
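
    The check itself is tiny. Here is a sketch with a plain dictionary standing in for the database; the names `pending_codes`, `activated` and `validate_call`, and the sample number and code, are all made up for illustration.

```python
def validate_call(pending_codes: dict, activated: set,
                  caller_id: str, entered_code: str) -> bool:
    """Activate the account if the code keyed in matches the one stored
    for the phone number reported by caller ID."""
    expected = pending_codes.get(caller_id)
    if expected is None or expected != entered_code:
        # Unknown number, or the wrong code for that number.
        return False
    activated.add(caller_id)    # set the flag
    del pending_codes[caller_id]  # the code is single use
    return True


# Hypothetical data: claimed phone number -> code that was e-mailed to the user.
pending_codes = {"5551230000": "4821"}
activated = set()

ok = validate_call(pending_codes, activated, "5551230000", "4821")
```

    Calling from a different number with the right code fails, which is the point: the caller ID has to match the number the user claimed.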

    Worked great, client never used it though. C'est la vie.