TuVox Voice Interface
pablos writes: "NYTimes has an article about TuVox, who set up Handspring and Activision with voice interfaces for tech support. Apparently they can do away with the annoying 'press # now' menus. I've used things like TellMe, which played an ad every time it didn't understand you, but I'm wondering if this sort of thing is starting to work anywhere. Anybody called Handspring for tech support lately?"
I wonder....
Boffoonery - downloadable Comedy Benefit for Bletchley Park
AT&T has a similar service for their "easy reach 800" customers: you can speak your 5-digit combination, or opt to speak to a representative, all without the keypad. Pretty basic, but it's been around for at least 4 or 5 years.
__________________________________________
Take comfort in your ignorance.
Grandmaster Plague
I noticed starting about two months ago that whenever I called the main number for AT&T Broadband, I would get the message:
"For digital cable, press or say 1" etc.
A lot of times to avoid complicated and looping voicemail, I just don't press anything to fake like I have a rotary phone and get transferred to the first available agent.
Well, that trick is no more! Since even rotary phone users can say their choices, not doing or saying anything disconnects you. Pretty crafty.
- JoeShmoe
.
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
http://archives.nytimes.com/auth/login?URI=http://www.nytimes.com/2002/02/11/technology/ebusiness/11VOX.html
------
Random, useless fact: I type in startx entirely with my left hand.
"Thank you for calling 999, which service do you require?"
FIRE!!
"[pause]Your request has been passed on. In order to optimise future use of this service, please repeat the following list of words in a steady voice: cat, dog, bar, sky, foo..."
>I'm wondering if this sort of thing is starting to work anywhere
Voice recognition works great in real world applications. Directory assistance in the city where I live uses voice recognition to find out what language you speak and the city for which you want a listing, and it can even do voice recognition on common businesses. (No doubt for a fee.) All without any operator intervention. It's pretty cool.
Last year the company I work for got into a project that used a Cisco 2600 with VOIP module and a product from IBM that allowed you to interact with a website via the phone...we used PHP to create the VoiceXML documents to drive the voice menus and we were scraping the data from a local site that had weather and traffic info on it...worked pretty well considering that it was also done in German :)
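For anyone curious what "creating the VoiceXML documents" from a script looks like, here is a minimal sketch in Python rather than the PHP the poster used. It only assumes the standard VoiceXML 2.0 `<menu>`, `<prompt>` and `<choice>` elements; the prompt text and the `weather.vxml` / `traffic.vxml` targets are made up for illustration and are not from the project described above.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_menu(prompt_text, choices):
    """Return a VoiceXML 2.0 menu document as a string.

    choices is a list of (spoken phrase, target URL) pairs.
    The target URLs used below are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for phrase, target in choices:
        # Each <choice> pairs a phrase the caller may say with the next document.
        choice = ET.SubElement(menu, "choice", {"next": target})
        choice.text = phrase
    return ET.tostring(vxml, encoding="unicode")


doc = build_menu(
    "Say weather or traffic.",
    [("weather", "weather.vxml"), ("traffic", "traffic.vxml")],
)
print(doc)
```

A web script would serve this string to the voice gateway the same way it serves HTML to a browser; the scraped weather or traffic data would go into the prompt text.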
;)
I would have to agree that the technology is getting closer to replacing human beings...maybe I should go check my retirement plan now
The train company in Sweden has one of these systems. It's always amusing listening to my other half battle with it when she wants to buy a ticket:
:-)
OtherHalf (in very clear voice): Stockholm
Computer : click, click,... Kiruna!
OtherHalf : Stockholm!
Computer : click, click,... Moscow!
OtherHalf : Stockholm!
Computer : click, click,... Alpha Centauri!
etc...
To be fair, it does eventually work, it just takes a while. It probably also takes less total time than the alternative (short conversation with a human, but a long wait to get to talk to them).
The best thing about them was a recent radio program. They had done some research to find out what words sound (to the system) like destinations. During the show they'd phone SJ up and say things like "I want to go to FsckingBastardVille", to which the computer would reply "Northern or central Stockholm?" and other such amusements.
Hours of fun
Tales from behind the Lagom Curtain
Bender: Listen, buddy, I'm in a hurry here. Let's try for a twofer. Hehe.
Suicide Booth: Please select mode of death. Quick and painless or slow and horrible.
Fry: Yeah, I'd like to place a collect call?
Suicide Booth: You have selected slow and horrible.
Bender: Great choice!
We inquired with TuVox how much it would cost to set up a solution for our level 1 help desks. The cost was mind-boggling. So, we trained one monkey for each group of techs to answer the phone and AUTOMATICALLY READ FROM A SCRIPT!!! Can you imagine? A revolution in help desk support. The script includes such high-tech TTS sounding shit as "Press 2 for Customer Service". Then, in our mind-blowing second step, we trained the monkey to pick out DTMF tones BY EAR ALONE!!! So our customers hit 2, and the monkey transfers them to customer service. Truly the wave of the future.
I'm enough of a realist to understand that the evolution of swapping jobs with technology is unstoppable but still: With the current recession, that's not really a thing to be looking forward to.
<Sig>The good thing about having a good memory is ... euh
I have been looking at a product called InterVoice Brite that appears to have a similar function. Not only do they have the software available for use in-house, but also an ASP offering. From listening to their sample sound files, they are way ahead of a lot of the basic "say or press one" implementations I have seen.
..exactly new, is it?
:)
It's been a while since there was really much media hype about voice recognition technologies. Sure, the whole voice activated menus "1, 2 etc." thing has been around for quite a few years, but I suppose there is a huge difference between repeating a few numbers and describing technical problems. I mean, is this literally a flowchart menu with various diagnostic paths or does it actually try to understand a sentence? If it's the former, then that is nothing more advanced than what is currently available and probably in use elsewhere.
I wonder what would be more frustrating: repeating yourself twenty times to a computer to battle through a menu, or sitting for twenty minutes trying to explain your problem to an ex-Kmart 1st line support engineer. The choice is yours.
"Never let the truth get in the way of a good story..."
Airlines use voice recognition for flight reservations and confirmations (something like this was actually one of the DARPA benchmark tasks). It works reasonably well. The long distance companies are using it as well.
For more than a year now there has been a (beta-phase) phone-number where a voice recognition program tells you the best available train-connection between two cities, at a given time.
It's nice to realize that they've made an attempt to recognize polite customers: words like "please" are ignored.
For people interested in seeing how far NLSR (Natural Language Speech Recognition) can be pushed for specific applications, go and look at VeCommerce and their demo clips. The betting system I helped build can take betting sentences of over 100 words with 96% accuracy. (Data from a live system with 1200 lines.)
Customers HATE DTMF-based systems; this sort of thing is the way of the future.
...keeping the customer from costing you any money.
CRM is *expensive*. Forrester Research did a study a while back on the average cost of handling customer calls by various means:
Telephone: $33.00/incident
Email: $9.99/incident
Chat: $7.80/incident
Message Boards: $4.57/incident
Knowledge Base: $1.17/incident
The technology of this article shifts a call from the top to the bottom of this list. They admit that the advance is not in AI or voice tech, but in making the experience "resemble a conversation". So at its best, this will still let grandma have *some* access to the information she could have had before from a live human. At its worst, it's a puppet show to distract us from the fact that we're not getting very good service.
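To put rough numbers on "shifting a call from the top to the bottom of this list", here is a back-of-the-envelope sketch in Python. The per-incident costs are the ones quoted above; the volume of 1,000 calls is made up, and using the knowledge base figure as a stand-in for an automated voice call is my assumption, since the list gives no figure for that.

```python
# Per-incident handling costs as quoted in the list above.
COST_PER_INCIDENT = {
    "telephone": 33.00,
    "email": 9.99,
    "chat": 7.80,
    "message_boards": 4.57,
    "knowledge_base": 1.17,
}


def savings(incidents, from_channel, to_channel):
    """Money saved by handling `incidents` on to_channel instead of from_channel."""
    delta = COST_PER_INCIDENT[from_channel] - COST_PER_INCIDENT[to_channel]
    return incidents * delta


# Hypothetical volume: 1,000 calls moved from a live agent to self-service.
print(f"${savings(1000, 'telephone', 'knowledge_base'):,.2f}")  # $31,830.00
```

That is $31.83 saved per call under these assumptions, which goes a long way toward explaining the vendor interest regardless of how good the experience is.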
I went to Seattle a few years ago, but my bags didn't. Outside of the Beast being based next door to Seattle, it is a wonderful city. I called the airline (United) and was asked to 'press or say' whichever number was to get an update for lost luggage.
It then asked me to speak the destination city and the departure city, then asked for the claim number I got when I reported the bags, and it would let me know that they'd still not found my luggage.
This was 2 or 3 years ago and it worked pretty flawlessly, and I'm pretty sure the technology has come along since then too. There were times I had to repeat myself, but that's better than sitting on hold forever just to be told by the person on the other end, whose day, in their minds, is worse than yours, that you should stop worrying about it and get on with your life.
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.-Franklin
I got one of those e-mails a year ago that offered free magazines and decided what the hell! 2 free magazines for a whole year, not bad, just need to call a certain 800 # to cancel them or they'll bill your credit card. So I called a few weeks ago. Lo and behold it was totally voice automated! Took my 12+ digit ID with no errors. Recognized my saying the full name of a magazine for the correct abbreviation on their list of publications. Would repeat anything it just said by simply saying "repeat". Understood "yes", "no", and "correct". Actually sounded decent, not at all like those services that have pre-recorded phrases and it has to fill in certain blanks. This sounded natural! And it all worked on the first try. Two magazines canceled and the only buttons to push were the 800#. I was impressed. Werelock
We've got a system on the Odeon cinemas ticket booking line in the UK. First, it asks you which cinema you would like to book the ticket at:
Computer: Which cinema would you like to book tickets at?
You: Kensington
Computer: You chose Kensington. Say yes if this is correct.
You: Yes
Computer: Which cinema would you like to book tickets at? Please speak clearly.
You: Kensington
Computer: You chose Kensington. Say yes if this is correct.
You: YES
Computer: Which cinema would you like to book tickets at? Please speak clearly, or hold for an operator.
You: Kensington
Computer: You chose Kensington. Say yes if this is correct.
You: FUCK OFF
Computer: Kensington is correct.
It can recognise hundreds of cinema names, but always has difficulty with "yes". When voice recognition first came out on voicemail boxes, we'd derive great amusement from saying random stuff into the phone and seeing what number it would guess...
"Welcome to the Odeon Film Line! To pick the cinema you want just say the name!"
To which you do and, in my experience, it's got it right every single time. Including stuff like "Odeon Leicester Square", "Mezzanine", "Wimbledon" and "Manchester".
From what I understand they use software by Vocalis.
Avantslash - View Slashdot cleanly on your mobile phone.
duane
(Note, I still don't like them. The package I was complaining about had been left in a puddle near my garage, and the guy wrote "delivered at front door" on the slip.)
www.HearMySoulSpeak.com
A friend of mine (from Australia) went to the US a year or two ago, and found himself needing to call a service which used such a system. When he did, he found that it could not understand his accent; after three unsuccessful attempts at doing an "American" accent, he gave up.
The moral of this story: make sure that there's a touch-tone menu to fall back on.
By contrast, I once called SprintPCS and ended up on a similar system, but it was terrible. The VR was flakey, and it did not degrade gracefully when it didn't understand, leaving me disoriented. I confirmed from a friend at TellMe that SprintPCS used someone else.
I don't know anything about Tuvox, but I question whether they will have success against TellMe, which not only has good tech, but is very well backed. If they're betting on their "AI", they're probably dead as soon as people find out it sucks. If they're just trying to be a better TellMe, they have a challenge--but I hope they come out with a competing public service to get publicity!
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
I've been using Sprint's voice dial service since it came out and it's pretty effective. I occasionally have to repeat myself, but I've uploaded my whole address book and it is very good at figuring out names.
For those of you unfamiliar with the system it works like this.
User: *
SPCS: Ready
User: Call John Smith at Home
SPCS: Calling John Smith at Home. Correct?
User: Yes
Done
It's super convenient when you are in the car or running through an airport and don't have the time to look down at the phone. The reason I'm impressed with it is because you don't have to "train it" to your voice.
Well, the other day when I accidentally baked my Visor, I had to call their support line....
~.Evanrude
I've used TellMe's service quite a lot in the past. Driving directions, movie listings, and just generally wasting time on the phone. It is a great service. I even played around with VXML, where I came up against the greatest current limitation with non-speech-to-text voice recognition systems:
They seem to be pretty much exclusively based on grammar files. Basically, you write out a grammar that lists all the possible things you think the person speaking would utter and then match them up to different branches in your system. Unfortunately, you can't easily take free form speech and store it as anything other than a sound file. This makes it difficult to do something such as allow the user to speak a message to send as an e-mail. The VXML engines have a great deal of heuristics to handle differences in speech style and tone, but without the grammar, you pretty much need to go through voice profile training to get decent results.
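A toy illustration of the grammar idea, in Python. This is only a sketch of the matching step and works on text that has already been transcribed; a real engine matches against audio and is far more involved. The branch names, phrases and filler words are all invented for the example.

```python
import re

# Hypothetical grammar: each branch lists the phrases a caller is expected to say.
GRAMMAR = {
    "directions": ["driving directions", "directions"],
    "movies": ["movie listings", "movies"],
}

# Assumed filler words that are thrown away before matching.
FILLERS = {"please", "um", "uh"}


def normalize(utterance):
    """Lowercase, strip punctuation, and drop filler words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(w for w in words if w not in FILLERS)


def match(utterance):
    """Return the branch whose phrase list contains the utterance, else None."""
    text = normalize(utterance)
    for branch, phrases in GRAMMAR.items():
        if text in phrases:
            return branch
    return None


print(match("Movie listings, please"))    # movies
print(match("Send an email to my boss"))  # None
```

The second call shows the limitation: anything outside the listed phrases simply falls through, which is why free-form dictation does not fit this model.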
If anyone knows of kewl advances in this particular area, I'd love to hear them!
All I wanted was a rock to wind a piece of string around, and I ended up with the biggest ball of twine in Minnesota
SpeechWorks' OpenVXI, originally promoted as an open source VXML interpreter, has turned out not to be a good one. SpeechWorks developers maintain the code, and refuse to incorporate the patches and requests of the open source community, in favor of keeping OpenVXI tied to SpeechWorks products. The codebase could be forked, but it's really not worth investing the effort in such a brittle product tied to proprietary solutions.
Bayonne, the GNU telephony server, is great and getting better all the time. It currently supports a strong scripting language for DTMF applications, and Bayonne's XML plugin structure and built-in support for multiple telephony cards makes it the logical choice for open source VXML.
All that's needed at this point is to finish integrating Bayonne with an open source Text-To-Speech engine (most-likely candidates are Flite or Festival), Automatic Speech Recognition engine (in this case, Sphinx) and write the XML plugin. But there is a shortage of coders with the skill and time to do this.
I really think small business and the average Slashdotter could benefit from an open source VXML solution. Small businesses could create professional telephony apps that could make them much more competitive (from accepting credit cards securely over the phone to providing dedicated 24-hr support numbers for their products), while creative coders could use it for everything from Eliza-style chatbot answering machines to having your boxen call you up and describe a hack attempt as it's being made.
I'd love to see a VXML enabled Bayonne blow TellMe and others out of the water. If you're intrigued and you'd like to get involved, check out Bayonne's Sourceforge site and sign up for the mailing list.
He who refuses to do arithmetic is doomed to talk nonsense.
I do not remember if it was calling the phone company, or my car insurance or the local ticket master, but I've had the answering machine ask me to tell it what I wanted. (tickets, sales, I don't remember now)... I replied, thinking that it boded poorly for me actually getting where I wanted to go. Sure enough though, it worked great. I've also a tendency to mumble my words, but it worked fine.
I work for Empirix.com programming test systems that test voice recognition/response.
To write these tests you often record what the user would say to the system ("1017" for flight 1017), then play it back at the correct time in the menu. We like to get our customers to record the message so there is no question about how it sounds or is said. But sometimes we record the system itself. Often the system has trouble even understanding its own voice.
It is also amazing how fast people who test these systems manually learn to speak so that the system understands them. Automated testing using recorded prompts makes a difference.
We collect some of the prompts we get back from systems:
http://performance.empirix.com/VoiceIndex/hear-right.htm?page=vpi_home&link=hear-right_ad
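For those wondering what a record-and-play-back test of this kind might look like in outline, here is a rough Python sketch. It is not how any real test product works: `StubIVR` is a made-up stand-in for the phone system, the `.wav` names are just labels, and a real test would place a call and play actual audio.

```python
class StubIVR:
    """Hypothetical stand-in for the phone system under test."""

    PROMPTS = {
        "greeting": "Say your flight number.",
        "confirm": "Flight 1017. Is that correct?",
        "done": "Flight 1017 is on time.",
    }

    def __init__(self):
        self.state = "greeting"

    def prompt(self):
        return self.PROMPTS[self.state]

    def hear(self, recording):
        # Pretend recognition: the file name is what the system "heard".
        if self.state == "greeting" and recording == "1017.wav":
            self.state = "confirm"
        elif self.state == "confirm" and recording == "yes.wav":
            self.state = "done"


# Each step: the prompt we expect, then the recording to play (None = last step).
SCRIPT = [
    ("Say your flight number.", "1017.wav"),
    ("Flight 1017. Is that correct?", "yes.wav"),
    ("Flight 1017 is on time.", None),
]


def run_script(ivr, script):
    """Play each recording at its point in the menu and check every prompt."""
    for step, (expected, recording) in enumerate(script, start=1):
        if ivr.prompt() != expected:
            return f"step {step}: unexpected prompt {ivr.prompt()!r}"
        if recording is not None:
            ivr.hear(recording)
    return "pass"


print(run_script(StubIVR(), SCRIPT))  # pass
```

If the system mishears a recording it stays on the same prompt, so the next step fails and reports where the conversation went off the rails.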
just for those of us who don't like to give out our info to read an article
the article without registration
Hello everyone. I'm Ashok, the CTO and cofounder of TuVox - the one with the Frankenstein green skin in the New York Times article ;-)
It's really great to see fellow slashdotters interested in our technology.
Some comments/thoughts/observations to offer. I'll try and add notes over the course of today.
We provide automated technical support using speech recognition as an underlying modality. Speech-based technical support is a very different kind of problem than a more conventional speech application.
Most speech applications are "few turns" and low ambiguity. It only takes a few interactions with the system to get a train schedule or a stock quote, and there is little ambiguity - you either want to go to City A, or City B. Companies that provide few-turn, low-ambiguity applications spend literally tens of thousands of dollars (or even hundreds of thousands of dollars) getting each turn to be as accurate as possible. The content of such an application rarely changes, and if it does, the rollout/testing period can be very long. A final note - these callers generally use the system frequently (i.e., calling for a stock quote). Because of this, callers are willing to be educated on the commands/VUI to drive the system.
We, on the other hand, have to deal with long conversations (10 minutes), with users fumbling with their equipment, confused and angry, etc. You can imagine a call: System: "How can we help?" Caller: "My $#@% machine doesn't work".... We have to get the caller to their answer, in spite of the fact that they don't know what the answer is. Additionally, this is probably the first time the user has ever used the system. Finally, we have to make literally thousands of answers available in a conversational style, the day the product ships. That's when the highest call volume occurs (in the few weeks after the product ships). Oh...by the way - we use real humans for the voices, not text-to-speech. That makes the production schedule even more interesting!
Callers can leave us messages about their experience. It's really heartwarming (in the words of one of our customers) to hear what callers say - we got a call a few nights ago (at 1 in the morning) where a caller said he was glad that he was able to solve his battery recharging problem, because he thought he was going to lose all his data. The part of the New York Times article talking about people saying thank you occurs very frequently. They say thank you in so many places we have started to put thank you responses into the system.
Callers don't have to wait. Callers get answers at any time. Callers don't get rude, untrained agents abusing them (everyone at TuVox has had a horrible experience with an ISP tech support agent)! Our customers like that proposition!
Last point - It's not a choice between a live agent and automated support. We're offering the alternative to no agent at all. People think we're replacing agents. We're not. Our technology is designed to work with and support a tech support agent. Right now, our initial rollouts with customers are after hours because that's where the call volume is lowest and where we can fix any unforeseen problems. But there are even more interesting technologies in our pipeline.
Kindest Regards,
Ashok
(Full disclosure: I have worked with most of these companies).
Telephony-based voice-recognition is going to be the Next Great Thing (tm). The main companies that are involved in this stuff are SpeechWorks, Nuance (both work on the main speech recognition/software stuff), HeyAnita (which works with Sprint), and TellMe.
She sat at the window watching the evening invade the avenue.
Sorry, but there is no substitute for a human at the other end of the line. Charge more if you have to, but answer the phone!
sulli
RTFJ.