Slashdot Mirror


IBM Develops Technology To Talk To Web

ProgramErgoSum writes to tell us that IBM's India-based research arm is trying to bring a new dimension to the web through voice interaction on your mobile phone. IBM is developing a new protocol, Hyperspeech Transfer Protocol (HSTP), which it hopes will let users talk to the web and get a response. Without more explanation, I'm hoping this goes about as far as the gopher web. "The spoken web is a network of voice sites or interconnected voice and the response the company got in some pilot projects in Andhra Pradesh and Gujarat and the kind of innovations that people came up with were just mind-boggling," Gupta said.

7 of 83 comments

  1. Achilles says "No." by girlintraining · · Score: 3, Interesting

    Voice tech has an Achilles' heel: It's called accents. Most voice software works great for English-speaking people in the Midwestern United States. But if you have an accent and have ever tried to "interact" with one of those voice-mail systems that are speech-activated rather than touch-tone, the words "unholy rage" don't begin to describe the frustration of listening to a soothing voice repeatedly saying "I'm sorry, I do not understand your request" and then endlessly repeating the menus. Pressing '0', if you're wondering, will only make the system remind you that it (a) only speaks English and (b) while it can process touch tones, it won't -- because it hates you.

    And IBM wants to bring this unique hell to the web? What kind of sadists are these people? As if websites that require Flash and the horrors that server-side Java unleashed weren't enough...

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Achilles says "No." by DragonWriter · · Score: 4, Insightful

      Voice tech has an Achilles' heel: It's called accents. Most voice software works great for English-speaking people in the Midwestern United States.

      If that's true of this software, developed by IBM's Indian research arm and pilot-tested in Andhra Pradesh and Gujarat, then I suspect it will also handle a lot of other English-speaking people.

      But if you have an accent

      As if English-speaking people from the Midwestern United States don't.

  2. I wonder by rootnl · · Score: 5, Funny

    User: fap fap fap fap fap
    Web: Oh Yea baby!
    User: fap fap fap fap fap
    Web: Wow that's it yea!

    --

    We are the people our parents warned us about.
  3. Is this gonna be like CB radio? by Gizzmonic · · Score: 4, Funny

    Breaker breaker, good buddy! Thanks for visiting my online speakin' site! My handle is: The Delta Lady! If ya'll wanna visit my cousin Watts' site, just say "bacon." If'n'ya wanna hear a special Christmas story about varmints pullin' Santa's sleigh, say "Merry Chris'mas, ya'll!"

    --
    (-1, Raw and Uncut is the only way to read)
  4. Re:Interesting... by CarpetShark · · Score: 3, Informative

    Agreed. Especially since CSS has supported aural media (including multiple voices or generic speaker categories like "child", "male", "female" for different speakers in a story, for instance) for quite a while now.

  5. Re:Interesting... by CarpetShark · · Score: 3, Informative

    There's a good (and recent) summary of the situation here:

    http://lab.dotjay.co.uk/notes/css/aural-speech/

    If you want an open-source solution, you should probably look to the Fire Vox (as opposed to Firefox etc.) community. Otherwise, Opera is probably your best bet. As far as usage goes, I think it's still pretty limited, but definitely worth considering for future projects that need (or can benefit from) such features, rather than some proprietary solution, especially since it's a relatively small amount of extra work that can be overlaid onto existing web pages.

  6. Re:Waste of Bandwidth by CarpetShark · · Score: 3, Insightful

    When you're talking about millions of terminals vs. relatively few servers, the "dumb" terminals are cheap. Also, good voice recognition requires beefy hardware: probably, ideally, DSP/GPU accelerator boards or a Google-style huge cluster of commodity PCs. Finally, for blind users, but also for others, listening to even the best synthesized voice gets tiring and grating after a while. It's much nicer to listen to a professional narrator than to even an ordinary human speaker, let alone a "good" voice synth.

    I still think it'd be better for everyone if they worked on supporting a globally usable standard that could be applied on any machine, like CSS aural media, though. TTS and voice recognition are probably the future anyway, so we might as well start taking them seriously now.
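To make the CSS aural media suggestion in these comments concrete, here is a minimal Python sketch (not from the thread) that emits a CSS 2 aural style sheet giving each speaker in a story one of the generic voices the comments mention (`male`, `female`, `child`). The class names and the 300ms pause are hypothetical choices for illustration only.

```python
# A minimal sketch: build a CSS 2 aural style sheet that gives each
# (hypothetical) story-character class its own generic voice family.
# CSS 2 defines three generic voices: male, female and child.

GENERIC_VOICES = {"male", "female", "child"}


def aural_stylesheet(speakers: dict[str, str]) -> str:
    """Return an @media aural block mapping CSS class names to generic voices."""
    rules = []
    for css_class, voice in speakers.items():
        if voice not in GENERIC_VOICES:
            raise ValueError(f"{voice!r} is not a CSS 2 generic voice")
        # voice-family picks the voice; pause-before inserts a short gap
        # before each speaker's line (300ms is an arbitrary example value).
        rules.append(f"  .{css_class} {{ voice-family: {voice}; pause-before: 300ms; }}")
    return "@media aural {\n" + "\n".join(rules) + "\n}"


if __name__ == "__main__":
    # Hypothetical class names for speakers in a story.
    print(aural_stylesheet({"narrator": "male", "grandmother": "female", "kid": "child"}))
```

Note that CSS 2.1 later deprecated the `aural` media type, and the CSS Speech module uses `@media speech` instead, so actual support depends on the screen reader or browser.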