Slashdot Mirror


IBM to Open Voice Recognition Software

phug writes "According to the NY Times, IBM is donating code that it estimates cost the company $10 million to develop. One collection of speech software for handling basic words for dates, time and locations, like cities and states, will go to the Apache Software Foundation. The company is also contributing speech-editing tools to a second open-source group, the Eclipse Foundation." There's not much information out there yet - e.g. no word on licenses etc. It is worth pointing out that the Eclipse Foundation was started by IBM.

23 of 189 comments

  1. Great news by wertarbyte · · Score: 5, Interesting

    This is great. ViaVoice has been gone from Linux for quite a while now, and I hope this will open the door to a great variety of cool open source applications. If it is made modular like, e.g., Festival, I can think of endless applications worth using it for.

    --
    Life is just nature's way of keeping meat fresh.
    1. Re:Great news by Gentlewhisper · · Score: 4, Interesting

      I think we will see a lot of cool applications for this like virtual ticket sales counters/telemarketing calls (ask a question through the phone and the computer will look up an answer) as well as tech support phone centres!

      No need to outsource to India, opensource it to Linux & ViaVoice!

      Woohoo! +1 for IBM again!

    2. Re:Great news by wertarbyte · · Score: 0, Interesting

      I thought of stuff like gpsdrive or kismet. Changing the map scale gpsdrive shows while driving would be a lot easier with a microphone instead of a touchpad; you could even select another waypoint, or tell gpsdrive to guide you to the nearest McDonalds/Burgerking/CowboyNeal.

      --
      Life is just nature's way of keeping meat fresh.
  2. ViaVoice by cerberusss · · Score: 4, Interesting

    Is this ViaVoice? The Linux packages were pulled off the IBM site a year or so ago, but they're still floating around.

    --
    8 of 13 people found this answer helpful. Did you?
    1. Re:ViaVoice by leinhos · · Score: 2, Interesting

      That sounds like the kind of speech recognition one would want for a command/control interface to a computer (or a "smart home"). AFAIK, the ViaVoice stuff is targeted at dictation, which is more difficult. Either way, if this becomes GPL-compatible, it opens the doors to hacking and improvement!

  3. Code-by-voice by Max+Romantschuk · · Score: 5, Interesting

    Eclipse is actually a kind of Swiss Army Chainsaw IDE. You can make plugins for pretty much everything, so one could speculate that a voice recognition plugin would be feasible.

    I don't know about everyone else, but the concept of coding by voice does fascinate me. There are obvious issues (like eliminating having to say every single control character (if at all possible)), but with a background of RSI I think it's at least worth a shot.

    Thoughts?

    --
    .: Max Romantschuk :: http://max.romantschuk.fi/
    1. Re:Code-by-voice by Max+Romantschuk · · Score: 3, Interesting

      One more thing I forgot to mention in the parent:

      Given that most programming languages have a rather limited vocabulary, and that class libraries and defined functions/variables can be extracted from existing code, software like this could make educated guesses about what you were trying to say.
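
      A minimal sketch of that guessing step, assuming the recognizer hands over plain text and using Python's standard difflib for fuzzy matching. The function names and the sample source line are made up for illustration; nothing here comes from IBM's code.

```python
import difflib
import re
from typing import Optional


def extract_identifiers(source: str) -> set:
    """Collect identifier-like tokens from existing source code."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def guess_identifier(heard: str, vocabulary: set, cutoff: float = 0.6) -> Optional[str]:
    """Return the known identifier closest to what the recognizer heard."""
    # Speech carries no camelCase or underscores, so compare lowercased
    # forms with the spaces removed.
    normalized = {name.lower(): name for name in vocabulary}
    key = heard.lower().replace(" ", "")
    matches = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[matches[0]] if matches else None


# Hypothetical existing code the vocabulary is extracted from.
existing_code = "int inVar = readInput(); if (inVar > maxValue) { resetCounter(); }"
vocab = extract_identifiers(existing_code)

print(guess_identifier("in var", vocab))
print(guess_identifier("max valve", vocab))  # a slightly misheard word
```

      The smaller the vocabulary, the more aggressive the cutoff can be, which is exactly why code might be an easier target than free dictation.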

      --
      .: Max Romantschuk :: http://max.romantschuk.fi/
    2. Re:Code-by-voice by PhiberOptix · · Score: 2, Interesting

      I believe that the voice recognition would not be used on Eclipse directly (to dictate code to the IDE), but inside it as an API or something, so you can implement voice recognition in the software you create with Eclipse.

    3. Re:Code-by-voice by dJOEK · · Score: 3, Interesting

      Eclipse is known for its good GUI API (or at least it's better than regular Swing)

      the only way to make voice commands work is to integrate them into your GUI

      so your OK-button object does not only have a textlabel-value but also an audiolabel.

      this works both ways, one way for accessibility ('hear' what button you will click) and the other way is using your own voice to 'click' it (by saying 'Ok')
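
      A rough sketch of that two-way idea, assuming the recognizer delivers a plain text phrase. VoiceButton and VoicePanel are invented names for illustration and are not part of Eclipse or any real toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VoiceButton:
    text_label: str                 # what is drawn on screen
    audio_label: str                # what is spoken, and what the user says
    on_click: Callable[[], None]


class VoicePanel:
    def __init__(self) -> None:
        self._buttons: Dict[str, VoiceButton] = {}

    def add(self, button: VoiceButton) -> None:
        self._buttons[button.audio_label.lower()] = button

    def describe(self) -> List[str]:
        """Accessibility direction: the labels a reader would speak aloud."""
        return [b.audio_label for b in self._buttons.values()]

    def hear(self, phrase: str) -> bool:
        """Control direction: 'click' the button whose audio label was said."""
        button = self._buttons.get(phrase.strip().lower())
        if button is None:
            return False
        button.on_click()
        return True


clicked = []
panel = VoicePanel()
panel.add(VoiceButton("OK", "okay", lambda: clicked.append("ok")))
panel.add(VoiceButton("Cancel", "cancel", lambda: clicked.append("cancel")))
panel.hear("Okay")
```

      Keeping the audio label on the widget itself means the set of phrases the recognizer has to listen for is always just the buttons currently on screen.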

      --
      Exercise caution when modding this message up: the author acts like a jerk when his karma is excellent.
    4. Re:Code-by-voice by famebait · · Score: 2, Interesting

      But how about:

      "If-block."

      if( [condition] ){
      [body]
      }


      "Condition or."


      if( [left side] or [right side] ){


      "Right side I lessthan zero. Left side parens equals zero."


      if( ([number])==0 or i < 0 ){


      "Number invar bit-or hex three."


      if( (inVar | 0x3)==0 or i < 0 ){


      "Body." ...

      I might not switch, but I'm sure it could be made usable with some good design.

      -Joahcim.

      --
      sudo ergo sum
    5. Re:Code-by-voice by ortholattice · · Score: 2, Interesting

      Excellent suggestion. And think hybrid - use voice when voice is fastest. After saying "if-block", the cursor could be positioned at "[condition]" which is highlighted for replacement, either by typing at the keyboard or another voice command. Combine the best of both worlds. The best use might be to start off with a few common macros that you would ordinarily bind to function keys, and voice would allow you to use them without interrupting your normal typing flow to hunt and peck for an awkward meta key combination.
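
      The template-plus-placeholder flow from the parent could be sketched roughly like this. The template table, the VoiceBuffer class and its method names are all hypothetical, and the recognizer is assumed to have already turned speech into command strings.

```python
import re
from typing import Optional

# Hypothetical spoken commands mapped to code templates with [placeholders].
TEMPLATES = {
    "if-block": "if( [condition] ){\n[body]\n}",
    "condition or": "[left side] or [right side]",
}

PLACEHOLDER = re.compile(r"\[([^\]]+)\]")


class VoiceBuffer:
    def __init__(self) -> None:
        self.text = ""

    def say(self, command: str, target: Optional[str] = None) -> None:
        """Insert a template at the end, or in place of a named placeholder."""
        snippet = TEMPLATES[command]
        if target is None:
            self.text += snippet
        else:
            self.fill(target, snippet)

    def fill(self, name: str, replacement: str) -> None:
        """Replace the first placeholder with this name (typed or dictated)."""
        marker = f"[{name}]"
        if marker not in self.text:
            raise KeyError(name)
        self.text = self.text.replace(marker, replacement, 1)

    def next_placeholder(self) -> Optional[str]:
        """Where the cursor would land: the first unfilled placeholder."""
        match = PLACEHOLDER.search(self.text)
        return match.group(1) if match else None


buf = VoiceBuffer()
buf.say("if-block")                              # voice
buf.say("condition or", target="condition")      # voice
buf.fill("right side", "i < 0")                  # keyboard
buf.fill("left side", "(inVar | 0x3)==0")        # keyboard
print(buf.text)
```

      Because fill() does not care whether the replacement was typed or spoken, the hybrid use falls out for free: voice picks the structure, the keyboard fills in the details.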

  4. Why? by Anonymous Coward · · Score: 5, Interesting

    Why is IBM doing this? Is it because they think they can make more money with increased software sales? It also might be an advertising campaign; a $10 million donation buys a lot of free coverage.

    Corporations don't usually give away stuff for nothing; in fact, their mission by law is to maximize profit.

  5. That means one more thing missing in Linux gone? by drmancini · · Score: 5, Interesting

    When you look at GNU/Linux as a complex system and think of the things that users complain about when Linux usability is concerned, GPL'd speech recognition software is definitely one of them.

    Hooray for IBM and as Ali said in the Linux ad "don't back down"!!

    --

    Never underestimate the power of idiots in large groups
  6. Human-Centered Computing! by Milo+Fungus · · Score: 5, Interesting

    My brother (who works for IBM) recently sent me an article in USA Today about the system IBM and Honda have developed for a speech interface to a GPS-enabled navigation computer. Really cool stuff.

    For those of you who haven't read it, check out The Unfinished Revolution by Michael Dertouzos. I don't agree with all of his analysis (he was a little lacking in pragmatism on some points), but overall this book was very insightful. This book, along with Weaving the Web by Tim Berners-Lee, caused a big paradigm shift in my thinking about computer technology.

  7. Nice M$-Comment at the end by echappement · · Score: 5, Interesting

    Nice title;
    Speech code from IBM to become open source

    And even better.. the comment from Microsoft, quoted at the end of the article
    "IBM has not executed in bringing this technology to a broad market as Microsoft has."

    Jokes aside, the article also states that Microsoft introduced Speech Server 2004 last March, and that 100,000 software programmers have downloaded Microsoft's free software developers' kit for building speech applications on its Windows .Net technology. What exactly is the difference in quality and approach between the package from M$ and the one from IBM mentioned here?

    1. Re:Nice M$-Comment at the end by ungerware · · Score: 2, Interesting
      With IBM's new donation, you could build a piece of consumer hardware that plugs into a wall socket & a phone line and runs your voice applications over the phone.

      You could build 10,000 boxes and sell them around the world without any licensing fees.

      That is somewhat different from a solution developed with Microsoft Speech Server 2004.

      Afraid not. IBM is open sourcing 2 things, neither of which is their speech recognition engine. One is just a JSP library, with some tags for generating voicexml for dates, times, currency grammars, etc. The other is some tools code for eclipse. Modules for editing voicexml, ccxml, grxml, etc. A fancy XML editor. Ho hum.

      Really not much to see here.

      --

      -----
      Kvetch is Yiddish for "throw an exception" --Dr. Ron Cytron
  8. Re:Code or training? by maxwell+demon · · Score: 3, Interesting

    Maybe set up sort of a ViaVoice@Home project where every geek can help train the software?
    Actually it should be quite easy: the client reads your keyboard and the microphone, and you are supposed to speak aloud whatever you type. The training results are regularly exchanged with the central server.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  9. IBM also has a grammar based system. by perky · · Score: 4, Interesting
    IBM also has (or rather had in '98, '99, 2000) a grammar-based recognition system built on the same engine, but using compiled grammars and, naturally, a cut-down acoustic model dependent on the contents of the grammar. There was also a toolset supporting compiling grammars from BNF, building speech telephony applications, and so forth.


    IBM Hursley labs had a name dialler 5 years ago that let you phone the computer, say the name of the person you wanted to speak with, and get put through. They also had a system that provided weather forecasts based on the name of the city or country you said. I was pleased to name the latter "Global Weather Information System" or GWIS, pronounced Gee-whizz. Both ran on the machine under my desk. Both worked reasonably well, especially given that a lot of the acoustic models for names and places were automagically generated.
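
    To give a feel for why a grammar cuts the problem down so much, here is a toy sketch. The grammar, names and cities are made up, and a real engine would compile the grammar into a search network rather than enumerate sentences, so treat this only as an illustration of the idea.

```python
from itertools import product

# Hypothetical BNF-like grammar: each rule maps to a list of alternatives,
# and each alternative is a sequence of words or <rule> references.
GRAMMAR = {
    "<command>": [["call", "<name>"], ["weather", "in", "<city>"]],
    "<name>": [["alice"], ["bob"]],
    "<city>": [["london"], ["winchester"]],
}


def expand(symbol, grammar):
    """Return every word sequence a symbol can produce.

    Only works for finite, non-recursive grammars like the one above.
    """
    if symbol not in grammar:
        return [[symbol]]  # a terminal word
    sentences = []
    for alternative in grammar[symbol]:
        parts = [expand(token, grammar) for token in alternative]
        for combo in product(*parts):
            sentences.append([word for part in combo for word in part])
    return sentences


def active_vocabulary(grammar, start="<command>"):
    """The only words the acoustic model would need to cover."""
    return {word for sentence in expand(start, grammar) for word in sentence}


def accepts(utterance, grammar, start="<command>"):
    """True if the utterance is one of the sentences the grammar allows."""
    return utterance.lower().split() in expand(start, grammar)


print(sorted(active_vocabulary(GRAMMAR)))
print(accepts("call Bob", GRAMMAR), accepts("call london", GRAMMAR))
```

    With only a handful of words in play at any point in the dialogue, the recognizer is choosing among a few candidates instead of a dictation-sized vocabulary.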

    --
    "The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
  10. Re:Viable by notthepainter · · Score: 3, Interesting
    Yes and no.

    I used to work for MacSpeech, we also did large vocabulary dictation systems like ViaVoice.

    Back when I was there it really wasn't viable for most people.

    However, not all people can type; this includes both the "Hands Free" market (disabilities) and the "Hands Busy" market. Surprisingly, many people also don't want to type, including medical and legal professionals. They have an interesting problem: they often need to generate large amounts of boilerplate text quickly. Doctors, radiologists, and lawyers are also all pretty smart, and they make heavy use of the macro packages to construct documentation systems that suit their needs exactly. As you might also imagine, VARs step in and also make these macros.

    Is it for you? Maybe not, but it is for a lot of people.

  11. Re:Sphinx by tigersha · · Score: 2, Interesting

    How do Sphinx and ViaVoice compare? I am seriously thinking of playing with these two thingies, but I would like to have some kind of opinion from a serious user.

    Thanks.

    --
    The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
  12. Re:Sphinx by Anonymous Coward · · Score: 1, Interesting

    Sphinx is free software. ViaVoice is owned by ScanSoft (IBM sold it to them). ViaVoice is only supported on MS Windows platforms, although there is a crufty old pre-ScanSoft tarball floating around that you would be lucky to get to compile.

    Sphinx is a very good recognizer, but you'll have to build your own grammar (there are "toy" grammars available). ViaVoice makes you spend 4 hours training it, and then it will only recognize your voice.

    ScanSoft ViaVoice will run fine on a 1GHz computer or less. Sphinx, using a grammar of any reasonable size, needs a 32-node Beowulf cluster of dual-core Opterons to do real-time speech recognition. Well, that's an exaggeration, but you get the idea.

  13. VoiceXML IDE by gawi · · Score: 2, Interesting
    I believe IBM is opening Voice Toolkit for WebSphere Studio.

    It's a product based on the Eclipse platform (not a plugin, more a standalone application).

    It's a VoiceXML-oriented IDE. In a nutshell, VoiceXML is a specification that defines how to make a speech recognition (or DTMF) application for the *phone* (not the desktop) using a Web model (that is, exchanging documents over HTTP). The toolkit developed by IBM allows programmers to build call flows graphically, to edit VoiceXML and grammar documents, to manipulate pronunciation dictionaries and to do other related tasks. I believe this is the part that they are going to give to Eclipse.

    The other piece they're going to open is "Reusable Dialog Components", a set of VoiceXML documents (or templates), grammars and code. These modules allow programmers to combine different components together in order to build a complete application. I think this part is going to Apache.

    Also note that:

    Currently, Voice Toolkit for WebSphere Studio is only available on Windows

    Although VoiceXML is a growing standard, many areas are still not covered by the spec. AFAIK, this toolkit is not likely to integrate nicely with run-time platforms other than IBM WebSphere Voice Server.

    This is just an IDE. You need to buy the runtime (the VoiceXML gateway). I really don't think they will open their speech recognition software (a lot more than a 10M$ investment).
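
    For anyone who hasn't seen the "Web model" in practice, here is a small sketch of the kind of document a web application would serve to a VoiceXML gateway. The element names are from VoiceXML 2.0, but the URLs, the grammar file name and the build_city_form helper are made up for illustration, and this is not IBM's Reusable Dialog Components code.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_city_form(submit_url: str, grammar_url: str) -> str:
    """Build a one-field VoiceXML form that asks for a city and submits it."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})
    field = ET.SubElement(form, "field", {"name": "city"})

    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Say the name of a city."

    # The grammar limits what the recognizer listens for in this field.
    ET.SubElement(field, "grammar",
                  {"src": grammar_url, "type": "application/srgs+xml"})

    # Once the field is filled, send the result back over HTTP.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


# Hypothetical URLs; a real application would point at its own server.
doc = build_city_form("http://example.com/weather", "cities.grxml")
print(doc)
```

    The gateway plays the prompt, runs the recognizer against the grammar, and posts the recognized city to the next URL, which answers with another VoiceXML document. That is the whole loop, and it is also why the gateway (the runtime) is the piece you still have to buy.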

    --
    All humans are mortal. Socrates is a human. Socrates is dead.
  14. Re:Either way... by Anonymous Coward · · Score: 2, Interesting

    Nah, you're in the completely wrong product division. When people hear "speech recognition", they automatically think of stuff like ViaVoice and DragonDictate. This announcement has _nothing_ to do with software for interfacing with your desktop computer.

    They're talking about their voicexml tools. They're open sourcing some tools for developing voicexml-based speech applications that run in a call center somewhere, replacing "press 1 for this, press 2 for that" with "say the name of a city and state, and I'll give you a weather forecast". Customers are generally big enterprises with big call centers, and they're trying to create the appearance of a "standard development methodology" that, in reality, only they support, so they can drive engine sales and services revenue.

    Of course, MS is trying to do the same thing. IBM's closer to the mark, because voicexml actually _is_ the predominant standard, whereas MS just made SALT up all by their lonesome and have yet to have any real deployments outside of their own companies and companies they have tremendous sway over.