Slashdot Mirror


Ask Slashdot: Effective, Reasonably Priced Conferencing Speech-to-Text?

First time accepted submitter DeafScribe writes "Every year during the holidays, many people in the deaf community lament the annual family gathering ritual because it means they sit around bored while watching relatives jabber. This morning, I had the best one-on-one discussion with my mother in years courtesy of her iPhone and Siri; voice recognition is definitely improving. It would've been nice if conference-level speech-to-text had been available this evening for the family dinner. So how about it? Is group speech to text good enough now, and available at reasonable cost for a family dinner scenario?"

1 of 81 comments

  1. Re:There isn't any... by TWX · Score: 4, Informative

    To put this into a car analogy, electric cars don't need to surpass ICE cars in every conceivable scenario to make one worth buying for a given individual.

    No, but to expand on your car analogy, they have to be able to meet certain minimum standards and customer requirements.

    And dropping out of analogy, the hypothetical courtroom automatic stenographer would probably have it easiest, as the rules of the court dictate that only one person may speak at a time, and most courts have individual microphones for every speaking party for acoustically recording the proceedings anyway. The same cannot be said for the dinner table.

    Even the most rudimentary system for sampling several participants would cost hundreds of dollars. A halfway-accurate comparison would be the equipment needed to record a drum set, with an individual microphone for each drum, cymbal, and accessory, and a processor that monitors line levels and records each input separately. Replace the recording of each input with processing each input for discrete words, and only then do you even get to the hard part: interpreting what the sounds actually are.
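    For what it's worth, the "monitor line levels" step is the easy part, and can be roughly sketched as below. This assumes one microphone per speaker and that each microphone delivers a block of samples for the same short time window; the names rms and pick_speaker and the 0.05 threshold are made up for illustration, not taken from any real product. It only guesses who is talking. It does nothing about crosstalk between microphones or about recognizing the words.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_speaker(blocks: Sequence[Sequence[float]],
                 threshold: float = 0.05) -> Optional[int]:
    """Return the index of the loudest microphone, or None if all are quiet.

    blocks[i] holds the samples from (hypothetical) microphone i for the same
    time window. The threshold is an arbitrary illustrative value.
    """
    levels = [rms(block) for block in blocks]
    if not levels:
        return None
    loudest = max(range(len(levels)), key=levels.__getitem__)
    return loudest if levels[loudest] >= threshold else None


if __name__ == "__main__":
    # Two nearly silent microphones and one carrying a tone.
    quiet = [0.01, -0.01] * 50
    tone = [0.5 * math.sin(2 * math.pi * i / 20) for i in range(100)]
    print(pick_speaker([quiet, tone, quiet]))
```

    Only the block from the chosen microphone would then be handed to whatever does the actual recognition, which is where the real cost and difficulty sit.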

    The low-end equipment to record drums costs hundreds of dollars. High-end equipment to do the same thing costs thousands of dollars. Now tack on the cost of the processing side, and you're probably at tens of thousands of dollars, all just to attempt to participate in a large group conversation, as opposed to a small-party conversation where polite participants will probably work to simplify the flow of conversation so that the impaired individual can participate.

    A friend of mine in a social club has a son with some form of developmental disability. I've heard that it's Asperger's, but I'm not entirely certain, as many of the traits commonly associated with Asperger's don't seem to manifest in him. When he's party to our conversations, we modify them to accommodate him. We try to avoid speaking over each other or over him, and we lengthen what we treat as a pause by a given speaker, so that we don't interrupt him while he's talking.

    If we had a substantially hearing-impaired member, we would probably modify our conversations accordingly, slowing our speech enough that lips could be read, attempting to avoid talking over each other, and attempting to keep our faces oriented to where the individual could see those faces. Given the nature of our vocabulary in this social setting (a speculative fiction group) it would be highly unlikely that a speech-to-text system would correctly interpret any of the truly important words in the conversation anyway, so such a system would be useless.

    --
    Do not look into laser with remaining eye.