Slashdot Mirror


Why Hal Will Never Exist

aengblom writes "Researchers at the University of Maryland's Human-Computer Interaction Lab are suggesting what many of us have already guessed. The future of human-computer interaction won't be through speech--it will remain visual (they explain why). The Washington Post is running a story about the researchers and how they think we will get computers to do what we want. The article is a fascinating read and is joined by a great video clip (real or quicktime) of the researchers and their methods. The Post is holding an online discussion with the researchers tomorrow. Also check out Photomesa, the lab's software program that helps track images on a computer. (Throw a directory with 1,000 high-res files at this thing and you can justify that pricey new computer you bought)."

26 of 325 comments

  1. Wrong Take by XPulga · · Score: 5, Funny
    ...how they think we will get computers to do what we want...

    What ?? I thought the current research line in HCI was getting computers to get humans to do what they [computers] want. Computers doing what humans mistell them to do is soooo 20th Century...

  2. Who wants HAL anyway? by ObviousGuy · · Score: 5, Funny

    An insane bot is not the kind of thing people would find useful.

    --
    I have been pwned because my /. password was too easy to guess.
    1. Re:Who wants HAL anyway? by ObviousGuy · · Score: 4, Funny

      Speech is a natural way of interfacing with a control system. "Illuminate"

      It's particularly bad for games. "Click both buttons on the unopened square next to the '3'"

      --
      I have been pwned because my /. password was too easy to guess.
  3. Re:All great Sci-Fi ideas come to pass eventually by Iamthefallen · · Score: 5, Funny
    All great Sci-Fi ideas come to pass eventually

    If brightly coloured spandex clothes ever become commonplace I'm quitting this planet...

    --
    Wax-Museum Fire Results In Hundreds Of New Danny DeVito Statues
  4. Meet the machines half-way... by Alea · · Score: 5, Insightful

    I've always wondered why we work so hard for full natural language interface. It's far more likely that I will learn a new language than that my computer will. Indeed, I've learned several languages to "talk" to my computer.

    Of course, these are programming languages, but I don't see why some highly structured, relatively unambiguous language couldn't be constructed to talk to computers.

    The success of the Palm Pilot can be traced, in my view, to the fact that it didn't strive for full handwriting recognition (like, say, a Newton). Instead, it required the human to meet it half-way. You get decent accuracy/speed for a small investment in learning.

    We accept these compromises in many of our dealings with computers. I don't understand why people aren't promoting a similar compromise in voice communications.

    1. Re:Meet the machines half-way... by Tazzy531 · · Score: 3, Insightful

      The article wasn't talking about the computer's limitation in terms of recognizing speech. It was directed towards the human brain's limitation to speak and think at the same time.

      I think there are some very good applications of speech technology, but it's not going to replace the keyboard and mouse. Speech technology works best when you need to do one thing while directing the computer to do something else. Like hands-free mode on cell phones. My guess is that it will find its way into cars before it reaches desktops (if it reaches desktops at all).

      --


      _______________________________
      "I'm not Conceited...I'm just a realist..."
  5. What about typing and thinking? by FleshWound · · Score: 5, Funny
    From the article:
    What that means, basically, is that it's hard to speak and think at the same time.
    With the advent of the Internet and global communications, I think it's become painfully evident that a majority of the people also have trouble typing and thinking at the same time. =)
  6. Nonsense! by Greyfox · · Score: 5, Funny
    Why, 50 years ago many people said that flying cars would never exist and now, 50 years later... um...

    Never mind...

    Actually in the future the computer will scan your face and biological status and read your mind based on millions of tiny clues. All you'll have to do is sit there with a vague disinterested look on your face and the computer will magically do stuff based on all those clues. Later on you won't even have to be at the computer. To write those 10,000 lines of code you need by next Thursday, you'd just go out and take a walk (Is anyone buying this? No? Ok, I'll stop now...)

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  7. Single Modality? by Alea · · Score: 5, Insightful

    The article picks the weakest tasks for voice to deal with, trivial things like scrolling. Obviously no one wants to do that. But I'd love to be able to speak my Google query instead of typing it, activate some applications without clicking, and many other tasks.

    The dubious argument about interfering with memory is pretty weak, and I would love to hear a good memory expert in psychology comment on that. Even if that's strictly true, it only applies when one is interrupting some particularly "vocal" activity, like writing or reading. There are plenty of times I'm using the computer when I'd rather speak to it than move my eyes or my hands.

    This researcher seems to have latched onto a single modality instead of considering what we use day to day to communicate with each other, a combination of many communication forms.

    I know I don't roll my eyes or gesture to ask someone to pass the salt... unless my mouth is full. :)

    1. Re:Single Modality? by _Quinn · · Score: 4, Interesting

      (Mod the parent up.)

      Aside from this, making a speech interface anyone wants to use isn't about the speech; it's about the natural-language comprehension that most people (naively?) associate with speech recognition; e.g., the Enterprise's computer. Which, you note, the crew interact with on a technical level visually.

      As for the specific example of italicizing text, natural language understanding should give rise to accurate _dictation_ systems, where the computer will insert the appropriate punctuation and emphases as you speak. If you're typing, instead, CTRL+I is your friend. :)

      -_Quinn

      --
      Reality Maintenance Group, Silver City Construction Co., Ltd.
    2. Re:Single Modality? by entrox · · Score: 3, Insightful

      I agree - I think voice interaction needs to be at a much higher level than "Scroll Down" or "Next Workspace". I'd like something like "Open XMMS, XChat, Mozilla and Emacs on workspaces 1, 2, 3 and 4" in addition to keyboard and mouse. A combination of both would be quite cool actually, because I could choose the most appropriate interface. Typing a letter using speech recognition, but coding with the keyboard - surfing with the mouse, but also interacting by voice like "New tabs: freshmeat, slashdot and userfriendly".

      --
      -- The plural of 'anecdote' is not 'data'.
    3. Re:Single Modality? by MoneyT · · Score: 3, Interesting

      You do of course realize that the comprehension part is the least of our worries. Try telling your computer to open a temporary file on your computer. Have you seen some of those file names? If we do go to speech commands, we're going to need to get a much better system of naming things (can't name your documents dsfk.txt anymore). As for just getting files or programs to open, Apple's speech recognition does this fairly well. Just place an alias (or the actual file) into the speakable items folder and then tell the computer to open [item name]. They even have a command to make a currently selected item speakable (places an alias in there for you). Admittedly, it isn't the best interface yet, but it's a start. And the voice passwords are just so friggen cool (OS 9 only, when do I get it for X?)

      --
      T Money
      World Domination with a plastic spoon since 1984
  8. Thinking out loud? by galaga79 · · Score: 5, Interesting

    "It turns out speaking uses auditory memory, which is in the same space as your short-term and working memory," he adds.

    What that means, basically, is that it's hard to speak and think at the same time.


    I don't know about this statement; I always find it easier to write and/or think when I am expressing my thoughts out loud. Wasn't this something we were taught in school, like it's easier to read out loud than silently? Mind you, having done two years of psychology I realise there are a lot of differing opinions about how the brain works, so can any psychology graduates tell me if his statement is true?

    1. Re:Thinking out loud? by Peyna · · Score: 3, Informative

      You can think much much much 'faster' than you speak, especially when you aren't talking. The whole speaking at an audible rate thing kinda hinders that. You can't think too far ahead about what you are going to say; you'll be lucky to know what your next sentence is going to be. Whereas if you are thinking, you don't have to actually use the words that you would have used to speak, you just 'think' it.

      Ever see people that move their mouths when they read? They are reading at the same speed they speak, which makes me wonder if they think at that speed too. I think the real improvement will come in an input mechanism which greatly improves speed. I can type/speak at about the same rate, so one of the advantages typing has over speaking is the ease of entering commands like "move this window over there" or "open this menu and click save". Maybe they should find quicker ways to enter data using our hands and fingers instead of our mouths.

      --
      What?
  9. The real issue by 00_NOP · · Score: 3, Interesting

    Is surely whether, in the future, computers will be bothered to talk to us.

    There is no doubt that computers with greater intelligence - i.e. an ability to learn and adapt - than ourselves will be here, probably in the next 20 - 25 years.

    When these machines get here they may well decide that speaking is a waste of their time.

  10. I don't really agree by jilles · · Score: 5, Insightful

    The error he makes is that he projects the way people use computers today to a HAL like computer and then comes to the conclusion that that won't work because it requires too much interaction.

    He is of course right about that. However, if you add AI to the mix, the computer will be able to take initiative and have some level of understanding about what you are saying. Hal was more than just speech recognition, it was more like a very clever secretary.

    Say you need to go to some place and need a plane ticket and a hotel and directions for getting around. This is the kind of stuff you would let a secretary do for you and a good one wouldn't bother you with trivialities. You definitely would not want to sit next to him/her and provide detailed directions on where to look, compare prices and so on because that is the stuff that takes time and the main reason you're delegating the work.

    An intelligent computer would have enough information given a pretty vague expression like "hey I need to be there and there for conference X, book me a plane and a hotel". Assuming you've worked together for some time, it should have enough information to figure out most details (like window or aisle seats, smoking/non-smoking hotel room, price range for hotels, etc.). And it can always ask for additional information either verbally or non-verbally depending on where you are and what you are doing. It could actually call you on your cell phone and ask but it could also send an email or an instant message.

    IMHO we are at least decades away from building such systems; all of the basic techniques needed to accomplish this are still immature (although very useful already).

    MS is often loathed for unleashing Clippy onto this world but Clippy was the result of extensive research into usability and human-computer interaction by MS. It was rushed to market and a genuine pain in the ass (mostly because of its lack of intelligence) but the concept of some AI program watching what you are doing and intervening and offering you useful options is not bad.

    --

    Jilles
  11. Re:I agree by CyberDruid · · Score: 3, Interesting

    Voice interface is excellent for communication from a distance. When I'm sitting on my couch, I don't want to go all the way over to my computer to do trivial things like checking if I have mail, when the Simpsons is on, or what I have scheduled for today, playing an mp3 album, etc, etc. I just want to tell my computer to do it from wherever I happen to be. If I ask for information, the computer can use text-to-speech to give it to me.
    I'm actually looking into the possibility of setting up such a system for myself (mostly for hack-value, of course ;). Just need decent open source voice recognition for a few pre-defined commands. I'll probably need a way to place a few (2-3) cheap microphones in my apartment and connect them (in series?) to my computer, as well.

    --

    Opinions stated are mine and do not reflect those of the Illuminati

  12. Wrong by joss · · Score: 4, Insightful

    With all due respect to the University of Maryland's Ben Shneiderman, either he has been misreported or he's a fuckwit.

    > He's convinced our eyes will do better than our voices at helping us control the digital machinery of the 21st century.

    It's really very simple. There are two sides to HCI, computer->human, and human->computer. Now visual stuff is great for computer->human communication, but not for human->computer communication. Or to put it another way, the eye is a higher bandwidth input port than the ear, but the eye is no use for output. We cannot effectively communicate our needs to a computer by drawing pictures. Although simple, this is not understood which is why every so often some twit produces an abortive attempt at a "visual programming language". It's also why purely visual interfaces are fundamentally less powerful than command line interfaces.

    I'm not convinced visual methods always win for computer->human either. Even though our eyes are higher bandwidth than our ears, we are not used to processing purely visual information in a cumulative way. With language the information content of the message can grow exponentially with the length of the message.

    Many people are brainwashed by that crap about a picture being worth 1000 words. Draw me a picture of "misguided".

    --
    http://rareformnewmedia.com/
  13. Finally... by Pedrito · · Score: 5, Insightful

    Someone who really knows the future. I'm tired of all these crazy people telling us we're going to talk to computers. Finally a real seer. Maybe he can pick stocks for me too.

    Sorry, but I put no stock in this at all, and I'll tell you why (of course, that's why we all get on our soap boxes here). I can't do voice dictation at all. I suck at it. I had IBM's ViaVoice for a while and I couldn't write anything that way.

    Does that mean this guy is right? Of course not. Most people in my parents' generation can barely type, because they didn't have to growing up. Now almost every kid and young adult in the U.S. can type quite well. Why? Practice.

    My uncle used to use a dictaphone (he was a U.S. senator) to dictate all of his speeches. He had no problem. Why? Practice, of course. He had no problem thinking and talking at the same time. It's just what he was used to. He couldn't type worth a damn.

    I don't put much stock in people telling us what the future will bring. Look at all the brilliant people who were telling us that all these dot coms were the future. Poof, they're gone. Look at all the brilliant people that said we'd never cross the oceans, fly, go to the moon. Sorry, but a lot of smart people are wrong, quite often!

    This guy is dealing with people who haven't grown up doing voice dictation and are used to typing. The human brain (and I can point to about a million studies to back this up) is quite adaptable. That's one reason why we're here and the Neanderthals aren't. Our brains are amazingly flexible. Our brains can sometimes re-learn to do tasks that have been lost due to damage. The brain is especially adaptable in young people. Get a voice interface that children can deal with, and I guarantee you that that generation of kids will grow up speaking to computers. We typists will struggle and fumble, and feel "old" for not being able to pick it up as easily as them.

    But then that's just me on my soapbox. I could be wrong, but so could this guy.

    1. Re:Finally... by aengblom · · Score: 3, Funny

      Your uncle was a Senator. I'm quite positive he wasn't doing much thinking.

      ;-)

      --


      So close and yet so far from the world's perfect ID number
  14. Voice interfaces in movies are just for show by varn_ix · · Score: 3, Insightful

    I have always assumed it was sort of inconvenient to speak to your computer, and the only reason they do it in movies and TV-shows (ST comes readily to mind) is to allow the viewer to better follow what is going on.

    Personally, I'm waiting for the direct computer - brain - visual nerve interface.

  15. I don't know what to think about this article... by heideggier · · Score: 3, Interesting
    I think that the bloke is right that speech is a really bad way of communicating with computers, as they are designed today. But I think that it's a bit of a leap of logic to conclude that this will always be the case.

    Case in point: today computers are normally designed around some kind of windows environment, a WIMP interface, where information is displayed as a metaphor, i.e. scroll bars, OK buttons, etc. This is an environment that was never designed for interaction beyond a mouse and a keyboard. DVDs, however, do not follow this standard, normally being based on some kind of menu system. Clearly, the way you make something determines the way it is used.

    If speech is to be a success on computers then the way that people interact with the computer needs to be changed. I think a system like the console, where programs aren't very powerful on their own but become so due to the way that they have been linked together, would work very, very well.

    I long for the day when I can say, "dump down everything on slashdot and tell me if any of my posts have been modded up" to read wget somesite | grep index.html | echo $whatever (please excuse this example). All you would need is some kind of AI which is able to manage the interpretation correctly (at least most of the time).

    I think, fundamentally, computers should be designed to do what you tell them to do (how I think such a system would work) and not force you to do things in a certain way, which is what current systems do today. One should never have to learn an interface.

    I also think that this guy has limited his imagination somewhat. The main thing about Hal was that he was everywhere, and in the future, computers will be everywhere. For example, if you were on the loo and just thought up a really good chess move, then you would just say, "Hal, queen to bishop 4", not get up, sit at a console, log in and realise you've forgotten what it was you were about to do. Saying that in such a case it's easier to point to some graphic, 'cause you don't have to think too much, seems kinda lame.

    --
    Pianist : Some jerk whos taught themselves how to type in rhythm
  16. Re:All great Sci-Fi ideas come to pass eventually by Metrol · · Score: 3, Funny

    If brightly coloured spandex clothes ever become commonplace I'm quitting this planet...

    Man, you totally missed out on the 80's didn't you?

    --
    The line must be drawn here. This far. No further.
  17. Problems with the article by wowbagger · · Score: 4, Insightful

    First, when they talk about speech taking away from working memory - that is true IF what you are saying is different from what you are thinking. For example, as I write this I "hear" the words in my head, and then type them out - I could just as easily speak them as type them (more so - coffee's not cut in yet...) It's when what you are THINKING is different from what you are SAYING - if you are thinking "it's when what you are thinking..." and you are saying "it's when what you are thinking" that things get harder.

    Second, speech is like a command line - it is largely modeless if it is done right. That's the big attraction; that's what most of the posters here are saying: They want to be surfing/gaming/whatever, and be able to say "computer, do this" so that they don't interrupt what they are doing. In short, they want to use speech as a low bandwidth auxiliary channel. When I am in my car, I would love to be able to say to my MP3 player "Neo: play Rock-Boston-all" so that I can keep my eyes and most of my attention on the road. However, that is VASTLY different than putting most of my attention on a phone conversation whilst half-assed paying attention to the car I am tailgating.

    Third, speech is a very low bandwidth output compared to other solutions: when I am typing, I have the bandwidth to change case, activate/deactivate bold (in a word processor - pity Mozilla cannot be instructed to insert a <b> on a ctrl-b) or whatever. Trying to do that with speech just wouldn't work because speech doesn't have the "out of band" channels of CTRL, SHIFT etc. Sure, you COULD try to use inflection or non-speech sounds, but then the processing gets to be even worse. (Although it would be fun to hear a Perl programmer speaking a program using Victor Borge's phonetic punctuation....)

    In short, this article makes the same mistake most articles on user interaction make - it assumes there is some uber-interface, and all other interfaces are inferior. Wrong - speech where speech works, 2D where 2D works, 3D where 3D works, haptic where haptic works, etc. I wouldn't want to drive my car with a joystick, and I wouldn't want to code with a steering wheel.

  18. Playing with Voice Recognition by Metrol · · Score: 3, Insightful

    A few years ago (actually more than that) on Windows 3.1, Microsoft came out with a voice recognition app. The basic notion of this thing was to allow your voice to control the basic environment. Some of it even kinda worked.

    This eventually got kind of annoying, and I pulled it off that system. I don't regret for a second playing with it. It taught me some valuable lessons about the arena of voice recognition.

    1. I don't want to talk to my computer. You'd have to try this for a while to see for yourself, but the process is exhausting compared to just typing and clicking on stuff.

    2. I never realized how much people tend to slur words used in context, but pronounce them properly by themselves. In the training session where this app learns your voice, I found that I say "Open File" differently when reading it than when I'm just saying it aloud.

    3. Context is critical. For a person to determine the true meaning of words there's all kinds of voice inflection, and body language that needs to be read. I'm not sure I'd want to see a computer that smart!

    Personally, I don't see a huge problem with the whole desktop metaphor interacting with a keyboard anyway. It may have a lot to do with those folks that honestly don't wish to use a computer, they just want a machine to think for them. I would think anyone who does tech support might appreciate what I mean here.

    Bottom line, the only audio I want my computer to ever deal with is music playing in the background.

    --
    The line must be drawn here. This far. No further.
  19. Voice-operated pianos, computerphobic executives by dpbsmith · · Score: 5, Insightful

    Would anyone seriously consider trying to build a voice-operated piano? Simply dictate into it the notes you want it to play... Of course not, everyone realizes the bandwidth of brain-to-fingers-to-keyboard is much higher.

    So why the "voice command" fantasy in the first place?

    When the PC revolution was just starting to take off, most people had not learned to type in high school. Typing was considered a skill for secretaries, who, of course, were poorly paid, low in social rank, and referred to as "girls."

    For many years, computer technology did not penetrate the higher corporate levels because directly handling machines was considered beneath the dignity of an executive. "I don't have time to learn to use that gear, I have people to do that for me," was the typical attitude. Execs would have their secretaries print out all their email for them, dictate replies, and have their secretaries keyboard them back in.

    This changed when the young MBAs started arriving with their computer spreadsheets.

    Most people, even wealthy people who can afford chauffeurs, drive their own cars, and most people now operate their own computers... Time to retire the whole "voice interface" concept, except for people with special needs.