Best Device For Gesture Based Input?
jotaeleemeese writes: "A few days ago there was a discussion about gesture navigation in the Opera browser, which prompted me to buy Black & White, download Opera and get the evaluation version of Sensiva. Being a trackball user, I found gesture navigation too cumbersome, and I found a mouse not much better. Then I thought a pen based device or a touchpad could be ideal for this kind of input, but before investing my hard cash buying something, I would like opinions from /.ers that have already tried something with these or other programs using gesture recognition and what the results have been."
One thing I noticed about Black and White is that if your input device has very low resolution (say your computer is overtaxed and only servicing interrupts every 100ms or so), then the gesture based input can be a real pain, but when I'm on a fast enough machine (with a good precision mouse) the gestures are easy to perform. The problem with slow input is that when you go around a curve, the mouse may only register at two or three points along the curve, and your software will interpolate that into a straight line between those points. If what you are trying to draw is curved, then there is a good chance the recognition software will get it wrong.
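To put rough numbers on that straight-line effect, here is a small Python sketch. It assumes the gesture is a circular arc sampled at evenly spaced points; the function names, the 100-pixel radius and the 200 ms gesture time are made up for illustration and have nothing to do with how Black & White actually works.

```python
import math

def samples_during(gesture_ms, poll_ms):
    """Number of mouse positions captured during a gesture (hypothetical helper)."""
    return gesture_ms // poll_ms + 1

def max_chord_error(radius, arc_angle, samples):
    """Largest gap between a true circular arc and the straight segments
    that join `samples` evenly spaced points on it (the sagitta)."""
    if samples < 2:
        raise ValueError("need at least two samples to form a segment")
    step = arc_angle / (samples - 1)          # angle covered by each segment
    return radius * (1 - math.cos(step / 2))  # bulge of the arc past its chord

# A half circle of radius 100 px drawn in 200 ms (assumed numbers).
slow = max_chord_error(100, math.pi, samples_during(200, 100))  # polled every 100 ms
fast = max_chord_error(100, math.pi, samples_during(200, 10))   # polled every 10 ms
print(f"100 ms polling: arc flattened by up to {slow:.1f} px")
print(f" 10 ms polling: arc flattened by up to {fast:.1f} px")
```

With only three points the half circle turns into two straight lines that miss the real curve by tens of pixels, while the faster polling stays within a fraction of a pixel.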
Down that path lies madness. On the other hand, the road to hell is paved with melting snowballs.
I read the internet for the articles.
I think something like the old Nintendo power glove would be great, hit a button, gesture, hit a button to confirm, all of it fingertip controlled. Either that, or a touchpad/screen. What would be more natural for gesture based control than touching the screen and making the gesture?
You say you want a revolution....
The idea is that the human brain isn't good at discerning differences between short distances, such as "Move the mouse pointer to the menu bar, click within a .5 inch box, scroll down 2.5 inches to the appropriate menu item and release", however it's quite good at producing and remembering changes in directions. So, for instance, File|Save would be "Up, Left".
With just two gestures, it's possible to represent over 48 different actions. Add a third gesture, and that number goes to 288. Their research showed that their average subject had no problem remembering four levels deep!
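A minimal Python sketch of the "changes in direction" idea: it boils a list of pointer positions down to a sequence like Up, Left. The four-direction split, the `min_step` threshold and the function names are my own assumptions for illustration, not anything taken from the research or from the Alias|Wavefront products.

```python
def direction(dx, dy):
    """Classify a movement into one of four directions.
    Screen coordinates are assumed, so y grows downward."""
    if abs(dx) >= abs(dy):
        return "Right" if dx > 0 else "Left"
    return "Down" if dy > 0 else "Up"

def stroke_directions(points, min_step=5):
    """Reduce a list of (x, y) points to the sequence of directions travelled.
    Movements smaller than `min_step` pixels are treated as jitter, and
    consecutive repeats of the same direction collapse into one entry."""
    if not points:
        return []
    dirs = []
    anchor = points[0]
    for p in points[1:]:
        dx, dy = p[0] - anchor[0], p[1] - anchor[1]
        if max(abs(dx), abs(dy)) < min_step:
            continue  # not far enough from the last anchor yet
        d = direction(dx, dy)
        if not dirs or dirs[-1] != d:
            dirs.append(d)
        anchor = p
    return dirs

# Pointer goes up 40 px, then left 40 px.
path = [(100, 100), (100, 80), (100, 60), (80, 60), (60, 60)]
print(stroke_directions(path))
```

Note that the exact distances never matter here, only the order of the turns, which is the whole point of the approach.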
Gesture interfaces are especially useful as a user-interface for blind people, where it's just not possible to choose items from a menu visually.
The cool thing is that gesture-based menus have been part of the Alias|Wavefront products since 1996.
-James
I gesture at my computer constantly. It doesn't increase my productivity or improve my computing, but it does make me feel a whole lot better....
For surfing p()orn sites and playing Tomb Raider, I have found that a life-size inflatable doll makes the best gesture-based "input" device.
--
Sheesh, evil *and* a jerk. -- Jade
As a guy who plays Black & White with a Logitech iFeel mouse, I've gotta say your initial take on mice needs to be revisited. Having the mouse kick back when you do something right, wrong, powerful, whatever, that means a lot, and it helps you get used to doing things the right way.
The only drawback is that it's too tiring for day-to-day use. I usually leave the feedback turned off when surfing the web, for example, because it just beats your wrists to death as you glide over a zillion links. I've got carpal tunnel, and the buzz that it makes when jumping over hyperlinks makes my wrists feel like they've been typing for hours.
It's remarkably cheap, too - it was $45 on the shelf the last time I looked.
What's your damage, Heather?
I am currently using a Dell CPxJ for browsing with Opera. I had the regular drivers installed for the PS2 mouse...they blew for this. I downloaded the Alps drivers. This allowed me to click and drag, right click, everything could be set so I just placed my finger and dragged...all's done
I think the key to anything is find something you are comfortable with, and then just make it work. Don't spend a lot of money on something you aren't going to be happy with. And when you do get it, don't half ass it!
------ This has been provided as a public service! ------
For drawing pictures freehand on a 'puter nothing beats it. Pressure sensitive and integrates with Adobe and Corel. Darker, fatter lines when you press hard, lighter thinner lines when you ease up. You can actually sketch with this thing. Has a similar feel to a soft pencil or the spongy tipped ink pens. Put a piece of soft plastic over the tablet to provide a better feel of resistance to pen strokes. Nothing rough though. Anything rough will actually give you the effect of gravestone rubbings. It transfers the grain of the paper you are using to provide resistance directly to the screen. Yes, it is that sensitive.
If voting were effective, it would be illegal by now.
I've only got experience of a mouse with gesture recognition, so I can't speak for any other device.
What I have seen is how much the 'refresh rate' of the mouse's position (temporal frequency?) affects the usability of gestures.
I've bought Black and White, and it has serious issues on Windows 2000. As in it doesn't run at all. Fantastic.
I've got a triple-boot machine (Slackware/Win98/Win2k), so I'm forced to run B&W in Windows 98 where the update rate of the mouse is pretty appalling.
Getting B&W to recognise some of the more complex gestures is a pain because the time between updates of mouse position gives the gesture considerably more 'jaggy' edges, making it look less like what you actually did with the mouse.
Windows 2000 has the refresh rate pretty high, so I'd have thought it's far easier to use gestures successfully on there.
I've not used the mouse much under Linux; my dedicated Linux box doesn't have a monitor, let alone a mouse, I just use it over ssh or X-Win32, so I don't know if the PS/2 refresh rate has been increased (or is configurable); the last I saw was that it wasn't particularly fast.
Opera's gestures are fairly simple (so far), not nearly as complex as some of B&W's gestures, so the rate isn't as critical. But, add more complex ones and you will see the difference.
It's not a new technology by any stretch of the imagination (emacs strokes mode anyone?) but it's very useful; even something as simple as Opera's 'back' gesture is so convenient, I wonder 'why didn't they put this in earlier!'.
Nice one Mr. Molyneux; he was always the king of games back in the good old days of Atari STs, and now something from his latest game seems to have started a bit of a trend elsewhere in the software business.
- Synchronize a "frame" from the point of view of every camera. You must already know their "absolute" positions, which are relative to some zero-point. (Determined by the original location of the calibrator).
- For each pixel that a given camera sees:
- Assume that you are seeing a pixel at the nearest point that the second camera in your stereo set could also see. To draw a human comparison, bring your finger closer and closer to your eye, until with your other eye it passes the line of your nose and you can't see it anymore. This is the "closest point".
- Calculate where this point would appear in the other camera, as well as the surrounding blocks of pixels, and see whether it matches what the other camera in the stereo pair actually sees.
- If it doesn't match, assume that it must be farther than you initially assumed. Repeat process.
- Repeat until you "converge"...i.e., get images where many pixels in the area "line up" as calculated by the assumption that they are at absolute point x,y,z. This process actually is very similar to what your eye does if you ever notice when it's scanning for how far away something is. At first it assumes it's close, then keeps looking farther and farther away until the two images are brought together. Your brain is the only thing bringing the two images together! Your eyes are still an inch point five apart, silly. :)
In the same way, for each pixel (or rather, group of pixels large enough to identify a small area on an object), our software's "brain" converges the image for various distances until it finds a match.
- If you cannot find a match, assume that the other camera in the pair is not seeing that particular pixel, either because something near you is blocking the nearest area that the other camera is seeing, or because something near the other camera is blocking the line of sight that goes to what you're seeing, or because it's outside the line of sight of another camera entirely. This last is easiest because you don't even need to scan the pixels you know only one camera sees.
- Repeat this process for each stereo pair.
- Assemble every picture you have an absolute coordinate from (that a stereo pair can see) into a three-space.
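The near-to-far search in the steps above can be sketched in a few lines of Python. This is only a toy on a single scanline of grey values, with the cameras assumed to be perfectly aligned side by side; the function names, the block size and the focal length and baseline numbers are all made up for illustration and are not the poster's actual code.

```python
def block_cost(left, right, x, d, half):
    """Sum of absolute differences between the block around x in the left
    scanline and the block around x - d in the right scanline."""
    return sum(abs(left[x + k] - right[x - d + k]) for k in range(-half, half + 1))

def sweep_disparity(left, right, x, max_disp, half=1, tol=0):
    """Assume the point is as near as possible (largest shift between the
    two views), then step farther away until the blocks line up.
    Returns the shift in pixels, or None if no distance gives a match
    (for example because the other camera cannot see that point)."""
    for d in range(max_disp, -1, -1):
        if x - d - half < 0 or x + half >= len(left):
            continue  # block would fall outside the scanline
        if block_cost(left, right, x, d, half) <= tol:
            return d
    return None

def depth_from_disparity(d, focal_px, baseline):
    """Distance for aligned cameras: focal length times spacing over shift."""
    return None if not d else focal_px * baseline / d

# The feature 5, 9, 7 sits two pixels further left in the right-hand view.
left_row  = [0, 0, 0, 0, 5, 9, 7, 0, 0, 0]
right_row = [0, 0, 5, 9, 7, 0, 0, 0, 0, 0]
shift = sweep_disparity(left_row, right_row, x=5, max_disp=4)
print(shift, depth_from_disparity(shift, focal_px=100, baseline=0.5))
```

A real version would do this per block over whole images, use a fuzzier match than an exact zero difference, and handle flat featureless areas, which this toy happily "matches" at the first distance it tries.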
Note that I've left out such things as massaging the image from different cameras in various ways (color, brightness, etc) to get them near, using more or less fuzzy "matches" depending on how much you might expect an object to differ at different angles, and calculating lighting sources based on the calibrator. While these are serious issues, they're really basic math stuff that's well-explored in the field of optical recognition, and it's basically a cut-and-paste of components, and, like I said, a $5,000 server can do thirty frames per second without having any graphics hardware specifically enabled for this stuff. The number of three-space "pixels" it ends up getting varies with conditions, but you can always do well enough to read standard braille that's reasonably close in proximity (1.5 feet) to a stereo pair of cameras. Needless to say, there are more useful applications to these kinds of technology than reading braille on your computer screen.
Developing a gesture recognition system.
I did not mean to outline everything I did above, but it really is not involved, and a lot more viable than some people think. Anyway, the interesting thing about the three-space that you develop from the process above is that it is very easily analyzable. Not only do you have a solid "block" of where pixels are, but it's easy to tell lines that separate, for instance, individual fingers that overlap. In fact, the human brain uses more picture analysis than stereoscopic analysis, and our system is actually more precise than the human brain at finding the exact location of a point two or three feet away relative to a point near it, if you are given no color clues! When looking at a hand, therefore, we can pretty much take the basic shape of a hand and (here is where we get tricky) apply a very fuzzy algorithm for fitting it to the hand that we actually see.
It is "fuzzy" almost to the extent of being neural-netty (although we control it very much), since it not only needs to choose between an infinite number of ways that two hands can contort themselves, but also learn the size of individual aspects of it (which changes slightly), and their shape, and for this purpose also takes into account where the hand "used" to be in the previous frame, how fast it was moving over the previous few frames, and how likely it is to move in a certain way, with respect to speed and with respect to what positions are unnatural. All this is necessary to get 30 frames per second, because we aren't just interested in the "position" of the hand, but its important aspects (the relative bend in each joint). To test, we have another application that is ONLY given the absolute position of hands and the relative joints we are measuring, and then reconstructs the hands visually. You can therefore have all three programs running, the stereoscopic analyzer feeding the hand-position recognizer data, and the hand-position recognizer feeding the renderer data, so that your screen shows how the renderer is getting the info about where your hands are. Mostly, however you move your hands will be reflected on the screen, but if you move them very quickly and unusually you can still confuse the hand-position analyzer and get an image that's out of sync with what your hand actually is doing. This is independent of the stereoscopic analyzer, which comes up with the correct data, which if you feed directly to the renderer you see always matches what your hand is doing, at 30 fps.
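One simple way to "take into account where the hand used to be and how fast it was moving" is to predict the next position from the last one and prefer whichever candidate fit lands closest to that prediction. The Python below is only a sketch of that idea under a constant-velocity assumption; the names `predict` and `pick_candidate` are hypothetical and the real fitting described above is obviously far fuzzier.

```python
def predict(prev, velocity, dt):
    """Constant-velocity guess at where a point will be after dt seconds."""
    return tuple(p + v * dt for p, v in zip(prev, velocity))

def pick_candidate(candidates, prev, velocity, dt=1 / 30):
    """Among several possible positions found in the new frame, choose the
    one nearest to where the previous motion says the point should be."""
    target = predict(prev, velocity, dt)

    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(c, target))

    return min(candidates, key=squared_distance)

# Fingertip was at the origin, moving 30 units per second along x,
# so one frame later (at 30 fps) it should be near x = 1.
options = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
best = pick_candidate(options, prev=(0.0, 0.0, 0.0), velocity=(30.0, 0.0, 0.0))
print(best)
```

This also shows why very fast, unusual movements can throw such a tracker off: the true position ends up far from the prediction and a wrong candidate can win.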
So now I've outlined how we get the position of joints, which includes quite a bit of fuzziness. But by far the most fuzziness is not in this, but in the actual "recognition" of a GESTURE. We've already gotten the first-generation information about what a gesture is by spending several hours each in front of a test server set up for it, already equipped with a popular voice command system, and agreeing to surf the web and do various other tasks the voice command system is equipped for (we didn't make that, it's just purchased off the floor somewhere) while also doing the gesture we have set up for each command. So we end up with "sample" gestures to analyze, and have already manually looked at the major indicators and drawn them up and programmed them. The way we have done it the first time is very crude, however, eyeing as we have each sample ourselves, but we are now in the process of collecting second-generation information, so that when a user successfully uses a gesture and doesn't complain that it wasn't what he wanted, that particular instance of gesturing gets put into the database of gesturing instances associated with a gesture, and we are developing fuzzy logic to link these gestures more closely and reliably. The gestures make sense for the most part, such as having your right thumb open to the left with your other fingers closed, in a quick leftward motion to go back, or up and with a quick rightward motion to be right. Stopping is pushing your palm forward toward the screen, closing a window is putting your finger and thumb together and drawing your hand back, as if you're flicking the window away, and refresh is a sweeping gesture with your palm toward you, from bottom left toward top-right (only a small part of the way). The software recognizes a "gesture" because you perform it particularly fast and deliberately, so if you are playing with your hands slowly, it doesn't misrecognize any of these.
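The "fast and deliberate" test at the end is easy to picture in code. Here is a bare-bones Python sketch that flags a movement as a candidate gesture only when its peak speed crosses a threshold; the 30 fps figure comes from the text above, but the threshold value, the units and the function names are assumptions for illustration.

```python
import math

FPS = 30  # tracked positions per second, as quoted above

def peak_speed(points, fps=FPS):
    """Highest frame-to-frame speed along a track of (x, y, z) positions,
    in position units per second."""
    speeds = [math.dist(a, b) * fps for a, b in zip(points, points[1:])]
    return max(speeds, default=0.0)

def is_deliberate(points, min_speed, fps=FPS):
    """Treat a movement as a candidate gesture only if it was fast enough."""
    return peak_speed(points, fps) >= min_speed

# Positions in metres; 1.0 m/s is a made-up threshold.
fidgeting = [(0.00, 0.0, 0.0), (0.01, 0.0, 0.0), (0.02, 0.0, 0.0)]
flick     = [(0.00, 0.0, 0.0), (0.10, 0.0, 0.0), (0.20, 0.0, 0.0)]
print(is_deliberate(fidgeting, min_speed=1.0), is_deliberate(flick, min_speed=1.0))
```

Only movements that pass this gate would then be handed on to the actual gesture matching.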
Anyway I'm getting really tired of typing all this, and even though there is much, much, more, I'm just kidding. Wouldn't all this be cool though?
~
IBM has a new laptop that is awesome for gesture navigation. It is large and heavy, but it opens up with a notebook on one side, and the laptop/monitor on the other. It has both a normal laptop mouse and a pen mouse. The pen mouse can be used on the screen, or on the pad beside the laptop. It comes with a documentation program that allows you to write/draw into the software itself =) It's _REALLY_ cool... the pen allows you to do gesture type actions just like you were writing them down!
You can customize how you want it to behave (map the screen to the tablet, or use a mouse-like interface), the pressure sensitivity thresholds, macros for the two buttons, angle behavior, and eraser behavior/sensitivity. On win and mac you can easily set these independently for different programs. Another cool feature is that you can buy multiple pens (which I find pretty comfortable, btw) and have independent settings for each one.
I'll be the first to admit it does take a while to get used to using one. But after playing around with it for a while I fell in love with it.
They are a bit costly, but well worth it. Last I heard, Wacom was selling refurbished ones at nice discounts.
--
I find the mouse is excellent for this sort of thing. However, I have a Logitech Mouseman which fits my hand perfectly and I have very high sensitivity set and acceleration turned on. A gesture for me means moving my mouse within an area no bigger than about 1/4" x 1/4". Most people have their mouse sensitivity set way too low.
The only better device would be a 3D glove since you could do 3D motions, which gives a much larger domain for your gestures to be in, probably making it both easier to remember them and less likely you'll mess them up. But don't sneeze or you may delete your root directory.
BTW, Black and White sucks. A whole 5 levels, and WAY too much wood required to do anything. If I wanted to do the same task over and over and over again for hours on end I'd get a job in a factory and get paid for it. And how do you become evil? I taught my creature to eat people, I destroy entire villages, I set people on fire, fling them into mountains, sacrifice 'em all over the place, starve them to death and I'm a GOOD God? They got some good weed down at Lionhead, uh-huh.