Statistics For Data Entry: The Brave New Step
A reader writes: "First there was Dasher, a novel application of statistical theory that lets free text be written using only a pointing device. Dasher works by predicting the continuations of the text being written, based on what has been written so far; there is a probability associated with each offered continuation, and the presentation is designed to make it easier to choose more probable continuations. A big advantage of statistics-based interfaces is that they nudge the writer toward correctness, because correct strings are more probable than incorrect ones. Now the same approach has been extended to writing maths. Apropos is a JavaScript application (it supports IE6 & Firefox) for creating mathematical expressions. It represents the math using MathML, the official XML spec for mathematics. It is definitely clunky when compared to Dasher, but better than MS Equation Editor etc. It is interesting to consider whether this approach can be extended to other XML vocabularies (for example, a model for HTML that suggests the markup as you go along; a properly trained one will make it harder to create pages with blinking text, loads of images etc.), or to formal languages other than XML (e.g. programming languages). Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct."
It seems to be the same concept as T9.
Check populicio.us
That is hardly news. Mobile phones have been offering this kind of interface for years. True, such interfaces are useful, but there is nothing new here.
...That this guy will GPL this software rather than start up a private company.
Then maybe I'd get it in the next version of Fedora.
I'm so sick of *TeX.
*sigh*
May the Maths Be with you!
As other posters have noted, this sounds a lot like T9, which is used in cell phones for predictive text entry. T9 is a great utility, but sometimes what I am writing is less predictable than it expects, or a more commonly used word happens to match the keys I have hit. If I don't pay attention, I get the wrong word.
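For anyone who hasn't used it, here is a rough sketch of how that kind of wrong-word substitution comes about. This is not T9's actual algorithm or dictionary; the keypad layout is the standard one, but the word counts are made up for illustration and the function names are my own.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Hypothetical word frequencies, invented for illustration only.
WORD_COUNTS = {"good": 10, "home": 7, "gone": 3, "hood": 1}


def key_sequence(word):
    """Return the digit string a user would press to type `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(word_counts):
    """Group words by key sequence, most frequent word first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[key_sequence(word)].append((count, word))
    return {
        keys: [w for _, w in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for keys, pairs in index.items()
    }


INDEX = build_index(WORD_COUNTS)


def predict(keys):
    """Return candidate words for a key sequence, best guess first."""
    return INDEX.get(keys, [])
```

All four sample words share the key sequence 4663, so `predict("4663")` has to pick one, and with these counts it offers "good" first. If you meant "home" and didn't look at the screen, you get the wrong word.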
I can't help but think of someone entering a mathematical equation and concentrating more on his idea than what is being written to the screen. Due to this inattention, the equation doesn't work, he figures he's just wrong, and spends hours/days to find the point at which the computer put in its prediction and not what he thought he entered. Worst case, he could abandon what would have been a great idea.
Or imagine this applied to writing computer programs. Say, for example, you are writing a program to calculate the correct distance the probe should hold above the atmosphere so it doesn't burn up. Your cube mate distracts you briefly, and...
Bureaucracy loves company.
From the Dasher site, http://www.inference.phy.cam.ac.uk/djw30/dasher/:
With version 3, as with version 1.6, every language requires a text file full of natural writing (about 300K or more); a specification of the alphabet of the language is also required.
It wouldn't be hard at all to make it work for English, as opposed to Americanese. All you have to do is train it on text written with your own preferred idiosyncrasies.
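To make the "train it on your own text" point concrete, here is a toy character-level model. It is only a sketch of the general idea, not Dasher's actual language model (which is more sophisticated); the two-character context, the function names, and the tiny training strings are all assumptions of mine.

```python
from collections import Counter, defaultdict


def train(text, order=2):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def continuations(counts, written, order=2):
    """Return (character, probability) pairs, most probable first."""
    context = written[-order:]
    seen = counts.get(context)
    if not seen:
        return []  # unseen context; a real system would back off to shorter ones
    total = sum(seen.values())
    return [(ch, n / total) for ch, n in seen.most_common()]


# Tiny made-up training texts; a usable model needs far more data.
british = train("the colour of the harbour, the flavour of the honour")
american = train("the color of the harbor, the flavor of the honor")
```

After typing "colo", the model trained on the first string offers "u" next and the one trained on the second offers "r". Same code, different training text, different idiosyncrasies.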
The reason predictive interfaces work is that most encodings have some degree of redundancy in them. English text is about 50% redundant information, in an information-theoretic sense, and anything based on XML is going to be more so.
To see this for yourself, pick a nice big hunk of English text and gzip it. You'll get about 50-60% compression. Now, pick a similar-sized hunk of XML and gzip it - you'll probably get 75% compression or more.
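If you'd rather not do it by hand, something like the following would run the same experiment. It is a minimal sketch using Python's standard gzip module; the `savings` helper and the generated XML sample are my own inventions, and you should substitute real files to get numbers worth quoting.

```python
import gzip


def savings(data: bytes) -> float:
    """Fraction of the original size removed by gzip (0.0 means no savings)."""
    if not data:
        return 0.0
    return 1.0 - len(gzip.compress(data)) / len(data)


# Made-up XML sample, generated here so the example is self-contained.
xml_sample = b"".join(
    b"<item><name>item%d</name><price>%d</price></item>\n" % (i, i * 3)
    for i in range(500)
)

if __name__ == "__main__":
    print(f"generated XML: {savings(xml_sample):.0%} smaller after gzip")
    # For real data, read files instead, e.g.:
    #   with open("some_book.txt", "rb") as f:   # hypothetical file name
    #       print(f"{savings(f.read()):.0%}")
```

Note that a generated sample like this one is far more repetitive than real-world XML, so it will flatter the comparison. Very short inputs will also compress poorly because of gzip's fixed header overhead.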
Tools like this make using bloated, redundant encodings more tolerable by automating some of the redundancy away. It's not clear to me that this is a good thing.
To a Lisp hacker, XML is S-expressions in drag.
Claude Shannon, the father of information theory, used the idea referenced here in his famous 1950 experiment to calculate the entropy of the English language. See "Shannon Game" at, for example, http://www.math.ucsd.edu/~crypto/java/ENTROPY/ There's also an entire field, often referred to as "Natural Language Processing," which uses empirical observations of large amounts of language data (text or speech) to construct statistical models that do speech recognition, language translation, text summarization, spelling correction (and, yes, people at Microsoft Research have worked on this), etc. Finally, Hemos writes "Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct." FYI, the speech signal is _always_ ambiguous from the perspective of a machine trying to transcribe it to text. I very much doubt there's been any successful speech recognition work in the last 15 years on a non-statistical system.
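For the curious, here is the crudest possible entropy estimate, computed from single-character frequencies alone. To be clear, this is not Shannon's experiment, which had human subjects guess the next letter and so captured context; the function names and the default alphabet size of 27 (26 letters plus space) are assumptions for the sketch.

```python
import math
from collections import Counter


def entropy_bits_per_char(text):
    """Zero-order entropy: -sum(p * log2(p)) over character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(text, alphabet_size=27):
    """1 - H / log2(alphabet_size): the share of each character that is predictable."""
    return 1.0 - entropy_bits_per_char(text) / math.log2(alphabet_size)
```

Because it ignores context entirely, an estimate like this understates the redundancy of real English. Conditioning on the preceding characters, which is what the guessing game does implicitly, brings the entropy per character down considerably.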