Statistics For Data Entry: The Brave New Step
A reader writes: "First there was Dasher, a novel application of statistical theory that lets free text be written using only a pointing device. Dasher works by predicting the continuations of the text being written, based on what has been written so far; there is a probability associated with each offered continuation, and the presentation is designed to make it easier to choose more probable continuations. A big advantage of statistics-based interfaces is that they tend to steer you toward correct input, because correct strings are more probable than incorrect ones. Now the same approach has been extended to writing maths. Apropos is a JavaScript application (it supports IE6 & Firefox) for creating mathematical expressions. It represents the math using MathML, the official XML spec for mathematics. It is definitely clunky when compared to Dasher, but better than MS Equation Editor etc. It is interesting to consider whether this approach can be extended to other XML vocabularies (for example, a model for HTML that suggests the markup as you go along - a properly trained one will make it harder to create pages with blinking text, loads of images etc.), or to formal languages other than XML (e.g. programming languages). Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct."
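To make the "easier to choose more probable continuations" idea concrete, here is a minimal sketch of how a predictive interface might hand out screen space. It is not Dasher's actual code: the function name layout_intervals, the alphabetical ordering, and the example probabilities are all assumptions made for illustration. Each candidate symbol gets a slice of the available range whose size is proportional to its probability, so likely symbols are bigger targets.

```python
def layout_intervals(probs, lo=0.0, hi=1.0):
    """Split the range [lo, hi) into one slice per symbol.

    Each slice's width is proportional to the symbol's probability,
    so more probable symbols are larger (easier) pointing targets.
    Symbols are laid out in alphabetical order (an assumption here).
    """
    total = sum(probs.values())
    intervals = {}
    start = lo
    for symbol in sorted(probs):
        width = (hi - lo) * probs[symbol] / total
        intervals[symbol] = (start, start + width)
        start += width
    return intervals


# Hypothetical probabilities for the next character after some context.
example = layout_intervals({"e": 0.5, "a": 0.25, "q": 0.25})
for symbol, (top, bottom) in example.items():
    print(f"{symbol}: {top:.2f} .. {bottom:.2f}")
```

Nothing is ever forbidden in this scheme. An unlikely symbol just gets a smaller slice, which is why you can still write anything with enough pointing effort.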
Having used both Dasher and T9, it seems to me that T9 only takes into account the keystrokes entered for the current word, and then matches them against a dictionary. Dasher, on the other hand, is based on Markov chains (yes, like those word/text generators), and thus takes into account the last [n] characters. That makes it much more accurate, and, interestingly enough, should make it particularly well-suited to editing programs in most mainstream languages, since they have a lot of noise words and frequently used sequences.
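The difference between the two approaches described above can be sketched in a few lines. This is only an illustration, not how either product is actually implemented: the function names, the tiny word list, and the toy C-like corpus are all made up for the example. The first half maps a word to phone-keypad digits and looks up every dictionary word with the same digits; the second half counts which character follows each 3-character context.

```python
from collections import Counter, defaultdict

# Standard phone keypad letter groups.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def t9_keys(word):
    """Digits you would press to enter `word` on a keypad."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def t9_candidates(keys, dictionary):
    """Dictionary-only lookup: every word matching the key sequence."""
    return [w for w in dictionary if t9_keys(w) == keys]


def train_char_model(text, order=3):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def next_char_probs(model, history, order=3):
    """Return {char: probability} for the character after `history`."""
    counts = model.get(history[-order:], Counter())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


# The keypad lookup cannot tell these words apart without more context.
print(t9_candidates("4663", ["good", "home", "gone", "tomorrow"]))

# The character model uses what came before; source code is very regular.
corpus = ("for (i = 0; i < n; i++) { total += a[i]; } "
          "for (j = 0; j < n; j++) { total += b[j]; }")
model = train_char_model(corpus, order=3)
print(next_char_probs(model, "for"))
```

In the toy corpus, "for" is always followed by a space, so the model puts all of its probability there. That kind of regularity is the "noise words and frequently used sequences" effect.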
Try Corewar @ www.koth.org - rec.games.corewar
A quick check revealed that both Dasher & Apropos are open source. Apropos does not carry any license, but the website says that code is free for anyone to use and modify...
'Why should it? What if I want to create such a page? Why should someone (or something) tell me what to say, or how to say it? And who will "train" such a thing? The Government??'
To make the other (more likely) options more easily available, spend a lot of time poking around for tags with smaller targets *or* type it by hand *or* change the settings to lower the effect of prediction *or* replace the training files *or* just use the damn thing since it'll learn, nobody's telling you to do anything, and The Government (as you call it) wouldn't bother.
Happy now?
There are over 90,000 words in the English language (based on number of entries in the American Heritage Dictionary), but nobody uses all of them. Good predictive data entry is not just a matter of waiting until you've typed "tomor" and concluding that you're going to write "tomorrow" because no other words begin that way, it's a matter of noticing when you get to "tom" that, based on your past word usage, the most likely word for you to use is "tomorrow".
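A minimal sketch of that idea, assuming the only "model" is a count of the words you have typed before (the function name predict_word and the sample text are invented for the example): instead of waiting for the prefix to become unambiguous, pick the most frequently used word that starts with it.

```python
from collections import Counter


def predict_word(prefix, usage):
    """Return the most-used word starting with `prefix`, or None."""
    candidates = {w: n for w, n in usage.items() if w.startswith(prefix)}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical history of one user's past typing.
history = "see you tomorrow . tomorrow works . the tomato is ripe . tomorrow then"
usage = Counter(history.split())

print(predict_word("tom", usage))   # "tomato" also matches, but is rarer
print(predict_word("toma", usage))  # only "tomato" is left
```

A dictionary-only approach would still have two candidates at "tom"; the usage counts are what let the guess happen early.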
You can apply the same concepts to mathematics. When the professor in your example writes "X=..." on the board, you can guess that what's coming next is either another symbol, a literal number, or a mathematical expression. If the professor indicates that it's a mathematical expression, you can then guess that it will probably be the same kind of expression that he uses most often. For example, if you're in a calculus course, you could make a good guess that the expression will be an integral or derivative, and so on.
Apropos does this all with a point and click interface, which is, as mentioned before, much better than MS Equation Editor or writing out the MathML by hand.
Ok, OpenOffice.org proved to be too large for me to really use, so I hopped over to the GIMP instead. I grabbed a copy of their source, and created a text file that appended all of the c files I could find in one directory... About 750k.
I took the "English with lots of punctuation", and copied the .xml file. It turns out that using their little interface for creating a language is a PITA, and just copying an existing file works pretty well. I tweaked it to change the name of the language, and point to the right training document.
It needs a little work, because there's no way to tell the difference between a space and an underscore, but for the most part it works pretty well. As a fairly quick test, I'd call it a great success.
I also did the same thing using PHP. Similar results. I got a chuckle when I was able to visually see the probability of me typing _POST or _REQUEST after any $.
Pretty neat. Slower than typing, but it has some interesting possibilities.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
First, it's TeX, not Tex. Secondly, TeX goes through email, and most people who care to can read it unrendered very easily, so they don't need to install any dopey software just to read two little formulas in my e-mail. Plus, TeX math notation is fast to type, and you only need to learn a page or so from the TeX manual in order to be able to use it for math. So, how is this Dasher thing better?
You're comparing apples and aardvarks here. Dasher is an input method that tries to predict what letter you'll input next based on what you've input so far -- think of it as an improvement on those silly on-screen keyboards. TeX is a typesetting notation -- you could theoretically use Dasher to write TeX, if you wanted to do such a thing on your handheld.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.