Slashdot Mirror


Statistics For Data Entry: The Brave New Step

A reader writes: "First there was Dasher, a novel application of statistical theory that lets free text be written using only a pointing device. Dasher works by predicting the continuations of the text being written, based on what has been written so far; there is a probability associated with each offered continuation, and the presentation is designed to make it easier to choose more probable continuations. A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones. Now the same approach has been extended to writing maths. Apropos is a JavaScript application (it supports IE6 & Firefox) to create mathematical expressions. It represents the math using MathML, the official XML spec for mathematics. It is definitely clunky when compared to Dasher, but better than MS Equation Editor etc. It is interesting to consider whether this approach can be extended to other XML vocabularies (for example, a model for HTML that suggests the markup as you go along - a properly trained one will make it harder to create pages with blinking text, loads of images etc.), or to formal languages other than XML (e.g. programming languages). Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct."

121 comments

  1. Like t9 by xabi · · Score: 4, Interesting

    It seems to be the same concept as t9.

    --
    Check populicio.us
    1. Re:Like t9 by xabi · · Score: 3, Insightful

      More info in the same dasher web site here

      --
      Check populicio.us
    2. Re:Like t9 by Anonymous Coward · · Score: 0

      it seems as though it's just a Markov chaining system with a topic-specific dictionary

    3. Re:Like t9 by KjetilK · · Score: 1, Offtopic
      Not really. I use T9 daily to write SMSes, and Dasher now and then for the coolness of it. Dasher is in Debian and I would guess in many other distros. Just try it out to feel the difference.

      Dasher is something I would really like to have on a PDA and even a cellphone. T9 is just a simple aid for writing a couple of hundred characters at most, but nothing that would help me write longer texts.

      PDA-makers, hear this: You need to put a lot more effort into text-entry interfaces. Have a serious look at Dasher!

      --
      Employee of Inrupt, Project Release Manager and Community Manager for Solid
    4. Re:Like t9 by a_hofmann · · Score: 3, Interesting

      While the concept is the same, the application goes way further than t9. This is where I see such ideas bound to fail.

      t9 is a great technology because the vocabulary used in writing SMS is pretty narrow. After entering the first few characters of a word, the contextual information in the dictionary is good enough (most of the time) to suggest the wanted word very fast. t9 is even able to derive this information without the user specifying the exact characters, just one of the 3-4 on each mobile key.

      As said this is possible because of the small dictionary of probable words and the good contextual information for characters and their position in words.
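
A minimal sketch of the keypad-to-dictionary lookup described above. It assumes the standard phone letter layout; the word list, its usage counts, and the function names are made up for illustration, and this is not T9's actual implementation.

```python
from collections import defaultdict

# Standard phone keypad letter groups (digit -> letters).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Return the key sequence that spells `word` with one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_counts):
    """Map each key sequence to its candidate words, most frequent first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[to_digits(word)].append((count, word))
    return {
        keys: [w for _, w in sorted(cands, key=lambda c: (-c[0], c[1]))]
        for keys, cands in index.items()
    }


# Hypothetical usage counts, for illustration only.
WORD_COUNTS = {"home": 50, "good": 40, "gone": 20, "hood": 5, "tomorrow": 30}
INDEX = build_index(WORD_COUNTS)


def suggest(keys):
    """Return the candidate words for a key sequence, or an empty list."""
    return INDEX.get(keys, [])
```

Several words share the key sequence 4663 here, which is why the frequency ordering matters: the user only has to cycle through candidates when the most common one is wrong.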

      Extending this tech (or even better methods) to larger domains makes the problem much harder very fast; my feeling is that the increase is exponential in the mathematical sense.

      There may be a small, well-defined set of possible mathematical formulae, if you divide things into small enough chunks. Saying the same about XML documents or even natural-language text (beyond the character/word level) is imho foolish.

      I would like to stress that such technologies are very important and promising for lowering the input barriers for disabled people, as they already do for anyone on very limited devices (like mobile keypads). On the other hand, I don't expect such things to change the way most people already put information into their computers.

    5. Re:Like t9 by xabi · · Score: 1

      Yes, I also use T9 every day. What I'm trying to say is about the way they work. Both methods are based on statistics: you begin typing something and they try to finish the word using statistics.

      --
      Check populicio.us
    6. Re:Like t9 by fbjon · · Score: 1

      Japanese phones use prediction at the grammar level (at least). You begin typing a word and it then suggests a likely candidate. After this, you have a small menu of likely "next words" or grammar particles, depending on phrases, what you have written in mails/messages before and what kind of word you just wrote. I've written entire mails just by typing one character, and selecting the rest from menu after menu.
      Of course, that's still not a very long mail, but I don't see why it should be difficult to expand it. A general solution might be difficult, but why does it have to be so broad? Prediction for a specific language is what is desired anyway.

      Btw, has anyone tried coupling text prediction with speech recognition?

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
  2. Old technology by Inigo+Soto · · Score: 4, Interesting

    That is hardly news. Mobile phones have been offering this kind of interface for years. True, they are useful, but nothing new here.

    1. Re:Old technology by Auxon · · Score: 1

      What mobile phone offered this kind of input?? If you are just talking about predictive completion, you only have half the story. RTFA - there's a lot more to it.

    2. Re:Old technology by shimmin · · Score: 1

      Heck, I remember a word processor with predictive completion in shareware catalogs ca. 1985 or so. Don't remember its name, but, *sigh* another user-interface gem delayed for years by monopolist hegemonies.

    3. Re:Old technology by danheskett · · Score: 1

      Most often in those situations the product fails for other reasons, not because of hegemonic monopolies.

      For example, a great word processor existed a while back for an extant platform, and everyone bitched about it being shut down.

      It was great because of its speed and cleanness. Yet despite that, there was no support for footnotes and/or endnotes. That instantly ruled out the most likely target customer base: the legal/business world.

      When that product went tits to the sky, whose fault was it supposedly? Microsoft.

      Bah.

    4. Re:Old technology by mwood · · Score: 1

      Hmmm, I keep running into gadgets that want to do predictive completion for me. I always get annoyed and turn it off.

    5. Re:Old technology by stephanruby · · Score: 1
      "That is hardly news. Mobile phone interfaces have been offering this kind of interfaces for years. True, they are useful, but nothing new here"

      Wrong. The Mobile phone interface is nothing like Dasher. It's not as fluid and as usable as Dasher. Dasher is really something that you should download and actually try before you comment on it.

      And if you're not up to downloading it, at the very least you should look at its demos (available in either animated gifs or mpeg/avi/asf movies).

  3. It sounds good .. in theory by Anonymous Coward · · Score: 5, Funny

    "You appear to be writing a letter, and here's what you're probably going to say..."

    1. Re:It sounds good .. in theory by Anonymous Coward · · Score: 0

      "You seem to be trying to use tab completions. That means you're used to another OS and that this copy of windows is probably stolen. Hired goons will be at your door in 5...4...3...2...1..."

  4. OFF:The first thing that came to by A+beautiful+mind · · Score: 1

    ...my mind about apropos is the *nix program
    "NAME
    apropos - search the manual page names and descriptions"

    --
    It takes a man to suffer ignorance and smile
    Be yourself no matter what they say
    1. Re:OFF:The first thing that came to by opello · · Score: 1

      yes, now they must start the journey to a better name ... like firefox, heh

  5. Re:correctness? by Anonymous Coward · · Score: 2, Insightful

    Or the poster uses the British form of English where, I believe, this is correct usage. Not everyone is a 'Merican, you know.

  6. I Can Only Hope...... by ObsessiveMathsFreak · · Score: 2, Interesting

    ...That this guy will GPL this software rather than start up a private company.
    Then maybe I'd get it in the next version of fedora.

    I'm so sick of *Tex.

    *sigh*

    --
    May the Maths Be with you!
    1. Re:I Can Only Hope...... by Anonymous Coward · · Score: 1, Informative

      A quick check revealed that both Dasher & Apropos are open source. Apropos does not carry any license, but the website says that code is free for anyone to use and modify...

    2. Re:I Can Only Hope...... by RWerp · · Score: 1

      Dasher is an input method, not a typesetting engine.

      --
      "Long run is a misleading guide to current affairs. In the long run we are all dead." (John Maynard Keynes)
    3. Re:I Can Only Hope...... by khrtt · · Score: 1

      I'm so sick of *Tex

      First, it's TeX, not Tex. Secondly, TeX goes through email, and most people who care can read it unrendered very easily, so they don't need to install any dopey software just to read two little formulas in my e-mail. Plus, TeX math notation is fast to type, and you only need to learn a page or so from the TeX manual in order to be able to use it for math. So, how is this Dasher thing better?

    4. Re:I Can Only Hope...... by Carnildo · · Score: 2, Informative

      First, it's TeX, not Tex. Secondly, TeX goes through email, and most people who care can read it unrendered very easily, so they don't need to install any dopey software just to read two little formulas in my e-mail. Plus, TeX math notation is fast to type, and you only need to learn a page or so from the TeX manual in order to be able to use it for math. So, how is this Dasher thing better?

      You're comparing apples and aardvarks here. Dasher is an input method that tries to predict what letter you'll input next based on what you've input so far -- think of it as an improvement on those silly on-screen keyboards. TeX is a typesetting notation -- you could theoretically use Dasher to write TeX, if you wanted to do such a thing on your handheld.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    5. Re:I Can Only Hope...... by Superfluid+Blob · · Score: 1

      If you're sick of TeX, give lout a look.

  7. I'm not hopeful by Anonymous Coward · · Score: 2, Insightful

    Dasher works because there is a small number of words that are likely to follow on from where you are. The same does not apply to MathML or HTML. The most useful you are likely to get is tab-completion for tag names, attribute names, etc.

    1. Re:I'm not hopeful by weierstrass · · Score: 2, Insightful

      What we write is only predictable to the extent that it is redundant: ie when i type "tomor" into my mobile phone, if it's obvious to the phone i'm going to write "tomorrow", i could just send a msg saying "C U tomor".

      It doesn't seem to me that there's anything like as much redundancy in mathematical formulae as there is in written language. When the professor writes "X=..." on the board, it's very hard to predict the next symbol unless you know what x is in fact equal to.

      --
      my password really is 'stinkypants'
    2. Re:I'm not hopeful by KevinKnSC · · Score: 2, Informative
      You're incorrect when you say that what we write is only as predictable as it is redundant.

      There are over 90,000 words in the English language (based on number of entries in the American Heritage Dictionary), but nobody uses all of them. Good predictive data entry is not just a matter of waiting until you've typed "tomor" and concluding that you're going to write "tomorrow" because no other words begin that way, it's a matter of noticing when you get to "tom" that, based on your past word usage, the most likely word for you to use is "tomorrow".
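
A rough sketch of the "tom" to "tomorrow" guess described above: candidates are ranked by how often the user has written them before. The history string and the function names are hypothetical, and real predictive input would use far more context than a bare word count.

```python
from collections import Counter


def train(history):
    """Count how often each word appears in the user's past text."""
    return Counter(history.lower().split())


def complete(prefix, counts, k=3):
    """Return up to k known words starting with `prefix`, most used first."""
    matches = [(n, w) for w, n in counts.items() if w.startswith(prefix)]
    matches.sort(key=lambda m: (-m[0], m[1]))  # by count, then alphabetically
    return [w for _, w in matches[:k]]


# Hypothetical past messages, for illustration only.
HISTORY = "see you tomorrow tom said tomorrow is fine tomato soup tomorrow"
COUNTS = train(HISTORY)
```

With this history, `complete("tom", COUNTS)` ranks "tomorrow" first even though "tom" and "tomato" also match, because it is the word this user has written most often.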

      You can apply the same concepts to mathematics. When the professor in your example writes "X=..." on the board, you can guess that what's coming next is either another symbol, a literal number, or a mathematical expression. If the professor indicates that it's a mathematical expression, you can then guess that it will probably be the same kind of expression that he uses most often. For example, if you're in a calculus course, you could make a good guess that the expression will be an integral or derivative, and so on.

      Apropos does this all with a point and click interface, which is, as mentioned before, much better than MS Equation Editor or writing out the MathML by hand.

    3. Re:I'm not hopeful by Anonymous Coward · · Score: 0

      You use an American dictionary as reference for the number of words in the English language when they can't even spell words right in the first place?

      The OED is the official dictionary you should be using.

    4. Re:I'm not hopeful by KevinKnSC · · Score: 1
      I was just pulling a number from the dictionary I had on my desk, not trying to make a statement about said dictionary's authority as the ultimate worldwide reference for the language.

      The OED has "over half a million words", which supports my previous argument even better. Nobody comes close to using all of those words, so predictive text input can make intelligent guesses based on which words you do use and how often you use them.

  8. A lot better than what I recall. by bagel2ooo · · Score: 1

    I'm hopeful that this will eventually make it into word processors like in the OpenOffice or Microsoft Office suites. Seems like the best standard fare we have is a little paperclip/dog/wizard/other nuisance asking how he can "help" make a cover letter.

    --
    ( o ) one could say I'm rather baked
    1. Re:A lot better than what I recall. by Tablizer · · Score: 1

      I'm hopeful that this will eventually make it into word processors like in the OpenOffice or Microsoft Office suites. Seems like the best standard fare we have is a little paperclip/dog/wizard/other nuisance asking how he can "help" make a cover letter.

      That is a horrible idea. Burn it! Hope it never happens. Sounds like something a college dropout geek who often gets pie'd in the face would do.

  9. Riiiiiiiiight by $RANDOMLUSER · · Score: 4, Funny
    "Twas brillig, and the slithy toves
    Did gyre and gimble in the wabe:
    All mimsy were the borogoves,
    And the mome raths outgrabe."

    It knew he was going to say that.

    More likely, it's going to predict that someone's going to say "Let's circle back and touch base tomorrow".

    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    1. Re:Riiiiiiiiight by seanellis · · Score: 1

      Aagh. You beat me to it.

    2. Re:Riiiiiiiiight by blue+trane · · Score: 1

      It won't create brand new things, but if you train it on that text it might help you create things in the same genre...

  10. Why this isn't the same as T9. by pkhuong · · Score: 4, Informative

    Having used both Dasher and T9, it seems to me that T9 only takes into account the keystrokes entered for each word. It then correlates them to a dictionary. Dasher, on the other hand, is based on Markov chains (yes, like those word/text generators), and thus takes into account the last [n] characters. That makes it much more accurate, and, interestingly enough, should make it particularly well-suited to editing programs in most mainstream languages, since they have a lot of noise words and frequently used sequences.
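
To illustrate the "last [n] characters" idea, here is a small sketch of an order-n character Markov model. The function names and the default n=3 are assumptions made for this example; it is not Dasher's actual language model, only the general technique.

```python
from collections import Counter, defaultdict


def train(text, n=3):
    """Count which character follows each n-character context in `text`."""
    model = defaultdict(Counter)
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model


def predict(model, context, n=3):
    """Return (char, probability) pairs for the next character, likeliest first.

    Only the last n characters of `context` are used. An unseen context
    yields an empty list.
    """
    counts = model.get(context[-n:], Counter())
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, c / total) for ch, c in ranked]
```

Trained on source code, a model like this would assign high probability to frequent sequences such as keywords and common punctuation patterns, which is the property the comment above is pointing at.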

    --
    Try Corewar @ www.koth.org - rec.games.corewar
  11. mis-read by Mirri · · Score: 1

    I read, "a novel application of statistical theory" as "a novel of statistical theory" - and I was still interested!

    1. Re:mis-read by Anonymous Coward · · Score: 0

      Yep, me too. Neal??

  12. Quick test by potifar · · Score: 4, Insightful
    MathML was never really intended for writing by hand, and even if Apropos makes it easier, I can't see myself switching from (La-)TeX anytime soon. I can enter extremely complex mathematical expressions at least 20-30 times faster by typing them in TeX than I ever could do clicking around an interface like Apropos.

    MathML is a good idea in theory, but until there are good tools for writing and editing MathML, there will be very few people using it (either for publishing or for archival purposes.)

    1. Re:Quick test by Anonymous Coward · · Score: 0
      Whatever new methods come for editing mathematical text, MathML will be an excellent interchange or even storage format.

      Hell, I can't make sense of TeX but I can have a good shot at interpreting MathML, so even WP applications will have a chance.

      /JE

    2. Re:Quick test by potifar · · Score: 1, Insightful
      Do you really think
      <apply>
      <int/>
      <bvar>
      <ci> x </ci>
      </bvar>
      <interval>
      <ci> a </ci>
      <ci> b </ci>
      </interval>
      <apply>
      <cos/>
      <ci> x </ci>
      </apply>
      </apply>
      is easier to read than
      $\int_a^b \cos x dx$
      ?
    3. Re:Quick test by Anonymous Coward · · Score: 1, Funny
      MathML is a good idea in theory, but until there are good tools for writing and editing MathML, there will be very few people using it (either for publishing or for archival purposes.)


      Wow, this has got to be the first time in the history of the world that a math person has criticized something for being "good in theory" but not "in practice". It's math! There's nothing in it but theory!
    4. Re:Quick test by Morphine007 · · Score: 1

      try this out.

    5. Re:Quick test by Rotund+Prickpull · · Score: 1, Funny

      Don't know about easier, but it looks more like XML so it must be better.

    6. Re:Quick test by RWerp · · Score: 1

      Or try xemacs with the auctex and reftex packages. I would never type a math equation that fast in a "point and click" environment.

      --
      "Long run is a misleading guide to current affairs. In the long run we are all dead." (John Maynard Keynes)
    7. Re:Quick test by Dan+D. · · Score: 1

      Proof by contradiction, obviously! The premise that he is a "math person" must necessarily be wrong. Mathematicians are *never* inconsistent... just incomplete occasionally.

      --
      People who quote themselves bug the crap out of me -- Me.
    8. Re:Quick test by Anonymous Coward · · Score: 0

      There is no difference between theory and practice. That is just a myth that engineers propagate so that they don't have to justify their conclusions. Mathematicians propagate it so that they won't be expected to actually do anything useful.

  13. Failures of inattention by hussar · · Score: 4, Interesting

    As other posters have noted, this sounds a lot like T9, which is used in cell phones for predictive text entry. T9 is a great utility, but it has happened that what I am writing is less predictable, or there is a more often used combination of letters that results from the keys I have hit. If I don't pay attention, I get the wrong word.

    I can't help but think of someone entering a mathematical equation and concentrating more on his idea than what is being written to the screen. Due to this inattention, the equation doesn't work, he figures he's just wrong, and spends hours/days to find the point at which the computer put in its prediction and not what he thought he entered. Worst case, he could abandon what would have been a great idea.

    Or, imagine this applied to writing computer programs. Say for example, you are writing a program to calculate the correct distance the probe should hold above the atmosphere so it doesn't burn up. Your cube mate distracts you briefly, and...

    --

    Bureaucracy loves company.
  14. um... by nFriedly · · Score: 1

    orrect strings are more probable than incorrect ones

    apparently they havnt taken my writing into account

    1. Re:um... by nFriedly · · Score: 1

      for example, forgetting the /i closing tag and not previewing it

  15. Like auto-completion, only better? by objekt · · Score: 1

    I hope so, then I might actually use it! :D

    --
    -- Boycott Shell
  16. GIGO would be proud by 10am-bedtime · · Score: 2, Insightful

    data integrity starts w/ data entry. when data entry is reduced to "no" vs "yes-for-now-we-can-fix-it-later", the game is lost; GIGO prevails, then.

  17. Correctness, huh? by The_REAL_DZA · · Score: 2, Funny
    A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones.


    They obviously didn't include many PHBs' writings in their calculations...
    I'm frequently amazed at some of the grammatical... umm... experimentations undertaken by the upper two or three levels of management in their memos -- and the speeling, good grief, the SPEELING!! Is [F7] the last great secret of our civilization?!?!
    --


    This space intentionally left (almost) blank.
    1. Re:Correctness, huh? by Hognoxious · · Score: 1
      They obviously didn't include many PHBs' writings in their calculations...
      ... or take a look at slashdot. It's inhabitants are some of the biggest loosers on teh intarweb when it comes to spelling and grammer.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  18. Never been.... by mr_z_beeblebrox · · Score: 1, Funny

    A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones.

    Though probably college educated the writer of the above sentence has probably NOT BEEN a TA in an English class. Truly correct strings are a rare find :-)

  19. Dasher and stats rock by palad1 · · Score: 4, Funny

    I did a quick test run of Dasher instead of RTFA, and as far as I understand, it works by presenting the most statistically-probable letter in the middle of the input area.

    So, by dragging a perfectly horizontal line with my mouse cursor, I was able to create the most statistically-probable sentence.

    Here goes, for Science:

    Kennedy insider&xeathGhed a noviceable. Punt.uetGrance beganic or Central believe t, space ship,' Alice, it is deleasantB.Carzone.That's luJbi

    Conspiracy theorists, area51 nuts and cypherpunks are going to be thrilled!

    1. Re:Dasher and stats rock by Anonymous Coward · · Score: 0

      No... It doesn't work by creating a perfectly horizontal line... it increases the screen real-estate devoted to the most commonly used strings. Since the commonly used strings have larger blocks, the statistical chance of the cursor being on them is significantly greater, but not perfect.
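
A minimal sketch of the "more screen real-estate for more probable strings" idea. The function names, the alphabetical stacking order, and the example probabilities are assumptions for illustration; Dasher's real layout is continuous and zooming, which this does not attempt to reproduce.

```python
def allocate(probabilities, height=1.0):
    """Split a vertical span into boxes sized in proportion to probability.

    `probabilities` maps each symbol to its predicted probability.
    Returns {symbol: (top, bottom)} with boxes stacked in sorted symbol order.
    """
    total = sum(probabilities.values())
    boxes, top = {}, 0.0
    for symbol in sorted(probabilities):
        size = height * probabilities[symbol] / total
        boxes[symbol] = (top, top + size)
        top += size
    return boxes


def pick(boxes, y):
    """Return the symbol whose box contains vertical position y, if any."""
    for symbol, (top, bottom) in boxes.items():
        if top <= y < bottom:
            return symbol
    return None
```

A pointer dragged along one fixed height still lands in some box at every step, but nothing guarantees it is the largest one, so the result is not the most probable sentence.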

  20. what a productivity increase by myukew · · Score: 1

    before long you'll have to write only half of your program. the other half is predicted by some neat tool.

    Or imagine the possibilities for bookwriters. You write half and the rest is predicted based on your previous works. Seems as if some authors already use such a technique ;)

    1. Re:what a productivity increase by magefile · · Score: 1

      Law and Order is written this way ... well, the dialogue is. The plot is like Clue: "pick one card from the 'first suspect' category, one from the 'evidential contamination' category ..."

    2. Re:what a productivity increase by heby · · Score: 1

      or, if you're John Grisham, you write the second half of the title (the first half invariably being "the") and the neat tool writes a book and a movie script...

  21. Ahem by kahei · · Score: 3, Insightful


    A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones.

    In a rigorous, technical environment, being _usually_ correct is not enough and a statistics-based approach to ensuring correctness is not very useful.

    In an informal environment, correctness is not nearly as common as you might hope, so again a statistics-based approach may well not be as good as actually enforcing definite correctness.

    --
    Whence? Hence. Whither? Thither.
  22. Re:correctness? by kahei · · Score: 0, Redundant


    APOSTROPHE! APOSTROPHE!

    The Orthographic Commandos have been notified, and a kill squad is now on its way to your location. For your own comfort and convenience, please choose not to resist. Have a nice day!

    --
    Whence? Hence. Whither? Thither.
  23. Dasher for Zaurus? by egghat · · Score: 1

    Has anybody tried to compile (and succeeded) Dasher for my beloved Zaurus?

    Bye egghat.

    --
    -- "As a human being I claim the right to be widely inconsistent", John Peel
    1. Re:Dasher for Zaurus? by tornado2258 · · Score: 1
      I have just been playing with it on my laptop, and given the CPU usage here, it looks like it would have to be configured to do considerably fewer calculations to run on a Zaurus (I don't know if this is even possible), and that looks like it would make it little more useful than the built-in word completion.

      Note: I am only guessing for everything beyond the CPU usage on this machine, but then isn't that the way of /.

  24. Funny thing is... by hussar · · Score: 1

    ...we find the unpredictable more interesting.

    And, there are no predictable new ideas. Who could've guessed that Einstein would follow the equals sign with "mc^2"?

    --

    Bureaucracy loves company.
  25. Why? by Quixote · · Score: 1, Insightful
    a properly trained one will make it harder to create pages with blinking text, loads of images etc.

    Why should it? What if I want to create such a page? Why should someone (or something) tell me what to say, or how to say it? And who will "train" such a thing? The Government??

    1. Re:Why? by r3m0t · · Score: 2, Informative

      'Why should it? What if I want to create such a page? Why should someone (or something) tell me what to say, or how to say it? And who will "train" such a thing? The Government??'

      To make the other (more likely) options more easily available. Spend a lot of time poking around for tags with smaller targets *or* type it by hand *or* change the settings to lower the effect of prediction *or* replace the training files *or* just use the damn thing, since it'll learn. Nobody's telling you to do anything, and The Government (as you call it) wouldn't bother.

      Happy now?

    2. Re:Why? by Anonymous Coward · · Score: 0

      Why should it? What if I want to create such a page?

      Harder than the default case does not mean impossible.

      Why should someone (or something) tell me what to say, or how to say it?

      Why should anyone ever prompt you on what you might say? Do you never choose from a menu? What are you, some kind of idiot?

      And who will "train" such a thing? The Government??

      Get off your high horse already. Training in this context just means feeding in a representative sample of your output so far, to detect common patterns. Like customizing speech recognition. In your case, that training would probably be enough to generate endless paranoid rants about the "freedom" to rant.

    3. Re:Why? by r3m0t · · Score: 1

      Fuck. I wrote my reply, and the grandparent still got another Insightful mod. You're all crazy! Crazy, I say!

    4. Re:Why? by stephanruby · · Score: 1
      Fuck. I wrote my reply, and the grandparent still got another Insightful mod.

      Obviously, you're a karma whore who's trying to work both sides of the issue.

    5. Re:Why? by r3m0t · · Score: 1

      What? Karma whoring, me? I thought the post you refer to (grandparent now, heh) wouldn't get modded at all. How am I trying to work both sides? Not like I made the first post in this thread.

    6. Re:Why? by Anonymous Coward · · Score: 0

      I was only kidding. Come on relax. I know what you meant.

  26. Re:correctness? by r_j_howell · · Score: 3, Interesting

    from the dasher site http://www.inference.phy.cam.ac.uk/djw30/dasher/ :
    With version 3, as with version 1.6, every language requires a text file full of natural writing (about 300K or more); a specification of the alphabet of the language is also required.
    It wouldn't be hard at all to make it work for English, as opposed to Americanese; all you have to do is train it on text written with your own preferred idiosyncrasies.

  27. Screenwriters by AndyChrist · · Score: 1

    Have been using this approach for decades.

  28. Further dumbing of humanity by G4from128k · · Score: 1

    Overuse of this technology will result in repetitive and boring prose. Yes, well-written prose does have some redundancy/predictability -- it helps the reader stay on track, reinforces key points, reminds the reader, etc. This technology will help some writers create more consistent text. Yet I fear that too many will rely too much on this crutch.

    The problem is that the best prose contains unexpected novelty such as plot twists, new facets of a character, joke punch lines, etc. In a true "page-turner" the reader can't predict what will happen next. This novelty (appropriate for a good "novel") is the opposite of what this technology offers.

    --
    Two wrongs don't make a right, but three lefts do.
    1. Re:Further dumbing of humanity by r3m0t · · Score: 1

      It only works on the word and sentence level. It will never be used to write a novel.

    2. Re:Further dumbing of humanity by myukew · · Score: 1

      they save this feature for the next version ;)

    3. Re:Further dumbing of humanity by blue+trane · · Score: 1

      So put in a randomizer, used at random points...

  29. This approach favors bloated, redundant encodings by alispguru · · Score: 3, Interesting

    The reason predictive interfaces work is that most encodings have some degree of redundancy in them. English text is about 50% redundant information, in an information-theoretic sense, and anything based on XML is going to be more so.

    To see this for yourself, pick a nice big hunk of English text and gzip it. You'll get about 50-60% compression. Now, pick a similar-sized hunk of XML and gzip it - you'll probably get 75% compression or more.
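
The gzip experiment above can be sketched with Python's standard `gzip` module. The helper name `savings` is made up for this example, and no particular figures are claimed here; the numbers depend entirely on the files you feed it.

```python
import gzip


def savings(data: bytes) -> float:
    """Fraction of the original size removed by gzip.

    0.5 means the compressed form is half the original size. The value can
    be slightly negative for data gzip cannot compress.
    """
    if not data:
        raise ValueError("cannot measure savings on empty input")
    return 1.0 - len(gzip.compress(data)) / len(data)
```

To repeat the comparison, read two similar-sized files as bytes, for example `savings(open(path, "rb").read())` for a plain English text and for an XML document, and compare the two fractions.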

    Tools like this make using bloated, redundant encodings more tolerable by automating some of the redundancy away. It's not clear to me that this is a good thing.

    --

    To a Lisp hacker, XML is S-expressions in drag.
  30. Why not TeX? by Da+Penguin · · Score: 1
    Sure it looks interesting, but I really do not see the point of switching from LaTeX: the de facto standard for any math write-up. MathML: written for computers. TeX: written for humans to write.

    That said, I have been feeling that TeX is a bit outdated as a system, but then I discovered TeXmacs. This is a fully wysiwyg editor for TeX, where you type in TeX code and see the formatting instead of the code. I have switched to using it, and would definitely recommend it to others

  31. Training by Dracolytch · · Score: 1

    Looks like you can train this thing by giving it large amounts of text in the language of your choice.

    I'm going to pop over to OpenOffice.org, and use their source to create a training document.

    Stay tuned for details.

    ~D

    --
    This sig has been enciphered with a one-time pad. It could say almost anything.
    1. Re:Training by Dracolytch · · Score: 2, Informative

      Ok, OpenOffice.org proved to be too large for me to really use, so I hopped over to the GIMP instead. I grabbed a copy of their source, and created a text file that appended all of the c files I could find in one directory... About 750k.

      I took the "English with lots of punctuation", and copied the .xml file. It turns out that using their little interface for creating a language is a PITA, and just copying an existing file works pretty well. I tweaked it to change the name of the language, and point to the right training document.

      It needs a little work, because there's no way to tell the difference between a space and an underscore, but for the most part it works pretty well. As a fairly quick test, I'd call it a great success.

      I also did the same thing using PHP. Similar results. I got a chuckle when I was able to visually see the probability of me typing _POST or _REQUEST after any $.
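
      The effect with _POST and _REQUEST can be reproduced by counting what follows a given context in the training text. The sketch below is a plain frequency count, assumed here for illustration only; it is not Dasher's language model, and the function name and the PHP snippet are invented.

```python
from collections import Counter


def continuation_probs(corpus: str, context: str, length: int = 1) -> dict[str, float]:
    """Relative frequency of each `length`-character string that follows `context`."""
    counts = Counter()
    start = corpus.find(context)
    while start != -1:
        end = start + len(context)
        following = corpus[end:end + length]
        if len(following) == length:
            counts[following] += 1
        start = corpus.find(context, start + 1)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()} if total else {}


# Hypothetical training text standing in for a large PHP corpus.
php = "$_POST['a']; $_POST['b']; $_REQUEST['c']; $name = $_GET['d'];"
print(continuation_probs(php, "$_"))
```

      In this tiny corpus "P" follows "$_" half the time, so a predictive interface trained on it would give that continuation the largest share of the screen.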

      Pretty neat. Slower than typing, but it has some interesting possibilities.

      ~D

      --
      This sig has been enciphered with a one-time pad. It could say almost anything.
    2. Re:Training by Anonymous Coward · · Score: 0

      You can tell the difference between space and underscore in two ways, in DASHER:

      (1) space and newline are the only two characters that appear inside WHITE boxes in the default colour scheme of Dasher.

      (2) you are free to modify the rendering of "space" to any unicode character. For example, you might like to try (that's what I use) - which is a black hollow rectangle.

  32. exactly by jbellis · · Score: 1

    "correct strings are more probable than incorrect" doesn't "enforce" correctness at all.

  33. Yeah! by Zebra_X · · Score: 1

    What's the probability that all of the texts written this way will be similar?

  34. Dasher vs T9, and NLP=SNLP by j.leidner · · Score: 1
    MacKay's Dasher is very useful since it's a simple tactile input device. Unlike T9, which speeds up entry using a conservative keypad, text entry with Dasher is based on up/down movements, which some handicapped people are capable of that could not operate an ordinary keypad.

    The statistical properties of languages are utilized in most (successful) approaches for natural language processing, from part-of-speech tagging, information extraction, syntactic parsing, machine translation to question answering; you could almost say that NLP=S(tatistical)NLP nowadays.

    --
    Try Nuggets, our mobile search engine. We answer your questions via SMS, across the UK.

  35. Really! by Ziviyr · · Score: 1

    This is a test of dasher.
    I find it a bitch to get proper punctuation, nevermind capitalization, and the routine stuttery freezes are amazingly annoying. I suppose if I were incapacitated to the point that I could only type by looking around I would appreciate it alot more though.
    So I'll just call it a really cool toy that is in fact worth trying out and hope some games incorporate some of this technology at some point in the future.

    --

    Someone set us up the bomb, so shine we are!
    1. Re:Really! by Ziviyr · · Score: 1

      This is a wicked cool feature. It is a reminder of how cool Linux can be. I can "type directly into the browser window using dasher!

      --

      Someone set us up the bomb, so shine we are!
    2. Re:Really! by r3m0t · · Score: 1

      Try it for ten minutes, properly. I don't get the freezes, but I have a fast computer. "Never mind" is two words and would have been easier to type. "alot" similarly. "Dasher" with a capital D is easier, too.

      What are you talking about with the capitalisation? After a full stop (question mark, etc) and a space, the yellow (capitals) box is massive.

      Why games?

    3. Re:Really! by Ziviyr · · Score: 1

      Define improperly for me.

      I've been pointing at letters for well over ten minutes now. I've figured out the capitals box now, nearly got the punctuation sorted out.

      Why games? Because I find the lack of straightforwardness and its adapting to be the kind of feature I'd like to see in a game.

      My box is a 1GHz Athlon, which I never figured as really slow. I'd be noticing those stuttery freezes even if they were three times shorter though, easily. Perhaps my Gentoo compile is to blame.

      Regarding my spelling, while in dasher, it took a bit of focus to get out what I did, and backing up for corrections has generally been a surefire way to get the stutters.

      --

      Someone set us up the bomb, so shine we are!
    4. Re:Really! by Anonymous Coward · · Score: 0

      There's a bug under unix that makes the redraw loop have a lower priority than the recalculate loop. This only matters if it takes more than 50ms to recalculate. OOPS! Anyways, dasher is learning and adaptive so give it time and it'll learn what you use. Punctuation is hard because while you've already memorized the order of the alphabet, you've not yet memorized the order dasher presents punctuation.

      I have submitted a patch:

      --- dasher-3.2.0/Src/Gtk2/dasher.cc 2003-11-17 08:52:16.000000000 -0600
      +++ dasher-3.2.0-mine/Src/Gtk2/dasher.cc 2004-10-22 19:39:42.000000000 -0500
      @@ -1416,7 +1416,7 @@
      dasher_start();
      dasher_redraw();

      - g_timeout_add(50, timer_callback, NULL );
      + g_timeout_add_full(G_PRIORITY_DEFAULT_IDLE,50, timer_callback, NULL,NULL);

      // I have no idea why we need to do this when Glade has theoretically done
      // so already, but...

    5. Re:Really! by r3m0t · · Score: 1

      As in, not seriously or without reading the beginner's thing.

      Are you turning the speed slider up? I'm near maximum... 7 I think.

      Oh yes, I've tried it on a .5GHz P3 Gentoo and this 2.5GHz AMD64 Windows (shared, dammit, can't switch) and a Tablet PC (you hover your pen above the screen... very cool). The featureset is about the same (well, you can write into other windows, etc) but I can't really definitely say that the Linux code is slower than the Windows one, since the Gentoo was so low-spec. (All others ran it smoothly, but on Gentoo dasher froze for about 15 seconds at a time).

      Do you mean a sort of typing practice game, with Dasher? I would imagine that as a background thing (i.e. network games) keyboard would be better since you can type without looking at it.

    6. Re:Really! by r3m0t · · Score: 1

      Sorry, but that feature is available on Windows too.

    7. Re:Really! by Ziviyr · · Score: 1

      I've got my hands full at speed 2. I was seriously trying, of course reading manuals and stuff might be a culprit (but not over freezing I hope).

      Game wise I mean as somehow worked into the game mechanics. Just tacking it on to some existing game as is for what it does would be kinda silly.

      --

      Someone set us up the bomb, so shine we are!
  36. Re:This approach favors bloated, redundant encodin by Zarf · · Score: 1

    Using this technology on source code (for instance) would be an extremely bad thing since it would encourage cut-and-paste or copy-and-mutate approaches to coding. The result would be highly regular and poorly factored source. But, I don't think anyone was actually suggesting this for program code... just a thought.

    --
    [signature]
  37. Application to program code will be "interesting" by mwood · · Score: 1

    Aim such a product at programmers, and you'll learn a few things about programmers.

    Correct spelling is no longer more probable than incorrect spelling. :-) Programmers as a class are notoriously poor spellers.

    Some misspellings are intentional. I knew a guy who frequently wanted to use MODE as a variable name in his COBOL programs. But MODE is a COBOL keyword and the compiler would hiss at him. So he now always spells it MOAD.

    Likewise some misspellings are due to local culture. Paw through some DEC code and you'll find that "controller" is always spelled "kontroller". It's not an error; it's probably to do with the more intelligent bits of DEC gear being given K-series board/unit designations.

  38. Not surprising... by RWerp · · Score: 1

    It is definitely clunky when compared to Dasher, but better than MS Equation Editor etc.

    I will be first to cheer anybody who invents a worse way of typing math than MS Equation Editor. Being better than that is not an achievement at all. Can't they simply learn TeX for their math?

    --
    "Long run is a misleading guide to current affairs. In the long run we are all dead." (John Maynard Keynes)
  39. about as novel as my foot by geg81 · · Score: 1

    Statistics has been used for decades in handwriting input, OCR, speech recognition, systems like T9, and other input modalities. Dasher seems pretty cumbersome in comparison to most of those.

    And the fact that it only generates "correct" input can be a real problem: names, foreign words, etc. just don't come out right.

    1. Re:about as novel as my foot by r3m0t · · Score: 1

      No, no, no! It does not only "generate" words! It simply enlarges the areas of where you're likely to go! You can zoom around and input anything you want, really. Then it'll add what you wrote to the training text so that next time that name is easier to input.
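
      The two ideas in this paragraph, that likely symbols get larger areas while every symbol stays reachable, and that new input is folded back into the training text, can be shown with a toy model. Everything below is an assumption made for illustration: the class AdaptiveBoxes, its one-character context and its add-one smoothing are not taken from Dasher's source.

```python
from collections import Counter


class AdaptiveBoxes:
    """Toy model: every symbol keeps a box, likelier symbols get bigger ones."""

    def __init__(self, alphabet: str, smoothing: float = 1.0):
        self.alphabet = alphabet
        self.smoothing = smoothing  # pseudo-count that keeps rare symbols reachable
        self.counts = {}            # previous character -> Counter of next characters

    def train(self, text: str) -> None:
        """Add observed character pairs; also used to learn from what the user wrote."""
        for prev, nxt in zip(text, text[1:]):
            if nxt in self.alphabet:
                self.counts.setdefault(prev, Counter())[nxt] += 1

    def box_heights(self, prev: str) -> dict[str, float]:
        """Fraction of the screen height given to each symbol after `prev`."""
        seen = self.counts.get(prev, Counter())
        total = sum(seen.values()) + self.smoothing * len(self.alphabet)
        return {c: (seen[c] + self.smoothing) / total for c in self.alphabet}


boxes = AdaptiveBoxes("abcdefghijklmnopqrstuvwxyz ")
boxes.train("the quick brown fox ")
before = boxes.box_heights("z")["v"]
boxes.train("zviyr zviyr ")  # an unusual name, entered twice
after = boxes.box_heights("z")["v"]
print(before, after)
```

      The box for "v" after "z" is never zero, so the name can be written the first time, and it grows once the name has been entered.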

      Much better than T9 in that respect. Handwriting recognition isn't so much better there either (with symbols and numbers, too).

    2. Re:about as novel as my foot by Anonymous Coward · · Score: 0

      No, no, no! It does not only "generate" words!

      "It" was referring to statistical methods in general.

      You can zoom around and input anything you want, really.

      You can with those other systems as well: you just have to write the unusual letters very, very clearly (just like Dasher).

      Then it'll add what you wrote to the training text so that next time that name is easier to input.

      So do other methods.

      Handwriting recognition isn't so much better there either (with symbols and numbers, too).

      The point remains: it's novel in its details, but these kinds of statistically-based input methods in general are old, including on-line updates to the statistics.

  40. straitjacket? by Doc+Ruby · · Score: 1

    "3GL", third generation programming languages, were supposed to do for programming what these stat predictors do for data entry. They were menu interfaces, using syntax and grammar to offer only the valid options for the "next word" in a program. Usually with dropdown/popup menus for mousing in windows, the new computing paradigm back in the 1980s. But human expression turned out to be much less modal, and the UI always got in the way. Wake me when these interfaces have been playtested, and survive the arena.

    --
    make install -not war

  41. Makes Writing Erotica A Snap! by Anonymous Coward · · Score: 0

    1) Submit "Penthouse Letters" for statistical analysis.
    2) ?????
    3) Profit!

  42. TeX math notation anyone? by khrtt · · Score: 1

    For math notation, no matter how good this might be, TeX is better. First, it goes through e-mail, and it's easy to read unrendered. I.e., people I send TeX notation to are guaranteed to be able to read it, without having to install software that doesn't necessarily exist for their favorite OS. Unless they are too lazy to learn the two pages of TeX documentation that list the math notation:-). Secondly, it's fast to type, and you don't need to take your hands off the keyboard. I doubt that there is any input system for math notation that's better than TeX.

    1. Re:TeX math notation anyone? by Anonymous Coward · · Score: 0

      There is a system that is a lot better than TeX: a pencil. I'm only waiting for a hand-writing recognition system that works on math notation. Although to be realistic, it's unlikely that the market is large enough to support the development of a sufficiently intelligent recognizer.

    2. Re:TeX math notation anyone? by khrtt · · Score: 1

      Hmm. Pencil is more expressive, sure. But how are you going to e-mail your notes to people? I mean, you could e-mail graphics, but nothing beats plain text. And how are you going to intersperse your math with text? You don't propose actually writing text with the pencil, do you?

  43. Random Words by Mr_Silver · · Score: 1
    A big advantage of statistics-based interfaces is that they automatically enforce correctness, because correct strings are more probable than incorrect ones.

    Feed the entire contents of /usr/dict/words into a markov generator and you get pretty much the same thing. Random words which, whilst not having any meaning, are reasonably syntactically correct.

    http://www.fourteenminutes.com/fun/words/.
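
    For anyone who wants to try this locally, a letter-level Markov generator is short. This is a generic sketch and not the code behind the link above; the function names and the START and END markers are invented, and it assumes the input words do not themselves contain "^" or "$".

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical boundary markers


def build_chain(words: list[str]) -> dict[str, list[str]]:
    """Map each letter to the list of letters observed after it."""
    chain = defaultdict(list)
    for word in words:
        symbols = [START, *word.lower(), END]
        for current, following in zip(symbols, symbols[1:]):
            chain[current].append(following)
    return chain


def make_word(chain: dict[str, list[str]], rng: random.Random, max_len: int = 12) -> str:
    """Walk the chain from START until END is drawn or `max_len` letters are emitted."""
    letters = []
    current = rng.choice(chain[START])
    while current != END and len(letters) < max_len:
        letters.append(current)
        current = rng.choice(chain[current])
    return "".join(letters)


chain = build_chain(["statistics", "dasher", "predict", "entropy", "markov"])
rng = random.Random(0)
print([make_word(chain, rng) for _ in range(5)])
```

    Swapping the five sample words for the lines of a system word list gives output in the style described here: pronounceable, plausible-looking, and meaningless.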

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  44. Re:This approach favors bloated, redundant encodin by shic · · Score: 1

    I was about to point out in response to "A big advantage of statistics-based interfaces is that they automatically enforce correctness..." that rather than enforce correctness they will more likely introduce common errors.

    When designing a language - be that a simple one which can be encapsulated in an XML schema for example, or even a complex natural language there is a trade off between being efficiently terse and introducing sufficient redundancy as to allow communicants to differentiate signal from noise. If you enter data in a format too terse then you are more likely to make errors which can't easily be detected - if you enter in a language too redundant you will find it tedious but errors are more likely detectable at a syntactic level.

    For this reason I'd like to see exactly the opposite approach... I'd like to see long-hand ways of entering data where my errors are detected and flagged - which are then parsed and stored or transmitted in a more efficient format.

  45. Apples and Oranges by ttfkam · · Score: 1

    Dasher indeed looks interesting. The heuristics remind me of the input methods for Japanese keyboards where hiragana or katakana are entered, and depending upon the context, a short list of matching kanji is presented to choose from. Elegant solutions to a complex problem.

    However, while Dasher can be compared to the JavaScript application that works with MathML, Dasher and MathML cannot be directly compared. Determining correctness would be from a program reading the DTD or schema of MathML. MathML would just be the serialized form (the data format).

    (Not that I'm suggesting it be done but...) It's like saying that a C program would be written with prediction on raw parentheses and curly braces in the C source file. If anything, the predictive algorithm would be supplied with BNF notation. The C code would just be the output format.

    I don't see why the technology associated with Dasher could not be applied to parsing DTD or schema files for output to XML syntaxes like MathML.
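
    A rough sketch of that idea: let the schema decide which children are valid, and let observed documents decide their order. The content model below is a made-up, heavily simplified fragment that borrows a few Content MathML element names; the function name rank_children and the add-one smoothing are assumptions, not how Apropos or Dasher work.

```python
from collections import Counter

# Hypothetical content model: element name -> children the schema allows.
ALLOWED_CHILDREN = {
    "apply": ["plus", "times", "power", "ci", "cn", "apply"],
}


def rank_children(parent: str,
                  allowed: dict[str, list[str]],
                  observed: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Offer only schema-valid children of `parent`, most frequently observed first.

    `observed` holds (parent, child) pairs taken from training documents.
    Add-one smoothing keeps valid but unseen children available.
    """
    counts = Counter(child for p, child in observed if p == parent)
    valid = allowed.get(parent, [])
    total = sum(counts[c] + 1 for c in valid)
    ranked = [(c, (counts[c] + 1) / total) for c in valid]
    return sorted(ranked, key=lambda item: -item[1])


training_pairs = [("apply", "plus"), ("apply", "plus"), ("apply", "ci"), ("apply", "blink")]
print(rank_children("apply", ALLOWED_CHILDREN, training_pairs))
```

    The invalid "blink" pair in the training data is simply never offered, which is the sense in which correctness would come from the schema rather than from the statistics.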

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  46. Oh, I don't know how good this is. by Anonymous Coward · · Score: 0

    One of the things that sucks about MS Office is AutoCorrect. Granted, it helps fix a few typos, especially of the "teh" type. What really is annoying is not turning off how it converts URLs, UNC paths, etc., into "hyperlinks".

    T9 input, imho, kind of sucks also, unless you're IM'ing and just doing a lot of simple "dood, where U B", adding new entries into the phone book on the phone, etc.

    If the lexical hierarchy has too many words that have very similar SOUNDEX values, same set of initial characters, etc., it's not going to save much time or effort. It takes much longer to pick a word out of a list.

    My bias: I'm a touch-typist, so for me it's usually just easier to keep plowing through the typing, rather than stopping to select the "right" choice presented to me. Nor do I IM. It's hard to IM when you're driving.

    Plus, again, it all comes down to the quality of the statistical set (or dictionary). Is your writing target for writing "business speak"? Is it for writing medical or legal docs? Or whatever. A set for "general" writing will just be about as bad as Word's grammar checker or spelling dictionary...

  47. this idea has been around for a long, long time by TheDemotic · · Score: 2, Interesting

    Claude Shannon, the father of information theory, used the idea referenced here in his famous 1950 experiment to calculate the entropy of the English language. See "Shannon Game" at, for example, http://www.math.ucsd.edu/~crypto/java/ENTROPY/

    There's also an entire field, often referred to as "Natural Language Processing," which uses empirical observations of large amounts of language data (text or speech) to construct statistical models which do speech recognition, language translation, text summarization, spelling correction (and, yes, people at Microsoft Research have worked on this), etc.

    Finally, Hemos writes "Stochastic modeling can also be used as a basis for speech recognition, with the recognizer using the model to choose a continuation when the speech signal is ambiguous or indistinct." FYI, speech signal is _always_ ambiguous, from the perspective of a machine trying to transcribe it to text. I very much doubt there's been any successful speech recognition work in the last 15 years on a non-statistical system.
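
    For a feel of what "entropy of the English language" means, the simplest version of the calculation uses single-letter frequencies only. This sketch is not Shannon's experiment, which had people guess the next letter from context and therefore arrived at a much lower figure; the function name char_entropy is made up.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Zeroth-order entropy in bits per character: -sum(p * log2(p))."""
    if not text:
        raise ValueError("need non-empty text")
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(char_entropy("aabb"))
print(char_entropy("the quick brown fox jumps over the lazy dog"))
```

    Two equally likely symbols cost one bit each; the more a predictor can narrow down the next symbol, the fewer bits (or pointer movements) are needed to choose it.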

  48. OT: Your sig by ttfkam · · Score: 1
    To a Lisp hacker, XML is S-expressions in drag.
    And that Lisp hacker would be wrong.

    The linked article neglects to mention Unicode compatibility in its list, but a good read nonetheless.
    --

    - I don't need to go outside, my CRT tan'll do me just fine.
    1. Re:OT: Your sig by alispguru · · Score: 1
      Sorry, Slashdot signatures are limited to 120 characters, and are meant to be short and provocative.

      I've seen that article before. It does a fairly good job of missing the point, or seeing the point and getting it backwards. XML does get one thing right - the idea that chunks of information ought to be self-describing, down to the character set level. Even Common Lisp punts on that one - the spec basically says "we require this subset of ASCII, and here's an API to manipulate whatever your implementation supports."

      However, when I say "XML is S-expressions in drag", I mean:

      XML and S-expressions are roughly equivalent in representational power - they can both do labeled trees(*).

      XML has unneeded complexity that does not give it more representational power - consider the brain-damaged distinction between attributes and sub-elements, or the way namespaces and DTD's sort-of kind-of interoperate.

      XML is missing important stuff, and grafting that stuff in afterwards is painful - S-expressions at least have the idea of a number as a leaf element; you need XML Schemas to do that.

      XML is promoted as the data format that is going to solve all interoperability problems - witness the current Semantic Web hype. I maintain that there has been no significant progress in ontology wrangling since the mid 1980's, when several depressing results were published - the most depressing one was that certain basic operations on ontologies are NP-complete. This was done before the Web took off, so modern "researchers" can't find it (if Google doesn't see it, it doesn't exist), but nevertheless it's still true, and it's still waiting to clobber Semantic Web efforts.

      I will consider changing my .sig when I hear something about the Semantic Web that's more than hype, or I hear about a programming language that bottoms out in XML that anyone actually uses.

      (*) Yes, I know they can actually do labeled graphs. S-expressions in Common Lisp can do arbitrary graphs within one "document", and this has worked since 1984. XML tries to do this with magic attributes (ID and IDREF, I think), which requires the application to recognize links in the DOM - yecch. XML also has cross-document links with stuff like XPath - this is a big problem, and XPath is a big, hairy solution.
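
      The rough equivalence claimed in the first point is easy to demonstrate for simple documents. The converter below is a sketch under stated assumptions: the function name to_sexpr is invented, the (@ ...) attribute notation is loosely modelled on SXML, and it ignores tail text, namespaces and quote escaping.

```python
import xml.etree.ElementTree as ET


def to_sexpr(element: ET.Element) -> str:
    """Render an element as (tag (@ (name "value") ...) "text" children ...)."""
    parts = [element.tag]
    if element.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in element.attrib.items())
        parts.append(f"(@ {attrs})")
    text = (element.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    for child in element:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"


doc = ET.fromstring('<apply><plus/><ci>x</ci><cn type="integer">2</cn></apply>')
print(to_sexpr(doc))
```

      Note that the attribute needs its own sub-list while the child elements do not, which is the attribute versus sub-element distinction complained about above.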

      --

      To a Lisp hacker, XML is S-expressions in drag.
    2. Re:OT: Your sig by ttfkam · · Score: 1

      XML has unneeded complexity that does not give it more representational power - consider the brain-damaged distinction between attributes and sub-elements, or the way namespaces and DTD's sort-of kind-of interoperate.

      I usually treat things as:
      metadata = attribute
      related content = sub element

      You are right in that there are no hard and fast rules for what should be an attribute and what should be an element, but then I really haven't found it to be a real problem once I adopted the above.

      DTDs do suck. I never plan on writing one again. XML Schema is too complicated for most applications. My personal favorite is RelaxNG which most popular parsers support now. Simple, easy to understand, and can also handle validation of data types.

      XML is missing important stuff, and grafting that stuff in afterwards is painful - S-expressions at least have the idea of a number as a leaf element; you need XML Schemas to do that.

      No, it follows the Principle of Least Power. XML by itself was never intended to have all of these other technologies built into the syntax by design. DTD vs. XML Schema vs. RelaxNG is a case in point. DTDs, a holdover from SGML, were never looked on fondly, but they aided adoption by the SGML crowd. XML Schema was a standard by quarrelling committee, and it shows. Folks wanted something as powerful but not as complex. Hence RelaxNG and its forebears were born. Note how this does not affect how XML documents look. An XML document I wrote back in 1999 can still be validated against a RelaxNG schema today.

      And while you might think of this situation as fragmenting, please remember that use of DTDs does not disallow use of XML Schema or RelaxNG somewhere else in the chain. It allows for competition and best of breed to emerge. On the other hand, S-Expressions doesn't even have a bad standard schema language. It instead has hundreds. One schema language for almost each S-Expression schema. This is an improvement?

      And while people may give namespaces a bad name, I happen to love them. Mixing document types without losing the ability to separate them again. Extensible validation. Is it perfect? Absolutely not. But what does S-Expressions have that's better or do you simply combine two documents and hope that you have no collisions?

      XML is promoted as the data format that is going to solve all interoperability problems - witness the current Semantic Web hype.

      Show me someone credible that claimed that. I don't ever recall Tim Berners-Lee ever saying that. I don't recall Jim Clark, Tim Bray or Mike Kay ever saying that. I don't recall any project managers from IBM, Microsoft or Apache ever saying that. I think you are characterizing the argument through the comments of wingnuts. That equals strawman.

      XML doesn't solve all interoperability problems, however it helps with many interoperability problems. There is a major difference there, but because it cannot solve everything, some XML detractors don't want to use it for anything.

      I will consider changing my .sig when I hear something about the Semantic Web that's more than hype, or I hear about a programming language that bottoms out in XML that anyone actually uses.

      What do you mean by this? Bottom out as in is described in XML or bottoms out as in is serializable to XML or...?

      S-expressions in Common Lisp can do arbitrary graphs within one "document", and this has worked since 1984. XML tries to do this with magic attributes (ID and IDREF, I think), which requires the application to recognize links in the DOM

      Well of course. S-Expressions are serialized data structures. In the serialized form, S-Expressions do a similar thing. This is not a weakness of XML. (And for the record, ID and IDREF are types, not hardcoded attributes names.) If an

      --

      - I don't need to go outside, my CRT tan'll do me just fine.
    3. Re:OT: Your sig by alispguru · · Score: 1
      Wow, my first reply that's longer than the standard Slashdot limit. I'm honored ;-).

      You are right in that there are no hard and fast rules for what should be an attribute and what should be an element, but then I really haven't found it to be a real problem once I adopted the above.

      My heuristic for that is attributes are for metadata that has little or no structure, and is very unlikely to change. In practice, this reduces to "never use attributes" for me.

      My personal favorite is RelaxNG which most popular parsers support now.

      Mine too. Compact, intelligible syntax, and a decent automata-based grounding - what's not to like, other than the W3C's not-invented-here attitude.

      On the other hand, S-Expressions doesn't even have a bad standard schema language. It instead has hundreds. One schema language for almost each S-Expression schema. This is an improvement?

      No - more of an acknowledgement that the real test of document validity is always processing it. Validation against schemas of any sort can't really check everything, unless you have a schema language that's as powerful as your application programming language. Schemas are good as a documentation formalism, but I suspect the attitude of most Lisp hackers to mechanical pre-validation would be "why bother?" Hey, what do you expect from a bunch of slobs who prefer untyped variables (but strongly typed data)?

      Show me someone credible that claimed that. I don't ever recall Tim Berners-Lee ever saying that.

      OK. You asked for it. You have to pay for the full article, but the excerpt at the link gives you the hype flavor. The most depressing thing about that article is that one of its co-authors is Jim Hendler. I've worked with Jim - he and I are both veterans of the AI bubble and the AI Winter that followed. He knows that most of the Semantic Web hype requires solutions to problems where no significant progress has been made since 1980.

      I could reply in more detail on character sets and structure encodings, but I'm willing to agree that Lisp has been pretty stagnant since the ANSI Common Lisp spec was finalized in 1990. Hell, CL has a lot of stuff missing from it - no standard for sockets, threads, Unicode... Part of the problem is that Lisp people tend toward the MIT side of the worse-is-better spectrum. CL has no standard threads because they would require a decent solution to multi-processing GC, which still doesn't exist (Java implementations to the contrary). There is hope that things are going to start moving again in the near future.

      When I see your sig, I don't have a sudden desire to defend XML from the big, bad Lispers. I simply marvel at the hubris.

      I have a similar reaction when I see people saying that putting something into XML "solves the interoperability problem". Please note, I'm not saying you are one of those. Anyone who likes RelaxNG is clearly not one of the XML sheep.

      XML is an inelegant solution to the problem of serializing nested property lists. Its SGML roots are a bigger boat-anchor than Lisp's historical weirdnesses - most people like it because it looks like something simple they're familiar with (HTML), and it's the first time they've encountered any notation for nested property lists.

      Unfortunately, XML is so entrenched that I see no possibility of "XML Winter" - XML getting the blame for the upcoming failure of the Semantic Web to live up to its hype, the way Lisp got the blame for AI's failure in the 1980's.
      --

      To a Lisp hacker, XML is S-expressions in drag.
  49. Re:Application to program code will be "interestin by r3m0t · · Score: 1

    "I knew a guy who frequently wanted to use MODE as a variable name in his COBOL programs. But MODE is a COBOL keyword and the compiler would hiss at him. So he now always spells it MOAD."

    Similarly for "list" in Lisp or Scheme. I use "lyst" since I learnt the basics off Douglas Hofstadter from "Metamagical Themas".

  50. Well... by SuperKendall · · Score: 1

    I did after the first two words, perhaps it's just well-read.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  51. Let me explain... by ohell · · Score: 1
    • Dasher is very different from T9. T9 is basically a lexicon lookup system, and has to be abandoned for words not in the dictionary. Dasher lets the user write any string in the language, though some are considered more likely than others. There is a good analysis of T9 here.
    • Dasher does not constrain the writer. Dasher calculates the likelihood of symbols in the string based on the patterns it sees in the symbols of a training corpus. This allows the program to be tailored for various applications. (e.g. train it on a corpus of 10000 text messages, & itll strt 2 ryt lyk dis tho ppl cud stl ryt prpr nglsh. gr8.).
    • Statistical modelling does not enforce anything. I suspect that the intent behind the statement about 'correctness' is that spellings present in the training corpus will be preferred. For example, a Dasher user is unlikely to misspell 'receipt' (assuming that the word appears in the training text) because once 'rec' has been written, 'ei...' will be shown with a higher probability than 'ie'.
    • Apropos does not aim to replace TeX. The intent of the two programs is different: TeX is a typesetting program, geared towards representing the notation of mathematics, while Apropos uses MathML in order to capture the meaning of mathematical expressions. Apropos does display the notation to the user, but only because people use notation to understand mathematics. Meaning of a mathematical expression is independent of the notation (e.g. f'(x) vs. df(x)/dx), and humans use a lot of context to parse notation. This is why conversion from Content MathML to TeX (and other display formats) is straightforward, but TeX to Content MathML is ambiguous.
    • Apropos' model operates at a higher level than characters or words. Apropos predicts based on the grammar of Content MathML. It does not 'exploit' the redundancy in the syntax of XML - that should be obvious by trying it out once. It does not ask the user to create symbols of XML, rather it aids in constructing the mathematical expression by representing it as a (prefix) tree. So there is no question of Apropos predictions being based on inefficiencies in XML.
    • Finally, both Dasher & Apropos do not try to stifle creativity, free speech, geekiness or bad spelling. (well, maybe bad spelling). They are only designed to seek out redundancies in languages and exploit them to make repetitive tasks a little easier.
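
    The first point, that T9 is a lexicon lookup which fails outside its dictionary, can be illustrated with a toy version. The keypad layout is the standard phone one; the function names and the four-word lexicon are invented, and the real T9 product is certainly more elaborate than this.

```python
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {letter: key for key, letters in KEYS.items() for letter in letters}


def key_sequence(word: str) -> str:
    """Digits pressed to enter `word`, one key press per letter (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_lexicon(words: list[str]) -> dict[str, list[str]]:
    """Index dictionary words by the key sequence that produces them."""
    lexicon = {}
    for word in words:
        lexicon.setdefault(key_sequence(word), []).append(word)
    return lexicon


# Hypothetical four-word dictionary.
lexicon = build_lexicon(["good", "home", "gone", "dasher"])
print(lexicon.get("4663", []))               # several words share one key sequence
print(lexicon.get(key_sequence("gr8"[:2]), []))  # nothing stored for an unknown string
```

    A sequence either maps to stored words or to nothing, whereas a model of symbol probabilities still assigns some likelihood to every string.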

    Please read my report for a detailed description of how Apropos works. I have contrasted the system with both T9 & TeX.

    --
    Three o'clock is always too late or too early for anything you want to do. - Jean-Paul Sartre
    1. Re:Let me explain... by Anonymous Coward · · Score: 0

      I'd like an explanation of why the acknowledgement is still not updated.