Inkwell No Longer From the Newton?
CrezzyMan writes "From this post on the Newtontalk.net mailing list:
Some of you may be interested to know that in the Inkwell section on Apple's website the following original text (straight after the keynote):
'Based on the Newton's 'Print Recognizer' - widely considered to be the
world's first genuinely usable handwriting recognition solution - Inkwell's
handwriting recognition is highly accurate and extensively tested'
has been changed to:
'Built on Apple's Recognition Engine - Inkwell's handwriting recognition is
the best in the industry.'
Steve must really hate the Newton..."
I'd be more likely to consider Inkwell a good technology if I knew it was from the Newton, but I was an actual Newton user. Most people erroneously think the HWR in Newton OS was bad (thanks to The Simpsons!).
Hey, Dolf. Take a memo on your Newton. "Beat up Martin."
*writes memo*
*Newton translates to: "Eat up Martha."*
Bah.
*Throws Newton at Martin*
Wasn't it Doonesbury which effectively killed off any hope for the Newton?
None are more hopelessly enslaved than those who falsely believe they are free. Johann Wolfgang von Goethe.
Neural networks provide robust character recognition for Newton PDAs
Larry Yaeger, Apple Computer
While on-line handwriting recognition is an area of long-standing and ongoing research, the recent emergence of portable, pen-based computers (personal digital assistants, or PDAs) has focused urgent attention on usable, practical solutions.
Pen-based PDAs depend wholly on fast and accurate handwriting recognition, because the pen serves as the primary means for inputting data to the devices. To meet this need, we have combined an artificial neural network (ANN) character classifier with context-driven search over character segmentation, word segmentation, and word recognition hypotheses to provide robust recognition of hand-printed English text in new models of Apple Computer's Newton MessagePad.
Earlier attempts at handwriting recognition used strong, limited language models to maximize accuracy. However, this approach failed in real-world applications, generating disturbing and seemingly random word substitutions known colloquially within Apple as "The Doonesbury Effect" (due to Garry Trudeau's biting satire based on first-generation Newton recognition performance). We have taken an alternative approach, using bottom-up classification techniques based on trainable ANNs, in combination with comprehensive but weakly applied language models. By simultaneously providing accurate character-level recognition, via the ANN, with dictionaries exhibiting very wide coverage of the language (as well as special constructs such as date, time, and phone numbers), plus the ability to write entirely outside those dictionaries (at a low probability), we have produced a hand-print recognizer that some have called the first usable handwriting recognition system.
The core of Apple's print recognizer is the ANN character classifier. We chose ANN technology at the outset for a number of key attributes. First, it is inherently data-driven: it learns directly from examples of the kind of data it must ultimately classify. Second, ANNs can carve up the sample space effectively, with nonlinear decision boundaries that yield excellent generalization, given sufficient training data. This results in an ability to accurately classify similar but novel patterns, and avoids certain classic, subtle data dependencies exhibited by hidden Markov models (HMMs), template matching, and other schemes, such as over-sensitivity to hooks on tails, pen skips, and the like. In addition, there is a rich literature demonstrating the applicability of ANNs to producing accurate estimates of a posteriori probabilities for each class, given the inputs.
In some respects, our ANN classifier is quite generic, being trained with standard error backpropagation (BP). Our network's architecture takes advantage of previous work, indicating that combined, multiple recognizers can be much more accurate than any single classifier. However, we combine those parallel classifiers in a unique fashion, tying them together into a single, integrated multiple-representations architecture, with the last hidden layer for each, otherwise independent, classifier connected to a final, shared output layer. We take one classifier that sees primarily stroke features (tangent slope resampled to a fixed number of points), and another classifier that sees primarily an anti-aliased image, and combine them only at the final output layer. This architecture allows standard BP to learn the best way to combine the multiple classifiers, which is both powerful and convenient.
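To make the "combine only at the final output layer" idea concrete, here is a minimal sketch of a forward pass with two otherwise independent hidden layers feeding one shared output layer. It is an illustration only, not Apple's code: the sigmoid activation, the single hidden layer per branch, and every name in it (`layer`, `forward`, the `params` keys) are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: one weight row and one bias per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(stroke_features, image_pixels, params):
    """Two independent branches, joined only at the shared output layer."""
    # Branch 1 sees stroke features; branch 2 sees the anti-aliased image.
    h_stroke = layer(stroke_features, params["stroke_w"], params["stroke_b"])
    h_image = layer(image_pixels, params["image_w"], params["image_b"])
    # The output layer is connected to the last hidden layer of both
    # branches, so backpropagation can learn how to weigh them.
    return layer(h_stroke + h_image, params["out_w"], params["out_b"])
```

Because the branches share nothing before the output layer, each one could be sized or preprocessed independently; only the output weights decide how much each representation contributes to a given class.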
Training an ANN character classifier for use in a maximum-likelihood word recognition system has different constraints than would training such a network for stand-alone character recognition. In particular, we have devised several innovative network training techniques, all of which modestly degrade the accuracy of the network as a pure character classifier, yet dramatically improve the accuracy of the word recognition system as a whole.
The first of these techniques we refer to as NormOutErr, short for "normalized output error." Training an ANN to classify 1-of-N targets with standard BP produces a classifier that does a fine job of estimating p(class|input) for the top-choice class. However, BP's least mean-squared error solution, together with typical classification vectors (which consist of all 0s except for a single 1 corresponding to the target class), results in a classifier that does not estimate second- and third-choice probabilities well. Rather, such classifiers tend to make unambiguous single-choice classifications of patterns that are, in fact, inherently ambiguous. The result is a class of recognition errors involving a single misclassified letter (where the correct interpretation is assigned a zero or near-zero probability) that causes the search to reject the entire, correct word.
We speculated that this effect might be due to the preponderance of 0s relative to 1s in the target vectors, as seen at any given output unit. Lacking any method for accurately reflecting target ambiguity in the training vectors, we tried partially normalizing this "pressure toward 0" relative to the "pressure toward 1." We did this by modulating the error seen at nontarget output units by a scale factor, while leaving the error at the target output unit unmodified. This generally increased the activation levels of the output units, and forced the network to allocate more of its resources to the modeling of low probability samples and classes. Most significantly, it allowed the network to model second- and third-choice probabilities, thus making the ANN classifier a better citizen in the larger recognition system. While this technique reduced top-choice character accuracy on the order of a percent, it dramatically increased word-level accuracy, resulting in approximately a 30% reduction in word-level error rate.
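The error modulation described above can be sketched in a few lines. This is only an illustration of the idea as stated in the text: the function name, the `nontarget_scale` parameter, and its default value are hypothetical, since the article does not give the actual scale factor.

```python
def norm_out_err(outputs, target_index, nontarget_scale=0.1):
    """Per-unit output errors with the nontarget errors scaled down.

    `nontarget_scale` is a hypothetical name and value; the text only
    says that nontarget errors were modulated by a scale factor.
    """
    errors = []
    for i, activation in enumerate(outputs):
        target = 1.0 if i == target_index else 0.0
        error = target - activation
        if i != target_index:
            # Weaken the "pressure toward 0" at nontarget units;
            # the target unit's error is left unmodified.
            error *= nontarget_scale
        errors.append(error)
    return errors
```

With a scale below 1, a nontarget unit that fires moderately on an ambiguous pattern is punished less, which is what lets second- and third-choice activations survive training.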
Another of the techniques we apply routinely in our ANN training is what we call frequency balancing. Training data from natural English words and phrases exhibit very nonuniform priors for the various character classes, and ANNs readily model these priors. However, as with NormOutErr, we find that reducing the effect of these priors on the net, in a controlled way, and thus forcing the net to allocate more of its resources to low-frequency, low-probability classes, significantly benefits the overall word recognition process. To this end, we explicitly (partially) balance the frequencies of the classes during training. We do this by probabilistically skipping and repeating patterns, based on a precomputed repetition factor. (Each presentation of a repeated pattern is "warped" uniquely, as discussed later.) This balancing of class frequencies is conceptually related to a common method for converting from ANN estimates of posterior probability p(class|input), to the value needed in an HMM or Viterbi search p(input|class), which is to divide by p(class) priors. However, our approach avoids potentially noisy estimates of low-probability classes resulting from division by small numbers, and eliminates the need for subsequent renormalization. Again, character-level accuracy suffers slightly by the application of this technique, but word-level accuracy improves significantly.
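A rough sketch of skipping and repeating patterns from a precomputed repetition factor follows. The article does not state how the factor is computed, so the formula here (mean class count over class count, raised to a `power` between 0 and 1 for partial balancing) is an assumption chosen for illustration, as are the function names.

```python
import random
from collections import Counter


def repetition_factors(labels, power=0.5):
    """Assumed formula: (mean class count / class count) ** power.

    power=0 leaves the natural frequencies alone; power=1 balances
    them fully; values in between balance them partially.
    """
    counts = Counter(labels)
    mean_count = sum(counts.values()) / len(counts)
    return {cls: (mean_count / n) ** power for cls, n in counts.items()}


def presentations(factor, rng):
    """How many times to present one pattern in this epoch.

    The whole part of the factor is a guaranteed repeat count; the
    fractional part is the probability of one more presentation, so
    factors below 1 mean the pattern is sometimes skipped.
    """
    whole = int(factor)
    extra = 1 if rng.random() < (factor - whole) else 0
    return whole + extra
```

For example, `presentations(0.4, random.Random())` returns 1 about 40% of the time and 0 otherwise, while a rare class with factor 2.5 is presented two or three times per epoch.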
While frequency balancing corrects for under-represented classes, it cannot account for under-represented writing styles. We use a probabilistic skipping of patterns to address this problem as well, but this time for just those patterns that the net correctly classifies in its forward/recognition pass, which results in a form of error emphasis. We define a correct-train probability for use as a biased coin to determine whether a particular pattern, having been correctly classified, will also be used for the backward/training pass. This only applies to correctly segmented, or positive patterns, and misclassified patterns are never skipped. Especially during early stages of training, we set this parameter fairly low, thus concentrating most of the training time and the net's learning capability on patterns that are more difficult to correctly classify. This is the only way we were able to get the net to learn to correctly classify unusual character variants, such as a three-stroke "5" as written by only one training writer.
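The biased-coin decision can be written down directly. This is a sketch under assumptions: the function and argument names are invented for the example, and treating negative (missegmented) patterns as always trained is a simplification, since the text only says the coin applies to positive patterns.

```python
import random


def use_for_training(correctly_classified, is_positive, correct_train_prob, rng):
    """Decide whether a pattern also gets a backward/training pass."""
    if not correctly_classified:
        return True  # misclassified patterns are never skipped
    if not is_positive:
        # Assumption: this coin is not applied to negative patterns.
        return True
    # Correctly classified positive pattern: flip the biased coin.
    return rng.random() < correct_train_prob
```

A low `correct_train_prob` early in training means most backward passes are spent on patterns the net still gets wrong, which is the error emphasis the paragraph describes.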
Other special training techniques include negative training (presenting missegmented collections of strokes as training patterns, along with all-zero target vectors) and stroke warping (deliberate random variations in stroke data, consisting of small changes in skew, rotation, and x and y linear and quadratic scalings). During recognition, the ANN classifier will necessarily encounter both valid and invalid combinations of strokes, and must classify them as characters. Negative training helps by tuning the net to suppress its output activations for invalid combinations, thus reducing the likelihood that those missegmentations will find a place in the optimum search path. Stroke warping effectively extends the data set to similar, but subtly different writing styles, and enforces certain useful invariances.
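As an illustration of stroke warping, the sketch below applies a small random skew, linear x and y scaling, and rotation to a list of (x, y) points. The quadratic scalings mentioned above are left out for brevity, and the order of operations, the `amount` range, and all names are assumptions rather than details from the article.

```python
import math
import random


def warp_stroke(points, rotation=0.0, skew=0.0, x_scale=1.0, y_scale=1.0):
    """Apply skew, then linear scaling, then rotation to (x, y) points."""
    cos_r, sin_r = math.cos(rotation), math.sin(rotation)
    warped = []
    for x, y in points:
        x = x + skew * y                      # horizontal skew (shear)
        x, y = x * x_scale, y * y_scale       # linear x and y scaling
        warped.append((x * cos_r - y * sin_r, x * sin_r + y * cos_r))
    return warped


def random_warp(points, rng, amount=0.05):
    """Draw small random warp parameters (hypothetical ranges)."""
    return warp_stroke(
        points,
        rotation=rng.uniform(-amount, amount),   # radians
        skew=rng.uniform(-amount, amount),
        x_scale=1.0 + rng.uniform(-amount, amount),
        y_scale=1.0 + rng.uniform(-amount, amount),
    )
```

Calling `random_warp` each time a pattern is repeated gives every presentation a slightly different shape, so the repeats add variety instead of exact duplicates.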
Two practical considerations in building an ANN-based system for a hand-held device are speed and memory limitations. Especially for the ARM 610 chip that drives the Newton MessagePad 120 and 130 units, 8-bit integer operations are much faster than either longer-integer or floating-point operations, and cache coherency benefits from reduced data sizes. In addition, memory is at a premium in these devices. So, despite previous work that suggests ANN training requires roughly 16-bit weights, we were highly motivated to make 8-bit weights work. We took advantage of the fact that the ANN's forward/recognition pass is significantly less demanding, in terms of precision, than is the backward/learning pass. It turns out that 1-byte (8-bit) weights are sufficient if the weights are properly trained. We limit the dynamic range of floating-point weights during training, and then round to the desired precision after convergence. If the weight limit is enforced during high-precision training, the net's resources will adapt to compensate for the limit. Because bias weights are few in number, however, and very important, we let them use 2 bytes with essentially unlimited range. Performing our forward/recognition pass with low-precision, 1-byte weights (a 3.4 fixed-point representation, ranging from almost -8 to +8 in 1/16 increments), we find no noticeable degradation relative to floating-point, 4- or 2-byte weights using this scheme. We have also developed a net training algorithm based on 8-bit weights, by appending an additional 2 bytes, during the backward/training pass only, that accumulate low-order changes, only occasionally carrying over into the primary 8-bit range, which affects the forward/recognition pass.
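The 3.4 fixed-point weight format can be illustrated with a small quantizer. This is a sketch, not the shipped code: the symmetric clamp to ±127 steps (about ±7.94) is one reading of "almost -8 to +8", the bias is kept as an ordinary float to stand in for the higher-precision bias weights, and all names are hypothetical.

```python
FRACTION_BITS = 4
SCALE = 1 << FRACTION_BITS   # 16 steps per unit, i.e. 1/16 increments
Q_MIN, Q_MAX = -127, 127     # assumed symmetric range, roughly -7.94 to +7.94


def quantize_weight(weight):
    """Round a floating-point weight to the nearest 1/16 and clamp it."""
    steps = round(weight * SCALE)
    return max(Q_MIN, min(Q_MAX, steps))


def dequantize_weight(steps):
    return steps / SCALE


def forward_unit(inputs, quantized_weights, bias):
    """Weighted sum using the 8-bit weights, rescaled once at the end."""
    acc = sum(x * q for x, q in zip(inputs, quantized_weights))
    return acc / SCALE + bias
```

For instance, a trained weight of 0.53 quantizes to 8 steps (0.5), and any weight beyond the limit saturates at 127 steps, which is why limiting the dynamic range during training matters before rounding.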
So, in summary, we have devised several techniques for using and training an ANN classifier that is to be embedded in a higher-level recognition system. Some, such as limited precision weights, are a direct result of physical limitations of the device. Others derive from the fact that an ANN classifier providing class probability estimates to a search engine necessarily has different constraints than does such a classifier operating alone. Despite the seemingly disparate nature of the various techniques we've described, there does seem to be a unifying theme, which is that reducing the effect of a priori biases in the data on network learning significantly improves the system's overall accuracy. Normalization of output error prevents overrepresented nontarget classes from biasing the net against underrepresented target classes. Frequency balancing prevents over-represented target classes from biasing the net against under-represented target classes. And error emphasis prevents over-represented writing styles from biasing the net against under-represented writing styles.
One could even argue that negative training eliminates an absolute bias toward properly segmented characters, and that stroke warping reduces the bias toward those writing styles found in the training data, although these techniques provide wholly new information to the system as well. The general effect may be related to the technique of dividing out priors, as is sometimes done to convert from p(class|input) to p(input|class). In any event, it is clear that paying attention to such biases and taking steps to modulate them represent a vital component of effectively training a neural network serving as a classifier in a maximum-likelihood recognition system. It is also clear that ANN classifiers in conjunction with optimal search strategies provide a degree of accuracy and robustness that is otherwise difficult to obtain.
This work was performed in collaboration with Richard Lyon (Apple), Brandyn Webb (The Future), Bill Stafford (Apple), and Les Vogel (Angel Island Technologies). We are also indebted to many supportive and contributing colleagues at Apple and in the connectionist community. A more detailed, technical discussion of our recognition system is available through my Web page (Larry Yaeger, pen-based character recognition: http://www.atg.apple.com/personal/yaeger).
Larry Yaeger is technical lead at Apple Computer in the development of the neural network-based hand-print recognition system used in second-generation Newton PDAs. At Digital Productions, he used a Cray X-MP supercomputer to generate the computer-graphics special effects for Hollywood films The Last Starfighter, 2010, and Labyrinth. While with Alan Kay's Vivarium Program at Apple, he designed and programmed a computer "voice" for Koko the gorilla, and created the PolyWorld artificial-life computational ecology that evolves neural architectures resulting from the mutation and recombination of genetic codes, via behavior-based, sexual reproduction of artificial organisms. Contact him at larryy@apple.com
In Wired, years ago...
Q: How many Newton users does it take to change a light bulb?
A: Foux! There to eat lemons, ore axle soup.
Personally I think it's pretty cool to be able to hand write something (typing can be faster but not for everyone) into any application or draw a quick middle finger to your boss in an email (quicker than ASCII art - unless of course you have a repository of that sort of thing).
But hey, the first uses of the handwriting recognition on OS X have been at the Apple Stores. I may be incorrect, but the pad that you sign your name on for your credit card receipt may be using it. I heard someone at the NY store talking about the fact that the little signature devices were also using the handwriting recognition software to match against what your credit card's stripe has on it. If it's true, it's a nice real-world experiment to tweak the software.
That is your ass, and this over here is your elbow, and NO they ARE NOT the same thing.
I never used a Newton, despite being a big Apple fan. I just never had the money when they were available.
When I hear "from Newton", though, I think of older technology. The Newton may have been great, but it was out a long time ago. Just rolling a Newton technology into the newest version of OS X seems like something I would not get excited about.
So my guess is that it is just a marketing decision.
The other thing (not that I think this myself) is that there are people who are going to look at it and equate Newton with "market failure." Once again, the marketing types are not going to want people to think that about a new technology.
Inkwell may be based on Newton's recognition, but marketing does have some reasons not to make that obvious.
- (c) 2018 Hank Zimmerman
I really hope Apple decides to release something like the old eMate. I have one, and it's great- it's got a nice big screen, and the option to use the keyboard or handwriting recognition for text input. It can even surf the net (with a 28.8 modem, no less)! A portable device running OS X aimed at the education market really seems like a great idea. I know that my school, for one, wouldn't even consider buying any of its students 'real' laptops, but at an affordable price, they might look into these.
Alcohol and Calculus don't mix. Don't drink and derive.
Doonesbury was complaining about HWR in Newtons long before The Simpsons. Blame the lag time of TV.
Jesus was all right but his disciples were thick and ordinary. -John Lennon
I once took notes from a fast talking History professor for an hour and a half straight on my Newton 130, without a single error in reading my handwriting.
If tits were wings it'd be flying around.
...instead of trying to screw around with an eight-year-old piece of shit?
WindowsXP CE is the best PDA OS ever!!!
The sad thing is that, today, Apple isn't doing much of that sort of research and development anymore. As far as I can tell, Apple's ATG (Advanced Technology Group) doesn't exist anymore. Most of the people who used to do this kind of research have moved on to other jobs. Microsoft Research is much larger and much more visible in the scientific community than whatever remnants of research may remain at Apple. But Microsoft still produces lousy products despite the large amounts of money they invest in research.
I think in the long run, Apple needs to invest heavily in research again or they'll be in trouble. And Microsoft needs to figure out how to take research results and put them into their software more successfully; unlike, say, IBM, Microsoft did not start out as an innovation-driven company, and probably lacks the mechanisms for moving research results into products.
...instead of screwing around with ancient devices from two companies?
PocketPC has been miles ahead of those two jokers for years! And with WindowsXP and .net for PocketPC, things will only get better!
Of course the first thing I'm going to do when I get to play with Inkwell is run vi.
There are 10 types of people in this world, those who can count in binary and those who can't.
Insert conspiracies here about Jobs not liking the Newton because it was invented after he left Apple.
___
Cogito cogito, ergo cogito sum.
The original "Based on the Newton's 'Print Recognizer'" text is still on the Inkwell page of Apple's Asia site.
It will probably change soon.
http://www.asia.apple.com/macosx/10.2/inkwell.html
However it looks like the Apple UK site hasn't been updated since MacWorld, so maybe not.
Steve must really hate the Newton...
Steve is disassociating their handwriting software from a system that flopped LONG AGO. Most people don't know what a Newton was. Those that do, know it flopped. Never mind the reasons or how great it was - it flopped. End of story.
The only folks that care that it was based on that tech are a few (very few) Newton fans. Face it, as a marketing bullet, "Newton tech" is at best salt shot.
In the 120, recognition was improved somewhat, but it seemed to really get good with the MP2000/2100. I used printed, rather than cursive, characters and found that it did a very good job.
The data sharing capability between applications on the Newton has also yet to be equalled by any other device I've seen.
Subscribers can see articles in the future? So what? Everyone gets to see them in the future.
...instead of trying to defend a company that releases products to its customers that aren't even fit to be in beta.
Pocket PCs powered by Microsoft are absolutely the fastest and most stable PDA that has ever been on the market.
I think Apple's customers are owed an apology for the agony of owning an unfinished flop like the Newton.
The Newton didn't flop, it was KILLED.
All Apple buffs know that Stevie never liked 'the scribble pad'; it was originally Sculley's wonder child.
When he returned, he killed it, and the world was never the same again.
The original work was done by a couple of guys (Michael Kaplan and Brandyn Webb) who were in ATG to do pure neural-network research. Handwriting was just something they, ur, tried their hand at, and when they got good initial results they became ATG's handwriting recognition group, and Larry joined around that time. Interestingly, none of the three of them had any notable background in this sort of thing, having all come from 3d graphics backgrounds, and most of their solutions for handwriting were pretty off-the-cuff. To put some perspective on how original the work was, consider that the technology behind it took the cover of the quarterly research rag AI magazine eight years later.
So, really, they practically did single-handedly invent handwriting recognition.
Most of the people who used to do this kind of research have moved on to other jobs.
In this case, Michael and Brandyn went on to create Adobe Atmosphere. It does seem that people like this have been scattered to the wind, though, with places like Apple ATG, Xerox Parc, and such fading from existence. Are there any companies right now with a strongly creative culture, or is that a bygone era?
I'm still using my eMate as a nice phone number and address organiser today. When entering a new address I mainly use handwriting and it's still recognised perfectly. It has the 2.0 version of Newton OS, which has much better handwriting algorithms than previous versions. One thing I like very much about the way handwriting recognition works is "gestures". Using special gestures one can change spacing between words, insert spacing, erase things, and select words. I think this will also be present in Ink.
Pudge, if you're going to turn into a flame warrior, then you should resign as an editor. The two roles are mutually exclusive.
OK, I never owned one... but at every trade show where Apple was exhibiting them I bellied up to the booth and made a very serious effort to see whether the Newton would work for me.
It didn't come close.
It wasn't an issue of missing a word here or there, it basically missed more of them than it got. I decided it was going to be useless to me and never got one.
Your mileage may, of course, vary.
At WWDC 1996, I noticed that lots of attendees had Newtons--and that almost without exception they were using them with add-on keyboards.
In contrast, I have very little trouble with the Graffiti system on the Palm. Slow, but perfectly usable.
"How to Do Nothing," kids activities, back in print!
Though two things should be mentioned: early-model Newtons (which is apparently what you encountered) did a better job of recognizing an individual's handwriting if you took the time to "train" them to recognize your individual quirks. And later models did a pretty good job even without training. If they'd waited to perfect the handwriting technology the machine might have done better. Then again, there were other issues -- the ill-considered form factor and the gawdawful desktop synchronization being the two biggest.
Another individuality issue. Graffiti is good, but I never did come to terms with it -- I just don't think that way. Finally switched to Fitaly Stamp, which turns the Palm entry area into a keyboard with a proprietary layout. Not for everybody -- some might prefer qwerty or dvorak (though I think Fitaly's layout is very logical for its intended purpose). Others probably get along fine with Graffiti. And still others just don't get the whole PDA thing, and are honestly better off with a Daytimer. It's a matter of what works for you.
Look, it's simple marketing logic. The Newton was a failure in terms of a product line, although it was an extremely cool gadget with amazing technology. Apple doesn't want to associate a new product, which they hope will succeed, with an old product that failed.
The last thing Apple needs is for people to think, misguidedly, that it is short on ideas and has to scrounge through past failures to find new technology ideas.
I think it is a wise decision on their part to give this technology a fresh image, separate from the well-deserved ridicule that the early-model Newtons got (e.g. The Simpsons with the MessagePad 100, 110, etc.).
The fact is, the Newton 2x00 handwriting recognition of 3 years ago is better than anything else on the market today, and I'm sure with some modernization, it'll be positively excellent.
-----
"Cogito Eggo Sum: I think, therefore, waffle."