The Science of Word Recognition
neile writes "I stumbled across a fascinating paper over at the Microsoft Typography site today that provides a really nice overview of the different theories on how humans read. If you thought we read by recognizing word shapes, think again! With the assistance of fancy eye-tracking cameras researchers have been able to devise several clever experiments to give us new insight into how reading works." We've linked to some of Larson's work previously.
Would one of those stupid comments about the colour scheme on /. be on-topic now?
Wh47 d1d j00 541, 31337 15n't t3h r0xor5 ne m0r3???
I was reading what was written on her T-shirt!
So are Microsoft going to patent the way we read and then sue?
"If you are reading this then you owe Microsoft royalties"
----
New technology will soon be revealed that will instruct Slashdot users on the proper spelling of "lose".
The USSGN (Union of Slashdot Spelling and Grammar Nazis) is expected to stage protests against the new product in the interest of keeping their jobs.
It would be cool if it didn't suck.
> With the assistance of fancy eye-tracking cameras, researchers have been able to devise several clever experiments to give us new insight into how reading works.
Oh, they must have been using EyeQ...
I can read at 44692 words per minute! Thanks for posting that long article for me to read, I needed the exercise.
And thank you, EyeQ! You're the greatest!
Really, though, they say that more letters/words in view mean faster reading times. It's true. Think about a book or article you've read. When the words are together on the page, it's easier to read, because your eyes can jump around and let your brain fill in the blanks.
Ever read something that made sense but you couldn't quote it word for word? It's likely because you read in this same way.
Get your Unix fortune now!
"Evidence from the last 20 years of work in cognitive psychology indicates that we use the letters within a word to recognize a word."
Man, I'm so glad they finally figured this out...
Does anyone else think that analyzing only how English is read is very closed-minded? I'm pretty sure only a very small percentage of the world speaks and reads English.
I would love to see a study comparing how English is read with how Chinese is read by native speakers. Very interesting, I would guess.
A Fatal OE Exception has occurred, Sig will now reboot.
While reading the article, I suddenly became hyper-aware of how I was reading the article. :-)
Don't let the Microsoft name scare you off - the article makes for a fascinating look (pun intended) into how we read. I wonder, though, if these findings are duplicated with written Oriental languages.
You call this a signature?
Since most people in the world don't use the Latin alphabet, it would be interesting to find out how word recognition works for them, and how they read words in our alphabet.
***Quis custodiet ipsos custodes***
The final conclusions are similar to what I learned in my college linguistics classes 15 years ago. Language contains a lot of redundancy. The reason is that we often encounter situations of so-called "reduced redundancy". For example, someone might have sloppy handwriting so you can't make out all of the letters. Or you might be talking to someone while they brush their teeth. If language were highly optimized, we wouldn't understand a thing in these situations, but because of redundancy we can usually communicate very effectively.
The same applies to reading. The conclusions of the paper seem trivial to me. Of course reading exploits "visual" and "contextual" information. How else would we understand a sentence like "The boy ate a ham___er" (with a few letters obscured)?
The fact that the brain's neural net adds up the weighted lexicographic, syntactic, semantic (and even pragmatic) information available to it in order to interpret language should be familiar to anyone who's read Gödel, Escher, Bach. And that was published in 1979...
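To make the "ham___er" point concrete, here is a toy sketch of weighted cue combination in Python. The visible letters filter a lexicon, and a weighted sum of a visual score and a context score ranks the words that survive. The lexicon, context scores, weights, and function names are all invented for illustration. This is not the paper's model, and it makes no claim about how the brain actually sets the weights.

```python
from fnmatch import fnmatchcase

# Hypothetical lexicon. "ham*er" stands for "ham___er" with an unknown
# number of obscured letters.
LEXICON = ["hammer", "hamster", "hamburger", "hamlet", "ham"]

# Made-up plausibility of each word in the slot "The boy ate a ____".
CONTEXT_SCORE = {"hammer": 0.05, "hamster": 0.10, "hamburger": 0.85}

# Made-up weights for combining the two cues.
WEIGHTS = {"visual": 0.4, "context": 0.6}


def visual_candidates(pattern, lexicon):
    """Words consistent with the letters that are still legible."""
    return [w for w in lexicon if fnmatchcase(w, pattern)]


def visual_score(pattern, word):
    """Share of the word's letters that were actually visible."""
    visible = len(pattern.replace("*", ""))
    return visible / len(word)


def best_reading(pattern, lexicon, weights=WEIGHTS):
    """Rank visually consistent words by a weighted sum of both cues."""
    candidates = visual_candidates(pattern, lexicon)
    if not candidates:
        return None

    def score(word):
        return (weights["visual"] * visual_score(pattern, word)
                + weights["context"] * CONTEXT_SCORE.get(word, 0.0))

    return max(candidates, key=score)


print(best_reading("ham*er", LEXICON))                                   # hamburger
print(best_reading("ham*er", LEXICON, {"visual": 1.0, "context": 0.0}))  # hammer
```

When the context weight is set to zero, the word with the largest share of visible letters wins ("hammer"). With context included, "ate" pulls the answer to "hamburger".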
Peer Pressure
When are they going to repeat these experiments in, let's say, China or Japan? I'm *very* interested in what the conclusions would be there. ...
From what I know about Japanese, they don't use spaces between 'words'. A single kanji represents the whole word, and kanji outlines are always more or less square. So the whole bouma theory fails here, as he finds out.
I'm sure they could learn more interesting things from other writing systems.
Research shows that
I found myself becoming aware of how I read while I read. Fun! I agree with the author regarding letter recognition. The parallel aspect of word recognition is very interesting as well because it begins to explain why we are albe ot raed srcambled txet os eaisly!
Also, more work needs to be done to consider the visual cues outside the focus of attention. It is here that, I believe, shape and form cue the reader, more than letter shapes do, as to the potential content of the text to come. (Exactly how is for the geniuses.)
Blogging because I can...
While some of the results here are interesting (but old), the fact that the entire study focuses on exactly 1 script and 1 language basically renders the conclusions worthless (as conclusions about cognition in general... I suppose they still have value as conclusions about English and the Latin script).
What has happened here is:
1 -- Observe people reading a given language/script
2 -- See how they make use of features of that particular language/script, such as tall letters, case, and the occurrence of 'skippable' words such as articles
3 -- Describe the way they use these local features, and call that a theory of reading in general.
I don't really understand how to apply a theory of reading based on word and letter shapes when there are so many people reading text in which:
--There are no letter boundaries, and/or
--There are no word boundaries, and/or
--Letters all have the same form factor
The experiments described would probably generalize very well to Arabic and Greek scripts, pretty well to Cyrillic (no tall/short letters to speak of), badly to Devanagari-type scripts, very badly to Chinese and Japanese, and not at all to hieroglyphics (though I agree that there may never have been a reader of hieroglyphics who was fluent by modern standards).
To pretend that these experiments apply to humanity in general rather than to the author's own language/script choice is silly. It's an interesting article, and I'm glad the research was done, but unfortunately a certain failure to 'get' the multilingual nature of humanity, which I don't really expect to find in MS work, is in evidence here.
Whence? Hence. Whither? Thither.
The example:
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
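The example relies on a simple rule: shuffle the interior letters and leave the first and last letters where they are. Here is a minimal Python sketch of that rule. The function names and the letters-only tokenization are assumptions made for illustration.

```python
import random
import re

# Assumption: a "word" is a run of ASCII letters, so "deosn't" is handled
# as "deosn" + "'" + "t".
WORD = re.compile(r"[A-Za-z]+")


def scramble_word(word, rng):
    """Shuffle the interior letters; keep the first and last letters in place."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every word; spaces and punctuation are left untouched."""
    rng = random.Random(seed)
    return WORD.sub(lambda m: scramble_word(m.group(), rng), text)


print(scramble_text("According to research at Cambridge University", seed=42))
```

Passing a seed makes the output repeatable. That helps if you want to compare how readable different scrambles of the same sentence are.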
But soon enough there was a counter example:
Anidroccg to crad cniyrrag lcitsiugnis planoissefors at an uemannd, utisreviny in Bsitirh Cibmuloa, and crartnoy to the duoibus cmials of the ueticnd rcraeseh, a slpmie, macinahcel ioisrevnn of ianretnl cretcarahs araepps sneiciffut to csufnoe the eadyrevy oekoolnr.
In the counter example, the letters are not randomly scrambled: the interior letters are in reverse order, and only the first and last letters stay in place.
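Because the counter example follows a fixed rule, it can be generated, and decoded, mechanically. Here is a small Python sketch. It assumes words are runs of ASCII letters, and the name `reverse_interior` is hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+")  # assumption: words are runs of ASCII letters


def reverse_interior(text):
    """Reverse each word's interior letters, keeping the first and last letters."""
    def flip(match):
        w = match.group()
        if len(w) <= 3:
            return w  # nothing to reverse
        # w[-2:0:-1] walks from the second-to-last letter back to the second.
        return w[0] + w[-2:0:-1] + w[-1]

    return WORD.sub(flip, text)


print(reverse_interior("a slpmie, macinahcel ioisrevnn"))
# a simple, mechanical inversion
```

A random shuffle cannot be undone, but this transform is its own inverse. Running it over the counter example recovers the plain text.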
>renerding on firefox
re-nerding! ha ha. Best... typo... ever...
They will never know the simple pleasure of a monkey knife fight
If there's one real take-home lesson of brain-design from cognitive science, it's that the brain tends to do everything several different ways in parallel, and then use the results from all of them.
Obviously it can't all be shape; there are plenty of words with identical shapes, and yet these are distinguishable.
But it could certainly be true that we use shape and parallel letter recognition at the same time. Shape narrows the field of possibilities from millions to a small handful, and then parallel recognition chooses one of the options.
Whatever happens, you can be sure it's terribly complicated, extremely robust and very efficient.
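The two-stage idea (shape narrows the field, then letter identity picks the winner) can be sketched in a few lines of Python. The ascender/descender classes are a crude assumption, and the lexicon and function names are hypothetical. The sketch illustrates this comment's proposal, not a model from the article.

```python
from collections import defaultdict

# Assumed letter classes for lowercase text (real fonts vary):
# "A" = ascender, "D" = descender, "x" = x-height only.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

# Hypothetical lexicon.
LEXICON = ["some", "cone", "rose", "mane", "care", "help", "dog"]


def shape(word):
    """Coarse outline ("bouma") of a lowercase word."""
    return "".join(
        "A" if c in ASCENDERS else "D" if c in DESCENDERS else "x" for c in word
    )


def build_shape_index(lexicon):
    index = defaultdict(list)
    for w in lexicon:
        index[shape(w)].append(w)
    return index


def recognize(seen_shape, seen_letters, index):
    """Stage 1: the shape narrows the lexicon to a handful of candidates.
    Stage 2: the letters identified so far (position -> letter) choose among them."""
    candidates = index.get(seen_shape, [])
    return [w for w in candidates if all(w[i] == c for i, c in seen_letters.items())]


idx = build_shape_index(LEXICON)
print(idx["xxxx"])                                # ['some', 'cone', 'rose', 'mane', 'care']
print(recognize("xxxx", {0: "c", 2: "n"}, idx))   # ['cone']
```

All five four-letter words with no ascenders or descenders share one outline. That is the point above: shape alone can't settle which word it is.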
From the article: ...lowercase text is read faster than uppercase text.
This could also explain why nobody likes to read email where the other person uses all caps.
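A word-shape model would explain the lowercase advantage this way: all-caps words lose their outline, because every capital is the same height. This Python sketch shows only that loss of information. It uses a deliberately crude letter classification (ascender, descender, x-height, cap height) and a made-up word list, both assumptions. It does not show that the loss is what slows down uppercase reading.

```python
# Assumed letter classes for a simple sans-serif font.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def outline(word):
    """Coarse word shape; all capitals fall into one cap-height class "C"."""
    out = []
    for c in word:
        if c.isupper():
            out.append("C")
        elif c in ASCENDERS:
            out.append("A")
        elif c in DESCENDERS:
            out.append("D")
        else:
            out.append("x")
    return "".join(out)


def distinct_outlines(words):
    """How many different outlines a set of words produces."""
    return len({outline(w) for w in words})


words = ["some", "cone", "bake", "dog", "help", "quit"]  # hypothetical sample
lower = distinct_outlines(words)
upper = distinct_outlines([w.upper() for w in words])
print(lower, upper)  # 5 2
```

In lowercase, only "some" and "cone" collide. In all caps, every four-letter word in the sample has the same outline.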
Dunno, Firefox/Moz has one of my favourite features:
tools
Great for annoying "web site designers" who can't design for shit.
There are places where the networks are not touching, and there are places where they are. - Boeing's Lori Gunter
The FArticle does, in fact, address this, though not directly - it puts forth a theory that all letters in a word are absorbed simultaneously, and the brain re-orders them. This is given as theory #3, admittedly a ways down.
This gets me thinking, though, about the importance of context. If you drew the letters PLEORBM in a Scrabble game, it might take a while to see the word staring at you. But in the context of a (mangled) sentence: "you can sitll raed tish wouthit a pleorbm," it much more easily jumps out. Interesting.
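Theory #3 (all the letters are taken in at once and then put in order) can be caricatured as a lookup keyed on the letters regardless of order. This Python sketch compares the Scrabble-rack case with the scrambled-sentence case, where the first and last letters are also known. The lexicon and function names are hypothetical. The sketch models only the letter constraint, not the sentence context the comment is really about.

```python
from collections import defaultdict

# Hypothetical mini-lexicon.
LEXICON = ["read", "dare", "dear", "problem", "still", "without"]


def rack_key(word):
    """Scrabble-rack view: the letters in any order."""
    return "".join(sorted(word.lower()))


def anchored_key(word):
    """Scrambled-sentence view: first and last letters fixed, interior in any order."""
    w = word.lower()
    if len(w) <= 2:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]


def build_index(lexicon, key):
    index = defaultdict(list)
    for word in lexicon:
        index[key(word)].append(word)
    return index


def unscramble(jumble, index, key):
    """All lexicon words whose letters, under the given key, match the jumble."""
    return index.get(key(jumble), [])


rack = build_index(LEXICON, rack_key)
anchored = build_index(LEXICON, anchored_key)
print(unscramble("raed", rack, rack_key))              # ['read', 'dare', 'dear']
print(unscramble("raed", anchored, anchored_key))      # ['read']
print(unscramble("pleorbm", anchored, anchored_key))   # ['problem']
```

Pinning the first and last letters removes most of the ambiguity that a bare letter rack leaves.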
Infants of English-speaking parents easily grasp the Korean distinction between a cylinder fitting loosely or tightly into a container. In other words, children come into the world with the ability to describe what's on their young minds in English, Korean, or any other language. But differences in niceties of thought not reflected in a language go unspoken when they get older.
Absolutely. And adults can "relearn" those distinctions, too; I found that as my Japanese studies progressed (started at 19, pretty close to native now) the range of things I was able to think about expanded considerably--so much so that now I sometimes have trouble speaking to people in English because English doesn't have a word for the concept I'm thinking about.
If you've shied away from Microsoft simply because it's Microsoft, you might not be aware of http://research.microsoft.com, which, regardless of which side of various fences you sit on, has some very interesting material and is generally worth tracking over time.
Apologies for the tangent; this article seemed an apt place to point to the MS Research site for those who might not have been aware of it.
I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg - the phaonmneal pweor of the hmuan mnid. Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
Actuallythesplittingintowordsisnotnecessarytounder standwhatiswritteniftheorderoflettersiscorrect.Thi s"proves"thatyouarereadingbytheletter,notbytheword .(relyingonslashcodetoinsertameaninglessspaceevery nowandthen:-))
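Text with no spaces comes up in software too (and, as noted upthread, Japanese doesn't mark word boundaries with spaces). One classic way to recover the words is a dictionary-driven segmenter. Here is a minimal Python sketch. It assumes a small hypothetical lexicon and a "fewest words wins" tie-break; real segmenters usually score candidates with word frequencies. It says nothing about how human readers do it.

```python
# Hypothetical lexicon covering the example sentence.
LEXICON = {"actually", "the", "splitting", "in", "to", "into",
           "words", "is", "not", "necessary"}


def segment(text, lexicon):
    """Split unspaced text into the fewest lexicon words, or return None.

    Dynamic programming over prefixes: best[i] holds the shortest word
    list that exactly covers text[:i].
    """
    max_len = max(map(len, lexicon), default=0)
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in lexicon:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


print(segment("actuallythesplittingintowordsisnotnecessary", LEXICON))
```

The fewest-words rule is why "into" beats "in" + "to" here. A frequency-based score would make that choice for a better reason.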
The Tao of math: The numbers you can count are not the real numbers.
As a non-native (but fluent) speaker of English, and the husband of a fluent English speaker learning Danish, I can tell you quite well that there are many concepts that have a single word describing them in Danish but not in English, and vice-versa. Some words are normally considered equivalent but have slightly different extents ("pink" covers more colors than the common translation "lyserød", for instance).
The grandparent also didn't say "couldn't be expressed", but "has no word". Given enough verbiage, you can (probably) express any word in one language in any other language, but that's not what you want to do in conversation.
And if the "language of Shakespeare" is so all-encompassing, why has English since then been stealing words from other languages like a slum rat during a riot in a shopping mall? Mind you, I think this is a good feature that adds expressiveness to the language, but it clearly shows that there are things that English speakers consider important enough to be able to express succinctly that they'll bring in foreign words for it.
-Lars
---
I am sure that we've seen this e-mail floating around. Doesn't it seem like we read in shapes?
I cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdgnieg: the phaonmneal pweor of the hmuan mnid. Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Amzanig, huh? Yaeh, and I awlyas thought slpeling was ipmorantt!
It makes a big difference whether or not your messed-up words use common letter patterns (what the article calls 'pseudowords').
Example:
'uesdnatnrd' wasn't too hard to recognize, because 'uesd' and 'tnrd' aren't letter patterns that exist in real words. So the mind works more quickly to rearrange the letters to find a real word.
'aulaclty' was much harder because it's almost pronounceable. 'lac' and 'lty' are common patterns from real words, and 'aul' might not be common, but it's pronounceable.
Just an observation.
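This observation can be turned into a rough number: the fraction of a jumble's adjacent letter pairs (bigrams) that also occur in real words. In this Python sketch the seven-word lexicon is a hypothetical stand-in for a real corpus, chosen only to illustrate the idea. A serious measure would use bigram or trigram frequencies from a large body of English text.

```python
# Hypothetical mini-lexicon standing in for a real English corpus.
LEXICON = ["black", "faulty", "exactly", "under", "stand", "used", "nature"]


def bigram_list(s):
    """Adjacent letter pairs, e.g. "word" -> ["wo", "or", "rd"]."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


KNOWN = {bg for w in LEXICON for bg in bigram_list(w)}


def familiarity(s):
    """Fraction of a string's letter pairs that also occur in the lexicon."""
    pairs = bigram_list(s.lower())
    if not pairs:
        return 0.0
    return sum(p in KNOWN for p in pairs) / len(pairs)


print(round(familiarity("aulaclty"), 3))    # 0.857 (6 of 7 pairs known)
print(round(familiarity("uesdnatnrd"), 3))  # 0.222 (2 of 9 pairs known)
```

A higher score means the jumble looks more like a plausible (pseudo)word. By the reasoning above, that makes it harder to see through to the real word.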
Aw crap, ninjas!
Don't give spammers any ideas on how to sneak their "pneis elnraegemnt ceram" past the filters. I do suspect that the effect works only within small groups of letters, and that long words whose letters are totally randomized will be difficult to read.
I'm no linguist (elec eng w/ neural net studies), but I would argue that the ability to perceive concatenated sentences like that depends on the brain/eye's ability to focus on a particular range and filter out "distractions" (letters to the left and right). Padding our words with spaces helps the brain define the focus boundaries more quickly, after which we can process the text range for meaning...
I imagine the brain's focus as little perception boxes, scanning up and down the concatenated sentence until enough symbols are aligned to fire a recognition signal... As I read your post above, I find my eyes darting about a little more, actually darting to the center of the "word" once recognition is made.
runonsentencewithlowercase -- here's your letter by letter scan "mode"
runonsentencewithcoloring -- slightly easier to define word boundaries by color
runonSENTENCEwithuppercase -- it's easier to locate the word SENTENCE because we perceive a boundary between lowercase and uppercase letters.
runo nsente ncewit hbads pacing -- pain in the ass, but we still comprehend
run on sentence with lowercase -- whitespace speeds comprehension.
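The uppercase example works because a change of case marks a boundary. Here is a short Python sketch of that cue. It also shows that misplaced spaces still leave the letter stream intact. The function names are hypothetical. The case rule handles only the simple all-lowercase/all-uppercase runs used in these examples and does not deal sensibly with camelCase like "runOnSentence".

```python
import re

# A run of lowercase letters or a run of uppercase letters; wherever the
# case changes, one token ends and the next begins.
CASE_RUN = re.compile(r"[a-z]+|[A-Z]+")


def split_on_case(text):
    """Split a run-on string at lowercase/uppercase transitions."""
    return CASE_RUN.findall(text)


def strip_spacing(text):
    """Remove all whitespace, leaving only the letter stream."""
    return "".join(text.split())


print(split_on_case("runonSENTENCEwithuppercase"))
# ['runon', 'SENTENCE', 'withuppercase']
print(strip_spacing("runo nsente ncewit hbads pacing"))
# runonsentencewithbadspacing
```

Once the bad spacing is stripped, the string can be re-segmented from scratch. That may be roughly what a reader does when badly placed spaces are "a pain in the ass, but we still comprehend".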