Computer Program Learns Baby Talk in Any Language
athloi writes "Researchers have made a computer program that learns to decode sounds from different languages in the same way that a baby does. The program will help to shed new light on how people learn to talk. It has already raised questions as to how much specific information about language is hard-wired into the brain."
They have only tested with Japanese and English (see Ars Technica's coverage here). While they do present some intriguing results, the authors themselves admit that their methodology is flawed. BTW, when did Slashdot become Ars redux?
I'm busying myself reading the actual research journal article, and forwarding it to my laboratory colleagues.
It looks interesting. Sorry I can't post the journal article text.. copyright blah blah
Vallabha, GK, & McClelland, JL. (2007). Success and failure of new speech category learning in adulthood: consequences of learned Hebbian attractors in topographic maps. Cognitive, Affective, & Behavioral Neuroscience, 7(1), 53-73.
A Good Troll is better than a Bad Human.
IAAL (I am a linguist), and I believe you are correct. Language is a colligation of sound and meaning, but this technology merely distinguishes sounds: it is a vastly simplified model, not of how children acquire language, but of how children pick up phones. The phone is the most basic unit of the physical (sound) aspect of language, so if this technology is to have any use at all, it has a very long way to go.
From TFA:
Expanding on some existing ideas, he and a team of international researchers developed a computer model that resembles the brain processes a baby uses when learning about speech.
This sentence means nothing. How do they know their computer model resembles the brain processes? Because they got the same outcome? Is that enough to verify what goes on in the mind of a child?
How about this: as soon as their program can distinguish allophones, I will be impressed. Allophones are different sounds in a language that native speakers do not distinguish, but which nevertheless occur in certain environments. For instance, in English we do not distinguish the voiced th sound and the voiceless th sound, but we do distinguish f and v, even though the only difference in both pairs is voicing. The difference is that exchanging f and v can change the meaning of a word, but changing voiced th and voiceless th only makes the word sound funny.
Esoteric reference.
You're right, it doesn't seem that McClelland et al.'s paper makes the claims that the Reuters article does. Scientific American's article did a much better job of explaining the realities, and the SA author appears to have actually understood what McClelland et al. were getting at.
Actually, in English, we do distinguish voiced and unvoiced /th/. They aren't allophones at all - unless you think "thigh" and "thy" are the same word, of course.
While "thy" is somewhat archaic it's still part of the language. Voiced and unvoiced is an area where English distinguishes heavily; we're very light on aspiration, mind you.
I don't know if you need that much exposure to your father (it sounds like you have had some). I personally tend to pick up the mannerisms of anyone I'm around that I have some kind of affinity for. I begin to gesture like them, I know what they would say in certain situations, and I begin to respond to certain situations the same way they would. This can happen even if I only met someone once. This includes facial expressions (squinting, raising eyebrows), voice inflections, laughing, and pauses when speaking. I notice it in written text as well.
Because humans are adapted to be good at learning language. That doesn't mean they have to be born having already learned it in their genes somehow.
Ad hominem attacks are a really great way to make a scientific point, by the way.
Win dain a lotica, en vai tu ri silota
Here's an audio clip of its learning progression.
And I recall seeing a TV broadcast showing an experiment where infants were incapable of even hearing certain sounds from one language (e.g. an Inuit language with subtle throat-clicking sounds) if they were primarily exposed to another language (say French or English). A baby had to be repeatedly exposed to certain sounds before it could perceive them.
Noam Chomsky will be overjoyed if this thing proves to be a success - because if it does, it will provide no less than a working black-box model of the very firmware in question :).
Something bad is coming when people are suddenly anxious to tell the truth.
Steven Pinker's book, The Language Instinct is a good read for anyone interested in the theory of Universal Grammar. It's written in a fairly accessible style, but there are some tough ideas to get your head around if you're new to the subject. Those who have a Computer Science background and learnt about grammars etc. in their compiler design courses might appreciate reading about the subject from a different angle, I know I did.
I'm not at a place where I can access the research article, so let me comment on what I know about McClelland's previous work with neural networks.
Rumelhart and McClelland worked on the groundbreaking "can a neural network learn how to pronounce words based on their spelling?" paper, which used back-propagation to train a neural net to do just that. That was in the 1980s. (Sejnowski at the Salk Institute followed up with a lot of neural net training studies too.)
Their little cheat was that there was no temporal component to the data. Words were represented as sets of letter triplets: catalog is represented as "-ca", "cat", "ata", "tal", "alo", and "log". (Actually, I don't remember if they used special sequences to represent start and stop, so "--c", "-ca", "og-" and "g--" may not have been part of the sets.)
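To make the triplet representation concrete, here is a minimal Python sketch. The function name `letter_triplets` and its padding parameters are made up for illustration and are not taken from the original paper; the padding options exist only because I can't recall which start/stop markers were actually used.

```python
def letter_triplets(word, pad_start=1, pad_end=0, pad_char="-"):
    """Return every run of three consecutive symbols in the padded word.

    pad_start / pad_end are hypothetical knobs: how many boundary
    markers to put before and after the word.
    """
    padded = pad_char * pad_start + word + pad_char * pad_end
    # Slide a window of width 3 across the padded string.
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


# One leading marker reproduces the six triplets listed above.
print(letter_triplets("catalog"))
# Two markers on each side adds the "--c", "og-" and "g--" variants.
print(letter_triplets("catalog", pad_start=2, pad_end=2))
```

Note that the result is just a bag of overlapping windows: nothing in it says when each triplet occurs, which is the "no temporal component" cheat.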
And of course the neural net didn't really have audio output, though of course the rejoinder is that this would be trivial.
My key question is how they deal with the issue of time in this study, and if there is any actual audio output which would act as feed-back for the training system or whether the output is representational only, as an output set of phonemes.
Having real audio output and real audio input would let it correlate its output with real language examples. Having representational blobs would only mean that: given inputs of the hash that represents "hard TH" vs the hash that represents "soft TH" the system could yield a result of different outputs.
And you're saying that the key result would be if the system learned to conflate or ignore the two sounds of "TH", hard or soft, in trying to interpret words. Remember that the initial Rumelhart-McClelland model was "content/meaning free", and I suspect that this one is too. Learning to conflate "x" and "y" in a neural net would be trivially implemented and trainable: the links for "x" and "y" into the model would have similar weights in the right contexts (the context being the set of predecessor and successor phonemes).
It sounds like an agglomerator: given a large dataset of valid words in a given language, this system learns the rule for "predecessor" and "successor" probabilities of a particular phoneme vs another phoneme and then produces random output with the same Bayesian probability, producing gibberish nonsensical sounds which follow the probability distribution of the input training language.
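Here is a rough Python sketch of the kind of agglomerator I'm guessing at. It is not the researchers' model: it is a plain first-order successor-probability (bigram) table, it uses letters as a stand-in for phonemes, and the names `train_bigrams` and `generate` and the toy word list are invented for the example.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers


def train_bigrams(words):
    """Count how often each symbol is followed by each other symbol."""
    counts = defaultdict(Counter)
    for word in words:
        symbols = [START, *word, END]
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=12):
    """Sample a gibberish word that follows the learned successor odds."""
    out, prev = [], START
    while len(out) < max_len:
        successors = counts[prev]
        # Pick the next symbol in proportion to how often it followed prev.
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


counts = train_bigrams(["banana", "bandana", "cabana"])
rng = random.Random(0)  # fixed seed so runs are repeatable
print([generate(counts, rng) for _ in range(5)])
```

Every adjacent pair in the output was seen in the training words, so the results sound plausible for the "language" without meaning anything.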
Or that's my guess at least, trying to be the typical Slashdotter commenting without reading the article.
I'll try to get at the article from the Uni with journal access tomorrow.
Kris
Going a step further, those "words" aren't words in any language.
The formal words are mother and father, though mommy and daddy seem a reasonable informal way of saying my mother and my father. Mom and Dad are derived from the informal. However, kids master the ma and da syllables quickly, so doubling it up and calling it a word makes it easy.
A friend relayed a story to me: someone asked him why his child called him Abba, which he said was the Hebrew word for daddy. The person protested, "But that's the first noise children make." He smiled back, "I know, and that's why we made it the word for daddy." Evolutionarily, this makes sense, and mastering dada before mama makes sense as well: mothers are MUCH more wired for unconditional love than fathers, because of the hormonal bonding from delivery and nursing. (Those who don't go through those steps don't get the hormone dump helping them; that doesn't affect their being good mothers, but it probably makes it rougher on them.)
Each language has a "simplified" informal and a baby equivalent. Hebrew: Father = Av, Mother = Em, My Father = Abi, My Mother = Imi, yet the informal is Abba and Ima, which officially are tied to Aramaic, but probably evolved as simplified forms for children. Like mama and dada, papa, etc.
It would be a TREMENDOUS biological edge to quickly master words for parents, and it would therefore be a selected characteristic. It's amazing how not upset you get with a terror of a child when they call out your "name."