Computers May Be As Good As (Or Better Than) Human Biocurators
Shipud writes "Sequencing the genome of an organism is not the end of a discovery process; rather, it is a beginning. It's the equivalent of discovering a book whose words (genes) are there, but their meaning is yet unknown. Biocurators are the people who annotate genes — find out what they do — through literature search and the supervised use of computational techniques. A recent study published in PLoS Computational Biology shows that biocurators probably perform no better than fully automated computational methods used to annotate genes. It is not clear whether this is because the software is of high quality, or both curators and software need to improve their performance. The author of this blog post uses the concept of the uncanny valley to explain this recent discovery and what it means to both life science and artificial intelligence."
Still not as good at getting first posts, though.
"curator mid-14c., from L. curator "overseer, manager, guardian," agent noun from curatus, pp. of curare (see cure). Originally of minors, lunatics, etc.; meaning "officer in charge of a museum, library, etc." is from 1660s." so, "life + manager" or "life + officer in charge of a library" ...nah.
'geneannonator' ....maybe
This isn't at all surprising if you understand the concept of machine learning and have ever tried to do anything remotely similar by hand.
In fact rather the opposite - it says that the reliability of the machines is 'competitive' or 'rivals' the human curators. That's marketing speak for 'not quite as good just yet'.
This author seems to have inappropriately compared the "fear" of machines doing better than humans with the concept of the uncanny valley.
The concept of the "uncanny valley" is that the affinity of humans for observing the appearance or behavior of a human-like entity (robot, alien, whatever) has this unexpected dip when it is too close to human behavior (we have this apparent built-in visceral problem with the entity). However, this is only true when it is trying to mimic human-like behaviors. If it's doing something totally different or totally exceeding human behaviors (say distinctly non-human speed, accuracy, strength, appearance, etc.), the uncanny valley doesn't say anything about affinity; in fact, if you were to extrapolate the curve out, humans might even have more affinity for these "super-human" behaviors. Maybe that's why many express affinity for live-action versions of comic book superheroes, or airbrushed models in magazines. The behavior is so far from the uncanny valley that it doesn't invoke the suppression response that is responsible for it.
Just like what was once observed with "space-shuttle" pilots, the computers can probably do a better job at this task, but we don't quite trust them yet (for some reason). That's really just the human fear of being replaced by machines, not uncanny valley. Note that the only people fearful about this behavior are the people that are likely to be replaced (and maybe a few that sympathize with them)...
There are a few people in the world, who work for INSD members like NCBI or EMBL, whose job is "biocuration". It's a rare profession. Having reliable annotations available is not the same as discovering a book. In a car analogy, genes are a list of parts. You know things about the car, but how it works and comes together is up to human ingenuity. In the bioinfo/molbio field that usually means heavy use of OSS and shell coupled with in vitro experiments.
"This author seems to have inappropriately compared the "fear" of machines doing better than humans with the concept of the uncanny valley."
That's not how I read that, although I may be wrong. It seems like both are performing on a par (more or less), yet the higher sensitivity ("coverage") of the automated methods makes them not-quite human-like, but not "better". So the "uncanny valley" here addresses the observation that the programs are performing like humans, but differently. And a bit weirdly close.
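To make the "on a par, but with higher coverage" distinction concrete, here is a minimal sketch of how precision and coverage (sensitivity) could be computed for two sets of annotations. Everything in it is an illustrative assumption: the gene names, the function terms, the reference set, and the helper `precision_and_coverage` are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# An annotation is a (gene, function term) pair. All names are hypothetical.
Annotation = Tuple[str, str]


def precision_and_coverage(
    predicted: Set[Annotation], reference: Set[Annotation]
) -> Tuple[float, float]:
    """Return (precision, coverage) of predicted annotations.

    precision = correct predictions / all predictions
    coverage  = correct predictions / all reference annotations (sensitivity)
    """
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    coverage = true_positives / len(reference) if reference else 0.0
    return precision, coverage


# Toy reference set standing in for experimentally confirmed annotations.
REFERENCE = {
    ("geneA", "binding"),
    ("geneB", "transport"),
    ("geneC", "catalysis"),
    ("geneD", "signaling"),
}

# A cautious annotator: few calls, all of them correct.
HUMAN = {("geneA", "binding"), ("geneB", "transport")}

# An eager annotator: more calls, one of them wrong.
AUTOMATED = {
    ("geneA", "binding"),
    ("geneB", "transport"),
    ("geneC", "catalysis"),
    ("geneD", "binding"),
}

if __name__ == "__main__":
    print("human:", precision_and_coverage(HUMAN, REFERENCE))
    print("automated:", precision_and_coverage(AUTOMATED, REFERENCE))
```

In this toy setup the cautious annotator makes no mistakes but covers only half of the reference, while the eager one covers more at the cost of an error. That is one way two methods can be comparable overall while behaving differently.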
Biocurators are the people who annotate genes — find out what they do — through literature search and the supervised use of computational techniques.
Biocuration means that? I'd have never guessed from the name. Let's face it, literature searching is now something that is thoroughly practical by computer (it's pretty much just like using a web search engine, except over a different digitized corpus) and "supervised use of computational techniques" there makes it sound like they're a bunch of low-level lab technicians. No creativity required at all. Is it any wonder they're being replaced with little more than a shell script? What's more, the computer will be far faster as well. It won't get tired, it won't get bored, it'll just do exactly what it's been told to do. (The annotation of a genome with the consequences of the mutations it has should be trivial; I know this from having worked with code that did a whole genome's worth in well under an hour. Several years ago.)
Now if instead they were curating the actual samples, I'd have much more respect. Those can be quite tricky to work with, and they're often irreplaceable.
"Little does he know, but there is no 'I' in 'Idiot'!"
The fear of the "uncanny valley" is that the android, cybernaut, etc. is sufficiently "human" to accidentally invoke the fear of the psychopath, i.e. a person without empathy, who is therefore very, very dangerous.
Why not find a way to leverage the advantage of each?
Table-ized A.I.
Most (all?) computational methods for protein annotation rely on a reliable corpus made by humans, and try to find similarities to guess the result.
Saying that computers are better than humans is like saying turbos are more powerful than engines.
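The dependence on a human-made corpus described above can be sketched as follows. This is a toy illustration only, not how any real annotation pipeline works: the sequences, the labels, the k-mer Jaccard similarity measure, the 0.5 threshold, and the function names are all assumptions made for the example. Real tools use alignment-based similarity and far larger curated databases.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_annotation(
    query: str,
    curated: Dict[str, str],
    k: int = 3,
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Guess a label for query by copying it from the most similar
    curated sequence, or return None if nothing is similar enough."""
    query_kmers = kmers(query, k)
    best: Optional[Tuple[str, float]] = None
    for seq, label in curated.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score >= threshold and (best is None or score > best[1]):
            best = (label, score)
    return best


# Hypothetical human-curated corpus: sequence -> functional label.
CURATED = {
    "MKTAYIAKQRQISFVK": "hypothetical kinase",
    "GSHMLEDPVDAFKQLL": "hypothetical transporter",
}

if __name__ == "__main__":
    # One residue differs from the first curated sequence.
    print(transfer_annotation("MKTAYIAKQRQISFVR", CURATED))
    # Nothing similar in the corpus, so no guess is made.
    print(transfer_annotation("WWWWWWWW", CURATED))
```

Without the curated dictionary the function has nothing to copy from, which is the point of the comment: the automated guess is only as good as the human-made corpus behind it.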
What this really means is that we know so very little about how genomes came into existence and how they organize themselves. Algorithms, after all, only optimize or make efficient the factors humans feel are important, yet this is often done with little understanding, as yet, of what rules natural self-organizing systems use, or even whether there are many rules at all. The ontologies that are the final outcome of such curation are themselves only theories or models of what is actually going on within genomes. "Whatever works" may be the only rule, constrained by the fact that whatever rules exist must ultimately be expressed in the form of nucleotide sequences. Consequently, it should be no particular surprise that machines and humans behave similarly when it comes to understanding what this means.
As with all science, humans will use tools, in this case algorithms, to aid in developing a better understanding. To achieve that understanding with respect to genomes means elucidating how such sequences and subsequences originated and evolved, and how they have been constrained by selection and influenced by mutation, genetic drift, assortative mating, and other processes that determine which nucleotides have ultimately become "locked into" genomes through geological time.
Systematic biology is not rocket science. It is far more complicated than rocket science, since the number of possible permutations and combinations of objects (exterior products of potential events), many unique, that must be investigated is much, much larger than the known number of electrons in the known universe. Understanding biology, not space, is truly the final frontier.
Unfortunately for humans, we seem hell-bent on making ourselves go extinct before we have time to figure it all out. It is perhaps both ironic and fitting that humans shall go extinct as a species so soon after our first baby step toward interstellar space has been achieved.