Gene Mappers May Have Missed Half The Genes
Nepre writes: "Forbes.com is running a story about new research that suggests that the Human Genome Project may have missed tens of thousands of genes in the race to map the human genome. This is interesting given the intense competition between commercial and academic research. As my grandmother used to say, "The faster you go, the behinder you get!""
I'm rather concerned by some of the statements this guy made, and we should look at them before we give too much credence to his findings.
"If the mouse and human genomes were so similar, we would be mice," says Shoemaker.
Well, Mr. Shoemaker, to be quite honest, we're not that far off, evolutionarily speaking. We share the same classification as mammals, have hundreds of bodily functions that are nothing short of the same, share very complex behavioral patterns, and we study mice in an attempt to find out how our own brains work (go ask a research neurologist). If the man who says this is the director of anything, we need to push him off of his pedestal and teach him some biology.
"Before you count genes, you really need to define what a gene is," says Daniel Shoemaker.
Basically, it seems like the guy is "trolling". "Nuh uh, Taco!", he's saying. "The Theory of Graviity must be wrong because you mis-spelled gravity!" Really, he's saying that people are wrong, and then saying he's right, and then saying that the criteria he used to make this sort of judgement don't exist in the first place.
No definition for a gene? "A unit of heredity. The unit of genetic function which carries the information for a single polypeptide."
IAB (I'm a bioinformaticist). You're partly correct. Introns (the 'junk' in between the exonic regions in DNA and freshly transcribed mRNA) do tend towards non-random sequence. You can use a variety of metrics to make guesses as to where introns and exons begin and end within a gene's coding region: sequence entropy, GC/AT frequency, neural nets or hidden Markov models trained on known examples, etc.
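For anyone wondering what the two simplest of those metrics look like in practice, here's a minimal sketch in Python. The function names and the window/step parameters are made up for illustration; this is a toy, not what any real gene-finding package does.

```python
import math
from collections import Counter


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def shannon_entropy(seq):
    """Shannon entropy of the base composition, in bits per base.

    0.0 means a single repeated base; 2.0 means A, C, G and T
    are all equally common.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def sliding_windows(seq, size, step):
    """Yield (start, gc_fraction, entropy) for each full window of seq."""
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        yield start, gc_fraction(window), shannon_entropy(window)
```

Run `sliding_windows` over a stretch of sequence and look for places where the GC fraction or the entropy shifts from one window to the next. Those shifts are only hints, which is why the trained models get used on top of them.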
These metrics, however, are only useful once one knows something about where a 'gene' starts and ends. The real problem here is that some of the assumptions we've made historically about gene structure have potentially led us astray. Yes, the chromosomes are full of junk DNA, but no, it's nothing near random for the most part. It's full of 'repetitive' elements (short segments that repeat endlessly; query Genbank for 'ALU Repeat' and see how many sequences you find) that make any sort of pattern matching a tough sell genome-wide. There are also plenty of 'pseudogenes' interspersed throughout the genome, leftovers from a bygone era. Which of these pseudogenes might actually still BE transcribed is a question that only mRNA expression analysis can answer. Hopkins is definitely on the right track w/ something like SAGE (though it's not exactly high-throughput, hence our man's need for extrapolation to genome-wide numbers).
The paper should be an interesting read to say the least.
-j
An earlier comment hit the nail on the head. I'm quite sure Mr. Shoemaker sold 80,000 genes to a biotech/pharmaceutical company, and now has to explain why he doesn't owe them half their money back (what a funny conversation that would be to listen to).
What many people who are trying to sell genomics don't want to address is that the differences between a mouse and a human are likely not the result of there being more genes in humans, but rather of a difference in regulation of (approximately) the same number of genes. That is to say, there are likely differences in the promoter (on switch) and repressor (off switch) portions of these genes that cause one to be active in a certain situation in the human, but not in the mouse. A simple analogy demonstrates the difference: you can have two similar cars with similar horsepower, number of tires, gears, etc., but if you put an old grandmother in one and a Formula 1 driver in the other, and watched them drive on the highway, you might make the mistake of thinking one car had more power, a larger engine (genes), than the other, when in fact the difference between the two is due to control of the same equipment (gas = promoter and brake = repressor elements of genes). Further analysis of the control regions of genes, as well as differences in protein-protein interactions (proteomics), will likely explain the differences between a human and a mouse, not 50,000 as yet undiscovered genes.
"If we knew what we were doing, it wouldn't be called research, now would it?" -Albert Einstein-