Human Genome Contaminated With Mycoplasma DNA
KentuckyFC writes "The published human genome is contaminated with DNA sequences from mycoplasma bacteria, according to bioinformatics researchers who blame an epidemic of mycoplasma contamination in molecular biology labs around the world. The researchers say they've also found mycoplasma DNA in two commercially available human DNA chips made by biotech companies for measuring levels of human gene expression. So anybody using these chips to measure human gene expression is also unknowingly measuring mycoplasma gene expression too. The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards."
But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards.
Why is the word "evolutionary" used here? We're talking about static data that is not "executed" - it does not reproduce, it is only copied verbatim. Invalid data that bypasses filters ("antivirus software") is simply that - corrupt, invalid data that does not belong, but at least there will be less of it after filtering. That doesn't make the data somehow more powerful or adaptive - the filter merely missed it. The key fact is the data does not get to modify itself in an iterative fashion in order to survive or improve.
Better known as 318230.
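The point about static data can be shown with a toy sketch. Everything here is hypothetical: the sequences are made up, and `naive_filter` and the marker string stand in for whatever screening a real database might use. A contaminant that lacks the marker slips past the filter, but it comes out exactly as it went in, and running the filter again changes nothing.

```python
def naive_filter(records, marker):
    """Drop every record whose sequence contains the known marker.

    Hypothetical stand-in for a database screen; real screens are
    far more involved than a substring check.
    """
    return [r for r in records if marker not in r]


# Made-up toy data: one "human" record and two "contaminant" records.
records = [
    "ACGTACGTACGT",   # toy human sequence
    "AATTTAGGTAA",    # toy contaminant carrying the marker
    "AATTAAGGTAA",    # toy contaminant variant without the marker
]
marker = "TTTAGGT"    # assumed signature of the contaminant

once = naive_filter(records, marker)
twice = naive_filter(once, marker)

# The variant is missed, but it is unchanged, and a second pass over
# the same static data gives the same result as the first.
print(once)
print(once == twice)
```

Nothing in the loop copies, mutates, or selects sequences between passes, so there is no mechanism for an arms race unless someone deliberately adds one.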
At first I was relieved that this was a bacteria infecting silicon. Now I'm concerned: When will Avast release an Antibacterial beta? I'm still running Windows, folks! I know I'm vulnerable to this!!!
khasim (12/9/06): In a blind taste test, more people preferred Coke over the Pepsi that I had previously pissed in.
that part is nonsensical:
The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards.
static data don't evolve
Jehovah be praised, Oracle was not selected
No, they're not saying that the mold itself is appearing in the chips, just that the mold's DNA is. Therefore, the presence (or absence) of the mold in a sample would skew the results when using these chips. And yes, saying that the DNA appears "in silico" is perfectly valid here - whether you care for the term or not.
Experiments done in a cell are in vivo (life); experiments done in a test tube are in vitro (glass); experiments done in a computer are in silico (silicon computer chips). In silico is used to describe computational modeling experiments (think FoldIt or Rosetta@Home), or manipulation and searching of large DNA/RNA/protein sequence databases. You'd expect it to apply to stuff like weather modeling or nuclear physics, but there the analogy to vivo/vitro is lost so I believe those fields don't use the term.
The human genome is surely highly contaminated, just not with mycoplasma. Endogenous retroviruses, retrotransposons, repetitive elements galore, on the other hand...
As a career microbiologist and bioinformatics geek, I wanted to cry at the complete and utter scientific inaccuracy of this summary.
The mycoplasma genes are clearly successful in reproducing themselves in silico raising the possibility that we're seeing the beginnings of an entirely new kind of landscape of infection. One option to combat this kind of virtual infection is to protect databases with the genomic version of antivirus software, a kind of virtual immune system. But this in itself could make things worse by triggering an evolutionary arms race that selects genes most capable of beating the safeguards
Mycoplasma is a common contaminant of many human cell culture lines. It is often present in low counts, and is a relatively slow-growing organism. This is a problem, because many of the immortal cell lines are passed serially, meaning that the mycoplasma propagates right along with them. Most labs that perform cell culture now do routine PCR testing for mycoplasma markers as a quality control measure.
When it comes to sequencing, and in particular, high-throughput next generation sequencing (Illumina/454/SOLiD/PacBio/whatever), you are shotgun sequencing all of the DNA in a given sample extract. This means that if you had a bunch of human cells that happened to be contaminated with low counts of mycoplasma, those mycoplasma sequences would be present to some extent in your final sequencing project. Whether this would factor into the final assembly or just get thrown out depends on the quality control, the experience of the bioinformatics team and the assembly software pipeline. I am willing to bet that most issues with mycoplasma contamination arose during the "formative" years of high-throughput sequencing, but may have lingered in databases. These databases might in turn be used by commercial companies that build microarrays or other high-density tools, so it's feasible that some mycoplasma sequence carried over.
Is this relevant? Probably not. On a microarray, it would most likely be wasted space (eg: always negative during gene expression studies... unless the patient had a mycoplasma infection or something). Furthermore, a simple analysis of the sequence would help to rule out sequences that were clearly prokaryotic.
"In silico" does not mean what you think it means. In fact, this whole bit about in-silico replication and arms races is complete and utter nonsense. In-silico biology usually refers to biocomputing. Eg: analyzing, manipulating and simulating gene/protein sequences, expression, signalling cascades, and the like on a computer system. It does not apply to mycoplasma sequences running around all namby-pamby causing infections that would require some sort of anti-virus software. What they might be alluding to is the fact that a lot of shotgun sequencing libraries are run, as needed, through a vector screen, which is designed to pull out irrelevant sequences that may have been necessarily introduced during cloning or sequencing. Plasmids, cosmids, whatever. These algorithms may need better tuning to do a better job of ruling out mycoplasma in human sequences, but there's no danger of these mycoplasma sequences replicating and taking over the world.
Unless you happen to be William Gibson.
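For anyone wondering what a screen of this kind looks like in practice, here is a minimal sketch. It is not how any particular vector-screening tool works. It simply flags reads that share most of their k-mers with a known contaminant sequence. The function names, the k-mer size, the threshold and the toy sequences are all assumptions chosen for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(contaminants, k):
    """Collect k-mers from both strands of every contaminant sequence."""
    index = set()
    for seq in contaminants:
        index |= kmers(seq, k)
        index |= kmers(reverse_complement(seq), k)
    return index


def contaminant_fraction(read, index, k):
    """Fraction of the read's distinct k-mers that appear in the index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


def screen(reads, index, k=8, threshold=0.5):
    """Split reads into (kept, flagged) by contaminant k-mer fraction.

    k and threshold are arbitrary illustrative defaults, not tuned values.
    """
    kept, flagged = [], []
    for read in reads:
        if contaminant_fraction(read, index, k) >= threshold:
            flagged.append(read)
        else:
            kept.append(read)
    return kept, flagged


# Made-up toy sequence standing in for a contaminant reference.
toy_contaminant = "ATGAAAATTTTAGGTATTGATCCAGGTTTAGCAAATTTAG"
index = build_index([toy_contaminant], k=8)

reads = [
    toy_contaminant[5:25],                       # taken from the contaminant
    "GCGCGCCGGCGCGGCCGCGCGGGCCC",                # unrelated toy read
    reverse_complement(toy_contaminant[10:30]),  # contaminant, other strand
]
kept, flagged = screen(reads, index, k=8)
print(len(kept), "kept,", len(flagged), "flagged")
```

A real pipeline would use an aligner and a curated contaminant database rather than exact k-mer matching, and would need a threshold that tolerates sequencing errors. Either way the screen only removes records; it never gives the remaining sequences a way to change.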
This article was horribly written. They go between using terms with their literal meaning and using terms in metaphorical creative language but do not differentiate between the two using context at all. It's an incredibly confusing read. Actual ancestral human DNA is not contaminated with actual mycoplasma DNA sequences.
Here's what I gather is going on:
Researchers took a sample of human DNA and sequenced it. In the process, the sample was contaminated with DNA from mycoplasma (possibly from bacteria in the lab or on the researchers themselves). The resulting data was then assumed to be a representation of pure human DNA (which would be incorrect). Other researchers then use this data set as a reference to compare against other human DNA samples they sequenced themselves. They use this to test gene expression and so forth. So if their DNA samples show gene expression for mycoplasma, they would incorrectly think it was normal human gene expression. What they did is use software to strip the mycoplasma DNA data from the original data set (which had both human and mycoplasma DNA sequences) so that only the actual human DNA data is used as a reference. The biological contamination came first, in the original sample that was tested; the contamination referred to elsewhere is computational data "contamination." This is the software they are referring to as antivirus software and a virtual immune system (it isn't antivirus software, nor is it similar to a biological immune system; it's DNA data filtering software).
These people really need to think about what they're trying to say before puking up jargon salad on the readers' brains.
I don't want to be excessively harsh but the summary was seriously a bunch of drivel. In silico either means it's data on the computer, or that you are simulating a biological process computationally. But as other posters have mentioned, unless you are purposely simulating evolution, mycoplasma sequences in your human databases aren't going to cause any "arms race." Yes, they seriously screw with validity, but that's a completely different issue.
This is a generalization, and no offense to fellow Slashdotters, but in my experience most of the computer scientists that I've met have a really crappy understanding of even basic biology. CS concepts don't directly translate to biology ones.
Cogito, ergo sum, fosho!