Scientists and Lawyers Argue For Open US DNA Database
chrb writes "New Scientist has an article questioning the uniqueness of DNA profiles. 41 scientists and lawyers recently published a high-profile Nature article (sub. required) arguing that the FBI should release its complete CODIS database. The request follows research on the already released Arizona state DNA database (a subset of CODIS) which showed a surprisingly large number of matches between the profiles of different individuals, including one between a white man and a black man. The group states that the assumption that a DNA profile represents a unique individual, with only a minuscule probability of a secondary match, has never been independently verified on a large sample of DNA profiles. The new requests follow the FBI's rejection of similar previous requests."
I have been concerned about this for years, because you often hear prosecutors and "expert" witnesses testify to the effect that "the odds are billions to one against this being someone else".
Among other possible statistical mistakes, these unrealistically large numbers rest on the idea that each genetic location being compared is statistically independent of the others. But in fact we know that is not so. What we definitely do not know is how, or how often, these locations actually depend on each other.
Let me give you a purely hypothetical example: what are the odds that a genetic profile from a random person contains a gene determining curly hair?
You can answer this approximately by simply observing what percentage of the population has curly hair. Let's say 1/4 just for argument. So your odds are 1 in 4.
But here's the kicker question: what are the odds that a genetic profile includes a gene for curly hair, given that it also contains a gene for sickle cell anemia?
The odds are going to change drastically.
This is not a real example, of course, just illustrative. But one can easily see that the contents of genetic locations are NOT necessarily statistically independent, even if one of them does not directly cause the other.
We simply do not know enough to say that any two genetic locations are truly independent. Therefore these huge probabilities ("billions to one" for example) being spouted by prosecutors are completely specious.
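To put toy numbers on the curly-hair point, here is a minimal sketch. It assumes a made-up population built from two subgroups; inside each subgroup the two markers really are independent, but their frequencies differ between subgroups. Every frequency below is invented for illustration, and the names (subgroups, p_a, p_b) are mine, not anything from a real forensic model.

```python
# Hypothetical mixed population: two equally sized subgroups.
# Within a subgroup, marker A and marker B are independent.
# All frequencies are invented for illustration only.
subgroups = [
    {"weight": 0.5, "p_a": 0.40, "p_b": 0.40},
    {"weight": 0.5, "p_a": 0.02, "p_b": 0.02},
]

def marginal(key):
    """Population-wide frequency of one marker."""
    return sum(g["weight"] * g[key] for g in subgroups)

p_a = marginal("p_a")
p_b = marginal("p_b")

# What you get by multiplying the population-wide frequencies.
product_rule = p_a * p_b

# What you get by doing the multiplication inside each subgroup first.
actual_joint = sum(g["weight"] * g["p_a"] * g["p_b"] for g in subgroups)

# Chance of marker B once you already know the profile has marker A.
p_b_given_a = actual_joint / p_a

print(f"P(A) = {p_a:.4f}, P(B) = {p_b:.4f}")
print(f"product rule says P(A and B) = {product_rule:.4f}")
print(f"actual P(A and B)            = {actual_joint:.4f}")
print(f"P(B given A)                 = {p_b_given_a:.4f}")
```

In this toy setup the product rule understates how common the combination is, even though neither marker causes the other. Whether real forensic markers behave anything like this is exactly the open question.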
I am consistently horrified that juries offload their responsibility by blindly applying the judgement of expert witnesses (who are often paid to say the same thing over and over again), whether forensic scientists, psychologists or IT specialists. I take DNA evidence the same way as I take the contents of a third party /var/log: with a pound of salt, because I know it could have been planted.
When I was a juror I was interested in means, motive and opportunity as necessary but not sufficient conditions to vote guilty. I also made use of the inconsistencies in the defendant's testimony, details about the background of the defendant and victim to the extent that they were relevant to his alleged act, consistency of information from eyewitnesses around the time of the event, known and unknown, doctors' reports, police officers, etc. I paid little attention to forensic details which might, according to the arguments of a scientist, help /confirm/ the prosecution's case, because I have more than reasonable doubt about any evidence which requires me to be an expert to interpret correctly - especially when I'm not that expert, and am instead deferring to some guy I just heard in a courtroom.
Even more important than the issue of statistical independence or the veracity of the DNA testing process itself (which SHOULD be investigated) is the simple possibility of corruption, incompetence, or honest mistake. If a DNA testing lab accepts a bribe to give its expert testimony, makes a mistake and switches sample vials, etc., its expert court-testimonyer will still show up in court claiming "The chances are approximately eighty-three bazillion to one".
This giant number has the emotional effect of certainty, but that number is just the chance that the sample the DNA lab received corresponds to the DNA of the accused, IF NO MISTAKES WERE MADE and nobody is planting evidence or accepting bribes. It's not the chance that the accused is innocent. I'm sure this distinction is made in the verbal "fine print" but the jury will still be swayed. The giant odds numbers are nothing but powerful emotional hooks. The real possibility that the DNA evidence does not finger the accused breaks down like this:
1:1billion the DNA matches someone else due to a flaw in the statistics of DNA testing
PLUS
1:$smallernumber the DNA lab has accepted a bribe, has a mole, made a mistake, etc
PLUS
1:$smallernumber the DNA lab has honestly received a sample from the accused but the sample was planted at the scene by police, the real criminal, or really bad luck.
The jury won't be considering these factors when they hear the "1:1billion" number. It's nothing but sensationalism.
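To put rough numbers on this: if the failure modes are treated as independent, the chance that at least one of them happens is one minus the chance that none of them does. A minimal sketch follows; the two "smaller number" rates are pure placeholders of mine, not measured error rates for any lab.

```python
def chance_any_failure(probabilities):
    """Chance that at least one of several independent failure modes occurs."""
    chance_none = 1.0
    for p in probabilities:
        chance_none *= 1.0 - p
    return 1.0 - chance_none

# Placeholder figures for illustration only.
coincidental_match = 1e-9  # the "1 in a billion" quoted to the jury
lab_error = 1e-3           # hypothetical: mix-up, contamination, bribe
planted_sample = 1e-3      # hypothetical: sample planted at the scene

total = chance_any_failure([coincidental_match, lab_error, planted_sample])
print(f"chance the evidence misleads: about 1 in {1 / total:,.0f}")
```

With these placeholder rates the combined figure lands around 1 in 500, and the headline "1 in a billion" contributes essentially nothing to it. The total is driven by whichever failure mode is most likely.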
Scientists already know that the human genome (DNA) is not the complete blueprint for an organism. The human epigenome, which is far more complex and contains more of the details about how to put those building blocks together, is no less important... and it seems likely that it contains more of what separates us as individuals.
I agree with mangu that "DNA tests should be used for finding someone innocent rather than guilty." Paternity tests work in a similar way, even though the general public does not seem to know it: genetic microsatellite tests can disprove paternity, but because of false positives they cannot prove that a given man is in fact the father. The question should be: how many microsatellite sites (sites that usually differ across the human population) should be analyzed to arrive at a conclusion?
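For a feel of how the "how many sites" question is usually answered, here is a minimal sketch. It leans on the very assumption this thread is questioning, namely that sites are independent, so that per-site chance-match probabilities can simply be multiplied. The function name and the 0.08 per-site figure are hypothetical.

```python
def loci_needed(per_locus_match_prob, target_prob):
    """Smallest number of sites whose combined chance-match probability
    is at or below target_prob, ASSUMING the sites are independent."""
    if not 0.0 < per_locus_match_prob < 1.0:
        raise ValueError("per_locus_match_prob must be between 0 and 1")
    if target_prob <= 0.0:
        raise ValueError("target_prob must be positive")
    loci = 0
    combined = 1.0
    while combined > target_prob:
        combined *= per_locus_match_prob  # the independence assumption
        loci += 1
    return loci

# Hypothetical: two unrelated people match at any one site 8% of the time.
for target in (1e-6, 1e-9):
    print(f"target {target:g}: {loci_needed(0.08, target)} sites")
```

If the sites are correlated, each extra site buys less than this multiplication suggests, so the real answer would be larger than what the sketch reports.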
If the purpose is to independently evaluate the rate of false matches in a DNA database to be used in criminal investigations, what better database is there than the one that will be used for that purpose?
Privacy issues can easily be worked around here---there's no need for personally identifiable information (i.e., name or location, not the dna data itself :-P ) to accompany the database for this purpose. You might also worry about statistical independence between the sample to be used for the analysis and that used for testing the results, but there are very well established methods for using subsamples of a data set in just this way.
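As a rough illustration of why no names are needed, here is a minimal sketch of the kind of all-pairs comparison such a study could run on anonymized profiles. The data layout (one tuple of sorted allele pairs per profile) and the toy values are my own assumptions, not the CODIS format.

```python
from itertools import combinations

def shared_loci(profile_a, profile_b):
    """Number of sites at which two profiles carry the same allele pair."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a == b)

def count_matching_pairs(profiles, min_shared):
    """Count pairs of profiles that agree at min_shared or more sites."""
    return sum(
        1
        for a, b in combinations(profiles, 2)
        if shared_loci(a, b) >= min_shared
    )

# Toy, anonymous data: one tuple per profile, one sorted allele pair
# per site. None of these values come from a real database.
profiles = [
    ((12, 14), (8, 9), (15, 15), (7, 10)),
    ((12, 14), (8, 9), (15, 15), (6, 9)),
    ((11, 13), (8, 9), (16, 17), (7, 10)),
    ((12, 14), (9, 9), (15, 15), (6, 9)),
]

for k in (1, 2, 3):
    print(f"pairs sharing at least {k} sites: {count_matching_pairs(profiles, k)}")
```

The number of pairs grows as n(n-1)/2, so a database of 65,000 profiles contains over two billion pairs. That is why even a tiny per-pair match probability can produce a noticeable number of matches, and why a brute-force loop like this would need something smarter at full scale.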
I wouldn't call it a case of mission creep. Research is needed to confirm that the database is suitable for the purposes it was created for.
These issues were identified as early as 1973, in a landmark HEW report on computer records and the rights of citizens. It boils down to this: inferences drawn from data that affect the lives of people ought to be rationally justifiable. This means not using data until its suitability can be established. Mission creep can lead to data being used outside the context it is reliable in; but we can also run afoul of privacy and due process concerns by collecting data in the first place without establishing that it means what we hope it means.
I've been concerned for years about the reasoning used in DNA screening. It entails a long chain of assumptions, and while all the assumptions *seem* plausible, the chance that one or more of them is wrong or has some unknown wrinkle is not negligible.