Scientists and Lawyers Argue For Open US DNA Database
chrb writes "New Scientist has an article questioning the uniqueness of DNA profiles. 41 scientists and lawyers recently published a high-profile Nature article (sub. required) arguing that the FBI should release its complete CODIS database. The request follows research on the already released Arizona state DNA database (a subset of CODIS) which showed a surprisingly large number of matches between the profiles of different individuals, including one between a white man and a black man. The group states that the assumption that a DNA profile represents a unique individual, with only a minuscule probability of a secondary match, has never been independently verified on a large sample of DNA profiles. The new requests follow the FBI's rejection of similar previous requests."
I have been concerned for years about this, because you often hear prosecutors and "expert" witness testimony to the effect that "the odds are billions to one against this being someone else".
Among other possible statistical mistakes, these unrealistically large numbers are based on the idea that each genetic location being compared is statistically independent. But in fact we know that to not be so. What we definitely do not know is how, or how often, many of these may actually depend on each other.
Let me give you a purely hypothetical example: what are the odds that a genetic profile from a random person contains a gene determining curly hair? In other words, what are the odds of finding this gene in a random sample?
You can answer this approximately by simply observing what percentage of the population has curly hair. Let's say 1/4 just for argument. So your odds are 1 in 4.
But here's the kicker question: what are the odds that a genetic profile includes a gene for curly hair, given that it also contains a gene for sickle cell anemia?
The odds are going to change drastically.
This is not a real example, of course, just illustrative. But one can easily see that the contents of genetic locations are NOT necessarily statistically independent, even if one of them does not directly cause the other.
We simply do not know enough to say that any two genetic locations are truly independent. Therefore these huge probabilities ("billions to one" for example) being spouted by prosecutors are completely specious.
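To put numbers on the curly hair example above, here is a minimal sketch. The joint frequencies are invented purely for illustration (just like the example itself); the only point is that multiplying the two individual frequencies gives the wrong answer when the traits are correlated.

```python
from fractions import Fraction

# Invented joint frequencies for two traits (NOT real genetic data).
# Keys are (has_curly_hair_gene, has_sickle_cell_gene).
JOINT = {
    (True, True): Fraction(3, 100),
    (True, False): Fraction(22, 100),
    (False, True): Fraction(1, 100),
    (False, False): Fraction(74, 100),
}


def marginal(index):
    """Overall frequency of one trait, ignoring the other."""
    return sum(p for key, p in JOINT.items() if key[index])


p_curly = marginal(0)                    # 1/4, as in the example
p_sickle = marginal(1)                   # 1/25
p_both = JOINT[(True, True)]             # true frequency of having both
p_curly_given_sickle = p_both / p_sickle

# What you get if you wrongly assume the two traits are independent.
product_rule_estimate = p_curly * p_sickle

print("P(curly)          =", p_curly)
print("P(curly | sickle) =", p_curly_given_sickle)
print("product rule says =", product_rule_estimate)
print("actual P(both)    =", p_both)
```

With these made-up numbers the product rule understates the chance of seeing both traits by a factor of three. Chain a dozen such factors together and the error compounds.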
And that’s far from all. Imagine having a bone-marrow transplant. Now your blood has different DNA from your skin!
I remember reading about a person who had three different types of DNA in his body... at the same time!
DNA can be faked as easily as fingerprints. Hell, I could just “accidentally” cut a big politician while getting his autograph, and then plant that DNA at a murder site, while I myself am completely sealed off in a virus-lab-style overall.
An overall that suffices costs less than 50 bucks at a specialty store. And an autograph just costs some travel. Anybody can do it.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
I am consistently horrified that juries offload their responsibility by blindly applying the judgement of expert witnesses (who are often paid to say the same thing over and over again), whether forensic scientists, psychologists or IT specialists. I take DNA evidence the same way as I take the contents of a third party /var/log: with a pound of salt, because I know it could have been planted.
When I was a juror I was interested in means, motive and opportunity as necessary but not sufficient conditions to vote guilty. I also made use of the defendant's inconsistencies in his testimony, details about the background of the defendant and victim to the extent that it was relevant to his alleged act, consistency of information from eye witnesses around the time of the event, known and unknown, doctors' reports, police officers, etc. I paid little attention to forensic details which might, according to the arguments of a scientist, help /confirm/ the prosecution's case, because I have more than reasonable doubt in my mind of any evidence which requires me to be an expert to interpret correctly - especially when I'm not that expert, instead deferring to some guy I just heard in a courtroom.
What are we talking about here? If this is a catalog of DNA of convicted criminals then it might be ok. But if it's also DNA samples from other people who gave a sample to clear their name, then I don't think it should be made public.
Even more important than the issue of statistical independence or the veracity of the DNA testing process itself (which SHOULD be investigated) is the simple possibility of corruption, incompetence, or plain mistake. If a DNA testing lab accepts a bribe to give its expert testimony, or makes a mistake and switches sample vials, etc, its expert witness will still show up in court claiming "The chances are approximately eighty-three bazillion to one".
This giant number has the emotional effect of certainty, but that number is just the chance that the sample the DNA lab received corresponds to the DNA of the accused--IF NO MISTAKES WERE MADE and nobody is planting evidence or accepting bribes. It's not the chance that the accused is innocent. I'm sure this distinction is made in the verbal "fine print" but the jury will still be swayed. The giant odds numbers are nothing but powerful emotional hooks. The real possibility that the DNA evidence does not finger the accused breaks down like this:
1:1billion the DNA matches someone else due to a flaw in the statistics of DNA testing
PLUS
1:$smallernumber the DNA lab has accepted a bribe, has a mole, made a mistake, etc
PLUS
1:$smallernumber the DNA lab has honestly received a sample from the accused but the sample was planted at the scene by police, the real criminal, or really bad luck.
The jury won't be considering these factors when they hear the "1:1billion" number. It's nothing but sensationalism.
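The failure modes in the breakdown above are alternatives: the evidence misleads if any one of them happens. Here is a rough sketch of how they combine, assuming (and it is only an assumption) that they are independent of each other. The 1-in-1,000 figures are placeholders standing in for $smallernumber, not measured rates.

```python
def chance_evidence_is_wrong(p_coincidence, p_lab_failure, p_planted):
    """Chance that at least one failure mode occurred.

    Assumes the three failure modes are independent of one another.
    """
    p_all_sound = (1 - p_coincidence) * (1 - p_lab_failure) * (1 - p_planted)
    return 1 - p_all_sound


# Placeholder rates: only the 1-in-a-billion figure comes from the post.
example = chance_evidence_is_wrong(
    p_coincidence=1e-9,   # the headline "1:1billion" number
    p_lab_failure=1e-3,   # hypothetical: bribe, mole, swapped vials
    p_planted=1e-3,       # hypothetical: sample planted at the scene
)
print(f"chance the evidence misleads: about 1 in {1 / example:,.0f}")
```

With these placeholder rates the overall figure is about 1 in 500. The billion-to-one term contributes essentially nothing; the total is driven entirely by the mundane failure modes.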
Scientists already know that the human genome (DNA) is not the complete blueprint for an organism. The human epigenome, which is far more complex and contains more of the details about how to put those building blocks together, is no less important... and it seems likely that it contains more of what separates us as individuals.
I agree with mangu that "DNA tests should be used for finding someone innocent rather than guilty." Paternity tests work in a similar way, even though the general public does not seem to know it: genetic microsatellite tests can disprove paternity, but because of false positives they cannot prove that a man is in fact the father. The question should be: how many microsatellite sites (sites that usually differ across the human population) should be analyzed to arrive at a conclusion?
You do realize the DNA is PII, right?
Or did I just get wooshed?
Assume several thousand matches are found in the database. A defense lawyer will argue that the odds are in the thousands that the defendant was falsely matched. This is wrong. It is much like the puzzle of how many people you need at a party before two of them share a birthday (about 23 for even odds). But the odds that two particular people have the same birthday are about 1 in 365, not 23/365 as would be falsely concluded using the same argument as above.
Assume the odds are 1 in 10,000,000 that two people have the same DNA profile. Then the defense lawyer asks the expert witness:
"How many people would have to be in a stadium before the odds are greater than 50% that two have the same profile?"
Witness: "About 3,700."
Of course the readers of slashdot would be excused from the jury by the defense as they would not fall for this.
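For anyone who wants to check the stadium arithmetic, here is a small sketch of the birthday calculation. It assumes every profile is equally likely and that people are unrelated, which is a simplification, and the function names are just mine.

```python
def collision_probability(group_size, n_profiles):
    """Chance that at least two people in the group share a profile."""
    p_all_distinct = 1.0
    for k in range(group_size):
        p_all_distinct *= (n_profiles - k) / n_profiles
    return 1.0 - p_all_distinct


def smallest_group_for(target, n_profiles):
    """Smallest group whose collision chance exceeds the target."""
    p_all_distinct = 1.0
    group_size = 0
    while 1.0 - p_all_distinct <= target:
        p_all_distinct *= (n_profiles - group_size) / n_profiles
        group_size += 1
    return group_size


# Classic birthday puzzle, then the 1-in-10,000,000 stadium question.
print("birthdays:", smallest_group_for(0.5, 365))
print("stadium:  ", smallest_group_for(0.5, 10_000_000))

# Two specific people matching is still just 1 in n_profiles.
print("one pair: ", collision_probability(2, 10_000_000))
```

Under those assumptions the stadium threshold lands in the high 3,000s, while the chance that one specific pair matches stays at 1 in 10,000,000. The trick is that a few thousand people make millions of pairs.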
So basically you're saying you believe witness testimony is more reliable than scientific evidence?
I simply can't work out how you read that message from my post. I was saying that the corroborating testimony of multiple witnesses is worth a lot, while the /interpretation/ of some scientific test by some guy (for all I know, as a non-forensic scientist) is worth very little.
A juror is not there as an expert in forensic science - indeed, in the United Kingdom, a forensics expert might be exempted from jury duty - and, as such, is not qualified to decide whether he is witnessing "scientific evidence". He must judge the reliability of the expert witness's interpretation in the same way that he treats all other witnesses. You are making a very dangerous appeal to authority if you decide that, because one witness is announcing himself as having "scientifically verified" his data, he is more reliable.
A doctor is there in his capacity as a reporter of the injuries which he treated. A police officer is there to report on what he saw and what he was told, and we must judge his impartiality, his memory, etc. A forensic scientist employed by the prosecution, moreover, is there for no other reason than to say "I am here to assure you that this evidence shows this guy was guilty, and trust me because I'm a scientist". He should be viewed with at least as much doubt as the police officer, if not more, because the police officer does not play the "I know more than you - trust me" card.
If the purpose is to independently evaluate the rate of false matches in a DNA database to be used in criminal investigations, what better database is there than the one that will be used for that purpose?
Privacy issues can easily be worked around here---there's no need for personally identifiable information (i.e., name or location, not the dna data itself :-P ) to accompany the database for this purpose. You might also worry about statistical independence between the sample to be used for the analysis and that used for testing the results, but there are very well established methods for using subsamples of a data set in just this way.
I wouldn't call it a case of mission creep. Research is needed to confirm that the database is suitable for the purposes it was created for.
These issues were identified as early as 1969, in a landmark HEW report on computer records and the rights of citizens. It boils down to this: inferences drawn from data that affect the lives of people ought to be rationally justifiable. This means not using data until its suitability can be established. Mission creep can lead to data being used outside the context it is reliable in; but we can also run afoul of privacy and due process concerns by collecting data in the first place without establishing that it means what we hope it means.
I've been concerned for years about the reasoning used in DNA screening. It entails a long chain of assumptions, and while all the assumptions *seem* plausible, the chance that one or more of them is wrong or has some unknown wrinkle is not negligible.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Sorry, I meant “accidentally cut by the sheet of paper you hand him”.
But you are right, there are easier ways. It was just what I came up with first. :)
Never mind that the forensic findings are (or ought to be) independently verifiable.
All worthwhile eyewitness accounts are independently verifiable, i.e. involve independent eyewitnesses. Don't let a commendable scientific spirit enter a pathologically obsessive state where you're happy to take a report of a complex scientific procedure as close to infallible but won't accept a dozen people in a park telling you that the grass is green and the snow is white because "eyewitnesses are notoriously unreliable" and "the grass and snow weren't observed under scientific conditions". Such a disconnect has been parodied since Aristophanes, and for good reason.