FBI Fights Testing For False DNA Matches
Statesman writes "The Los Angeles Times reports that an Arizona crime lab technician found two felons with remarkably similar genetic profiles, so similar that they would ordinarily be accepted in court as a match, but one felon was black and the other white. The FBI estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. Dozens of similar matches have been found, and these findings raise questions about the accuracy of the FBI's DNA statistics. Scientists and legal experts want to test the accuracy of official statistics using the nearly 6 million profiles in CODIS, the national system that includes most state and local databases. The FBI has tried to block distribution of the Arizona results and is blocking people from performing similar searches using CODIS. A legal fight is brewing over whether the nation's genetic databases ought to be opened to wider scrutiny. At stake is the credibility of the odds often cited in DNA cases, which can suggest an all but certain link between a suspect and a crime scene."
I believe the problem with this is outlined in this article: http://en.wikipedia.org/wiki/Prosecutor's_fallacy
Have a read. It shows that you can't trust statistics when you only have half of the picture, and why it can be so dangerous to do so.
The FBI says that the chance of any given person matching another unrelated person is 1 in 113 billion. They claim that the reason the Arizona lab tech found as many matches as she did ("dozens") is because she was checking the whole database (6 million entries) against itself. This is a straightforward birthday paradox issue, then. According to the Wikipedia birthday problem page, the number of collisions expected given d = 113 billion different "birthdays" and n = 6 million "people in the room" is n - d + d((d-1)/d)^n. This is about 160 matches! So in fact the FBI may be right. Note that the chance of a given person matching _anyone_ in the database is about 0.0053%, which is much greater than 1 in 113 billion.
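For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes every profile is equally likely with probability 1/d and that profiles are independent, which is the birthday-problem idealization rather than anything the FBI has published. The function names are made up for illustration. The collision formula is rearranged with log1p/expm1 because subtracting two numbers near 113 billion directly loses the answer to floating-point rounding.

```python
import math

D = 113e9   # number of equally likely profiles implied by "1 in 113 billion"
N = 6e6     # profiles in the database

def expected_collisions(n, d):
    """n - d + d*((d-1)/d)**n, rearranged to avoid floating-point cancellation."""
    return n + d * math.expm1(n * math.log1p(-1.0 / d))

def expected_matching_pairs(n, d):
    """Expected number of matching pairs among n profiles: C(n, 2) / d."""
    return n * (n - 1) / (2.0 * d)

def prob_matches_someone(n, d):
    """Chance that one given profile matches at least one of the other n - 1."""
    return -math.expm1((n - 1) * math.log1p(-1.0 / d))

print(round(expected_collisions(N, D), 1))
print(round(expected_matching_pairs(N, D), 1))
print(f"{prob_matches_someone(N, D):.4%}")
```

Both counts come out at roughly 159, in line with the "about 160" above, and the per-person figure is about 0.0053%.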
If I'm not mistaken, what you've described is the Birthday Paradox:
http://en.wikipedia.org/wiki/Birthday_paradox/
DNA and fingerprints are useful in conjunction with other evidence, just as any other type of forensic or circumstantial evidence is. If someone passed out and died during a movie from a blow dart, for example, it would not be prudent to arrest a random person if they had tickets to the movie; similarly, if there is a particular DNA profile on the dart that 10,000 people match, those 10K should not be brought in. However, if someone had tickets to the movie && matched the DNA it would probably be a good idea to bring them in. CSI type shows are partly to blame - the average citizen on the jury trusts scientific evidence on its own far too much instead of the old detective story trio of means, motive and opportunity (forensics cannot help at all with the second).
--- You shall know the truth, and the truth shall make you mad- Neal (not Cowboy) Boortz
Hmmmm...
At midyear 2007 there were 4,618 black male sentenced prisoners per 100,000 black males in the United States, compared to 1,747 Hispanic male sentenced prisoners per 100,000 Hispanic males and 773 white male sentenced prisoners per 100,000 white males.
Almost 6 to 1. And how many people will use these numbers to justify their racist attitudes instead of realizing who's being targeted? An economic breakdown might be even more revealing.
What?
Possibly significant in terms of sample size.
If person A has a DNA profile that matches one other person in the country, it is still very strong evidence.
If upon checking the other states there was found to be an average of one matching person per state, 50 matches, still strong evidence, but not nearly so conclusive. Would now require stronger supporting evidence to be "beyond reasonable doubt".
If (prison population being approx 1%) there are found to be 100 matches per state, 5000 matches, then DNA becomes more useful as evidence for acquittal than for conviction, i.e. non-matching still proves it wasn't you but matching doesn't prove it was you.
http://marriedmansexlife.com/
Glad to see mention of the birthday paradox; it illustrates the issue nicely. I worked on a genetic mark recapture program that encountered just this effect. Initially things looked great but as the sample size increased we started encountering "shadows" (individuals that share markers at all loci sampled but aren't true matches) with greater frequency. To study large populations you need markers with significantly lower probability of identity than has been assumed in a lot of research. We often remarked how ridiculous the statistics quoted by journalists and in court are.
Try a geographic breakdown. Here's a hint: it correlates *strongly* with population density. A very disproportionately high percentage of the crime occurs in the urban areas. Something like 90% of the crime, and 99% of violent crime, in the big urban areas that house about 40% of the population.
Cut that out, or I will ship you to Norilsk in a box.
That is to say, the chance of having marker A might be 1% and the chance of having marker B might be 5%, but the chance of having BOTH might very well be higher (or lower) than .05%.
IANAFG (I am not a forensic geneticist) but the co-segregation of genetic markers is such a fundamental and well understood process that I would have a hard time believing that they wouldn't know and correct for the rates of their chosen set when calculating the probabilities of a matched set.
Of course the statistics they calculate are probably based on estimates of pairwise segregation. Some higher-order effects may be at work that change the statistics relative to a basic model like independent pairwise segregation.
For example, allele A of gene 1 and allele B of gene 2 may not segregate according to a previously measured pairwise statistic in the presence of allele C of gene 3. Such higher-order effects may have a significant impact on the statistics but would require a *lot* of data to reveal.
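To put rough numbers on how far off the independence assumption can be, here is a small Python sketch using the 1% and 5% marker frequencies from earlier in this comment. It only computes the standard probability bounds on "both markers present"; it says nothing about what real loci actually do, and the function name is hypothetical.

```python
def joint_probability_range(p_a, p_b):
    """Return (lowest, independent, highest) values for P(A and B).

    lowest and highest are the general bounds that hold for any two events;
    independent is the simple product that assumes the markers are unlinked.
    """
    lowest = max(0.0, p_a + p_b - 1.0)
    independent = p_a * p_b
    highest = min(p_a, p_b)
    return lowest, independent, highest

low, indep, high = joint_probability_range(0.01, 0.05)
print(f"lowest {low:.4%}, independent {indep:.4%}, highest {high:.4%}")
```

With these inputs the product rule gives 0.05%, but perfectly linked markers could push the true figure as high as 1% (twenty times higher), and mutually exclusive markers would push it to zero.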
Just callin' it like I see it.
>>There is a big difference between telling a lay jury "this match had a one in a 113 billion chance of occurring at random" versus "this is an event that occurs randomly on a routine basis." Non-statisticians have a hard time getting their head around the concept of correction for multiple hypothesis testing.
To give an apocryphal quote by Mark Twain: "People use statistics the same way drunks use lampposts - for support, not illumination."
The inability to reason statistically is extremely common in America. I mean extremely common, even among grad students publishing papers on stats, or in the technologically literate crowd. I used to write up examples of egregiously bad stats from papers and news reports in my LiveJournal, but gave up because it was so common.
The DNA testing example is actually an example we studied in the Bayesian/conditional chapter of my stats textbook. It described an actual court case in LA where a guy was convicted solely by DNA evidence (there was no other evidence to convict him, and he wasn't lucky enough to have an alibi) because the prosecutor confused the odds of a random match, in this case only one in a million, with the odds that the defendant was innocent, and those are some pretty powerful odds. Of course, that would mean that in LA alone, there would be 6 people (on average) matching the DNA, and so the chance of the guy being guilty is actually only 1/6 or so.
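The 1/6 figure can be sketched as a Bayes calculation in Python. The assumptions here are mine, not the textbook's: a pool of 6,000,000 possible sources (the number implied by "6 matches at one in a million"), every person equally likely to be the source before the DNA is considered, the true source always matching, and no other evidence. The function name is hypothetical.

```python
def posterior_guilt(population, random_match_prob):
    """P(defendant is the source | defendant matches), with a uniform prior.

    The true source matches with certainty; each of the other
    population - 1 people matches by chance with random_match_prob.
    """
    expected_innocent_matches = (population - 1) * random_match_prob
    return 1.0 / (1.0 + expected_innocent_matches)

# Assumed pool of 6,000,000 people and a one-in-a-million random match rate.
print(round(posterior_guilt(6_000_000, 1e-6), 3))
```

Counting the true source plus roughly 6 chance matches gives about 1/7 rather than exactly 1/6, but either way it is nowhere near the "one in a million chance he's innocent" that the match statistic seems to suggest.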
The problem I have with the DNA "this has a one in 113 billion chance of matching" is that this is an extrapolated number based on certain premises of independence between the different loci. Whereas the more we learn about DNA, the more we learn that there is a high degree of covariability, certainly enough that (as the article shows), the odds of a match are actually much much higher.
I don't think they are even using markers. I thought they were using a process that basically duplicates the DNA a massive number of times, then use gravity vs. capillary action to weigh the different chromosomes which may or may not have been through a blender 1st. They are not comparing gigabits of data to verify a DNA match.
As I've said time and time again. Forensic science is a scam. Second rate statisticians and second rate politicians team up with second rate scientists and second rate TV shows to convince the public that forensic superheroes can detect evidence of any evil crime you commit. It's just a way to keep the people under control.
Doesn't it make you feel good to know that our freedoms are protected by politicians, lawyers and journalists?