British DNA Database Mismatch
nahal writes "DNA evidence is extremely compelling to a jury at trial when trying to convict a suspect. In this article at USA Today, the world's largest DNA crime-solving machine, located in Great Britain, mistakenly matched a suspect to a crime in a 1-in-37 million chance. American experts have called it 'mind blowing'."
This system can only be relied upon to "prove" guilt where every locus is tested.
Even if all match, there is a very tiny but non-zero probability that the match is a false positive. The question then is how much doubt constitutes reasonable doubt? (Or the equivalent phrase in non-U.S. courts.)
... When prosecutors abuse scientific evidence with pseudoscience. DNA evidence is exclusionary in nature, not inclusionary. In other words (assuming no procedural errors etc) no match = didn't do it, match = COULD have done it. Of course, prosecutors would have the jury believe the opposite. If science is to be used to convict, then scientific thinking MUST be involved if there is to be fairness. No proper scientist would consider a DNA match on 6, 10, or 16 loci as conclusive (but would consider it a VERY strong reason to investigate further).
Consider the 1 in 37 million. If the database were complete for the world population (about 6 billion), that means that on average, any given DNA sample would appear to match 162 people. The 16 locus test that the FBI uses is better, but still is not damning in and of itself.
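A quick check of that arithmetic, as a minimal sketch. It assumes the quoted 1-in-37-million figure applies uniformly and independently to every person; the function name is made up for illustration.

```python
def expected_coincidental_matches(population: int, match_probability: float) -> float:
    """Expected number of people whose profile matches a given sample by chance."""
    return population * match_probability


world_population = 6_000_000_000   # rough figure used in the comment
p_match = 1 / 37_000_000           # quoted odds for the 6-locus test

print(round(expected_coincidental_matches(world_population, p_match)))
```

The result is an average over a hypothetical complete database, not a count of real suspects.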
Now, add in procedural error and other bad thinking and you have (to me) reasonable doubt unless there is some other evidence.
I am certainly not against convicting criminals, but I AM against deceiving juries into believing that a DNA match is damning evidence. Matching DNA evidence should be regarded as the beginning of an investigation, not the end.
I seem to remember watching a programme on the TV about this (can't remember which one, it was along the lines of an investigative news magazine programme, probably after 9.30, probably on the beeb, I think), and it was at least 4 - 5 years ago. I know it was in that time frame because I emigrated 3.5 years ago.
Anyway, they made a claim that the current DNA testing at that time was flawed and often made matches that were incorrect, flying in the face of the astronomical odds. I think that there were two stages to the problem: one was cross-contamination, and two, the cloning process that makes the sample big enough for testing cloned the contaminating DNA too.
Perhaps the labs were using the same containers for both the evidence DNA and the sample DNA without proper cleaning between tests? It only takes one fragment of DNA to screw the whole thing up. I think that there was serious concern about the use of cost-cutting independent labs who were bidding to do this work for the police at the lowest possible rate.
They used their tests improperly, and they call it mind-blowing?
Look. It's a 1:37-million chance if you're comparing one person's DNA to one sample (probably found at the crime scene). That's why you only use DNA testing to weed out people who couldn't possibly have been involved from a very narrow range of subjects. You can't pick out one suspect from a huge list.
This is the problem with archiving everyone's DNA. You know it'll be used for stuff like this, because law enforcement will get lazy.
DNA testing is a Good Thing. It's a very safe, reliable way to identify suspects. But only if you use it properly. This is hardly a "proper" use of the tests, and I'm not at all surprised that this happened. It's a case of lazy law enforcement more than faulty testing.
This is a very important point and should be moderated up.
It makes the utmost difference whether the police have a suspect and then use DNA matching to see if he did the crime, or whether they use DNA matching to find a suspect. As this poster mentions, in the second case there is a much lower probability that the person matched did in fact commit the crime.
It is exactly the same as disease testing. If you have a large population which is uninfected (not guilty) a positive match even from a very reliable test is highly likely to in fact be an error.
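The disease-testing analogy can be made concrete with Bayes' rule. This is only a sketch: the prevalence, sensitivity, and false-positive rate below are invented illustrative numbers, not figures from the article or from any real test.

```python
def probability_true_given_positive(prevalence: float,
                                    sensitivity: float,
                                    false_positive_rate: float) -> float:
    """Bayes' rule: P(condition is really present | test came back positive)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers: 1 person in 10,000 is infected, the test catches
# 99% of real cases and wrongly flags 1% of healthy people.
ppv = probability_true_given_positive(prevalence=1 / 10_000,
                                      sensitivity=0.99,
                                      false_positive_rate=0.01)
print(f"{ppv:.4f}")
```

With these assumed numbers, fewer than 1 in 100 positives is genuine, even though the test itself is "99% accurate".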
Of course if you up the test to some obscene number of points you can probably make the probability of error very small again. Of course this leaves the scary possibility that people are falsely convicted because they left a hair lying around...but there are always false convictions.
Maybe I didn't express myself properly ... I'm talking about a random DNA sample matching a sample in the database (assuming that those are unique). In that case the likelihood of a false positive reaches 1 when the database has 37 million entries.
Everyone and their dog has shown that /.ers actually understand basic statistics. With a 1/37 million chance of a match between two people, and 660,000 people in the database, the odds of coming up with a false positive eventually become quite high.
What not so many have pointed out is that the true chance of a match is probably higher than 1 in 37 million. That figure is based on the contents of each locus being independently distributed. (With about a 1/18 chance of a match at each locus.) Well we know that is strictly not true - after all a sibling of yours will have a 1/64 chance of getting the same loci from the same source that you did. But are there any larger effects?
The answer is that there are. Suppose that some of the loci have different frequency distributions among Anglo-Saxons, Celts, and East Indians. Then the chance of finding a match between 2 East Indians could be far higher than they estimate. For instance if that 1/18 figure was changed to around 1/9, the chance of matching 2 East Indians now becomes about 1/530,000. Even if your database has only 50,000 East Indians in it, if an East Indian committed the crime, the chance of a false positive is around 10%. Much higher than you would expect. (I am using East Indians because I understand that they are a disliked racial minority in England. Substitute your favorite group if you wish.)
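A rough sketch of the numbers in this comment, assuming six independent loci with the same per-locus match chance (a simplification the comment itself makes). The function names are illustrative only.

```python
def profile_match_probability(per_locus: float, loci: int = 6) -> float:
    """Chance that two unrelated profiles agree at every locus."""
    return per_locus ** loci


def chance_of_any_false_match(p: float, database_size: int) -> float:
    """Chance that at least one innocent database entry matches."""
    return 1 - (1 - p) ** database_size


general = profile_match_probability(1 / 18)    # about 1 in 34 million
subgroup = profile_match_probability(1 / 9)    # about 1 in 530,000

print(round(1 / general), round(1 / subgroup))
print(f"{chance_of_any_false_match(subgroup, 50_000):.3f}")
```

Halving the per-locus odds raises the whole-profile match chance by a factor of 2^6 = 64, which is why the subgroup effect is so large.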
So the moral of the story? Not only is the technique going to inevitably produce false positives, but it is likely to do so in a racially biased manner!
Regards,
Ben
I don't know where you get YOUR math from but it's not relevant.
...
1: The chance of a DNA match (in this 6-loci case) is 1 to 37000000.
2: That means that ONE DNA-sample compared to ONE other DNA-sample has a chance of matching of 1 in 37000000.
3: If you have TWO other DNA-samples to match against you have a chance of a match of 2 (TWO!) in 37000000 !
4: Any other circumstances have no impact on this if THEY HAVE NOTHING TO DO WITH THE DNA-CODE !
5: In this case we have 660000 OTHER DNA-samples to match against ! The rest is obvious
Thomas Berg
Mundus Vult Decipi
Is this sort of false match inevitable when you are comparing large numbers of DNA fingerprints from unsolved crimes to a large database of DNA samples?
Mea navis aericumbens anguillis abundat
Your points are all valid. However, none of this means that DNA testing is not extremely valuable. In fact, no matter how you figure the odds (within reason), the odds of a person who is WRONGLY suspected being cleared quickly by DNA are likely higher than the odds of a person being wrongly convicted based on DNA. Through years of experience with DNA, we know that the odds of a false positive are still very slim, even if you factor in human error. If the odds of being wrongly convicted of a crime are a mere one in 36m (or whatever figure you might happen to quote), and DNA proves to be useful in solving a great many crimes, the question you should also ask is: can we afford not to use it? Think of how many people have been cleared by DNA. Think of how many murderers have been convicted and/or arrested before they could kill again. Do you honestly believe that the number of people wrongly convicted (based on DNA) exceeds (or even remotely approaches) the number of people who've been saved? How many people have been wrongly convicted based on DNA? The closest thing, to my knowledge, is this ONE (out of how many million?) guy in the article here, and he was not even convicted. I suspect any lawyer worth his salt could have refuted that, especially if the odds (based on the agreed premises) are as high as most Slashdotters have just purported (e.g., 1:56).
Sure, all things being equal, I would prefer there be no chance of anyone being wrongly convicted; however, the fact of the matter is that we don't live in a perfect world. We were no better off before DNA testing. All we've ever been able to guarantee in the courts is due process. There has always been (and likely always will be, to some degree) human error and prejudice involved in any trial. DNA, despite its flaws, brings us that much further away from those kinds of errors...
I'm sorry you felt a need to take such a strong tone in your title
The "1 in 37,000,000" figure is presented as a final probability of a match. Where did you see *anything* about there only being 37,000,000 possible permutations?
If there were only 37M permutations of 6 loci, that would imply roughly 20 discrete possible values at each locus. Is that how you envisioned the underlying data?
I don't know what test they use in the UK, but I'm assuming that it's the RFLP -- basically they use a highly specific enzyme to chop up the DNA, and place it on a polyacrylamide gel under an electric current/field to measure the size of the fragments. (Actually, nowadays, they probably use pre-synthesized n-nucleoside primers and PCR [polymerase chain reaction] to chop and selectively amplify the fragments, but the principle is the same)
A single gel can easily measure fragments ranging from a few hundred base pairs to 10-400+ kbp with good resolution. The exact range varies according to current/field, gel composition, and other factors, but the bottom line is: it's easy to see bands that are a millimeter apart, so if you use a foot long gel, the range of possible values is close to 300. That creates:
300^6= 7.29 x 10^14 possible permutations
Actually, 0.5mm is a more realistic resolution limit, so the actual number of resolvable values is at least 600.
(600 values) ^ (6 loci) = 4.67 x 10^16 permutations
These are just crude estimates, for the benefit of those who've never read an electrophoresis gel. In actuality, the range of allowable values might be limited by other factors (values that are too extreme may be eliminated as artifacts). But it does give a sense of the TRUE numbers involved.
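For anyone who wants to redo these estimates, a small sketch follows. It assumes every locus has the same number of resolvable values and that all combinations are possible, which is the same crude model as the figures above.

```python
def permutations(values_per_locus: int, loci: int = 6) -> int:
    """Number of distinct profiles if each locus can take this many values."""
    return values_per_locus ** loci


def implied_values_per_locus(total_permutations: float, loci: int = 6) -> float:
    """Values per locus needed to reach a given total (the inverse calculation)."""
    return total_permutations ** (1 / loci)


print(f"{permutations(300):.2e}")   # 1 mm resolution on a foot-long gel
print(f"{permutations(600):.2e}")   # 0.5 mm resolution
print(f"{implied_values_per_locus(37_000_000):.1f}")
```

The last line is the inverse question: how many values per locus a total of only 37 million permutations would imply.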
(with modern gels and automated readers, the resolution may be even higher, but my experience was with UV lamps, eyeballs and Polaroid prints way back in the 1900's... 1991 or so)
Please run your analysis again using this range of possible permutations, and you'll see that 1:37M could well be a FINAL probability.
Actual experience counts for something. (And as someone who still likes to consider himself a Young Turk, I hate myself for saying that!)
If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime
1 in 37 million ?
I don't think so. Maybe only one person in 37 million would match that DNA, but they were searching from 660,000 people. That makes the probability 660,000 : 37,000,000 or more plainly,
1:56.
I bet that figure never came up at trial. This is blatantly a case of a misunderstanding of probability, from what I have read about the case. They have to use DNA to narrow the search from a few suspects, instead of using it to pick out a person from 660,000 previous convicts.
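The 1:56 figure can be reproduced in a few lines. This sketch assumes each of the 660,000 database entries is an independent 1-in-37-million trial; the function names are illustrative.

```python
def expected_false_matches(database_size: int, p: float) -> float:
    """Average number of coincidental hits when one sample is searched."""
    return database_size * p


def chance_of_at_least_one(database_size: int, p: float) -> float:
    """Exact chance of one or more coincidental hits, assuming independence."""
    return 1 - (1 - p) ** database_size


p = 1 / 37_000_000
expected = expected_false_matches(660_000, p)
print(f"about 1 in {1 / expected:.0f}")
print(f"{chance_of_at_least_one(660_000, p):.4f}")
```

The simple ratio and the exact "at least one hit" probability nearly coincide here because the expected number of hits is far below 1.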
This problem is similar to the so called birthday problem i.e. given a number of people n, what is the possibility of two of them sharing the same birthday? If I remember my stats correctly, the result is surprisingly large...
For n=2
364 ways second person could have birthday without matching first
For n=3
363 ways third person can have birthday not matching other two
p(match) = 1-365x364x363/(365^3)
....
when this gets to about 20, p(match) is about 40%!!
The chance of a DNA match amounts to a similar problem, so the likelihood of some match builds up much faster than the headline odds suggest as the number of samples grows.
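The birthday calculation sketched above, written out in full. It assumes 365 equally likely birthdays and ignores leap years; the `days` parameter is only there so the same function can be tried with other numbers of possible values.

```python
def birthday_match_probability(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday."""
    p_no_match = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1 - p_no_match


for n in (2, 3, 20, 23):
    print(n, f"{birthday_match_probability(n):.3f}")
```

At n = 20 this gives roughly 41%, and the chance passes 50% at n = 23.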
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
The environmental factors are acting throughout identical twins' lifetimes to make their DNA different. They're known as viruses. Other mutagens will also cause even more differences over time.
Then there is the testing method. The electrophoresis gel tests used have rather poor repeatability. Sure some things can be done to help make them better. I wouldn't accept a match when the samples are done on two different machines in different labs. Having two different gel suppliers also makes a huge difference. The test is really only telling you the length of strands between markers where the chemicals split the strands into segments.
He's at it again, that damn trickster Loci. He makes trouble wherever he goes. And now the FBI has 13 of him... boy are they in trouble.
War is Peace. Freedom is Slavery. Ignorance is Strength. - George Orwell or George Bush?
IANAS (in fact, I hate that branch of mathematics with a passion), but I do know enough to be able to say that this is inevitable.
They say there was a one in 37 million chance of this false match occurring - so? There's a one in multi-millions chance of someone winning the lottery, and yet it generally happens (I realise they're not equivalent cases, but it does show my point) - whenever you talk about probabilities, you have to realise that they are only relevant over a statistically significant sample size. They say nothing about individual cases - anomalies happen, the one-in-a-million chance does happen, and almost certainly will happen if you take a large enough sample.
The most important thing to understand is that this anomalous case does not invalidate DNA evidence - all it does is highlight the statistical nature of such evidence. DNA evidence (assuming the methodology of the tests is good) is exactly as useful now as it was before - that is, very useful - as long as it isn't abused. And generally speaking, the various police forces that use it are honest enough that they don't abuse it (witness the fact that they got a second opinion in this case).
This is an interesting and eye-opening occurrence, but it isn't the end of DNA evidence in forensics.
himi
--
My very own DeCSS mirror.
Illinois suspended executions after they realized that while they had executed 12 people since the reinstatement in 1977, 13 people had been freed from Death Row. Those are not good odds.
This point is the only valid take-away from the whole article. The British database only captures a DNA fingerprint based on 6 loci and we've all seen the math on that. The vast majority of US states and all federal cases require DNA tests with more than 10 loci. The odds of this error cropping up in the States are significantly lower.
p.s. This is an *old* story. It was reported at the beginning of the month in several British papers and ran on CNN on Tuesday. Granted, Saturday night is a slow time for Slashdot, but it'd be nice to hear stuff we didn't already know. :)
Shut up and eat your vegetables!!!
This brings up two issues. The first is the tendency to believe that technology can put complex techniques within the capabilities of people without training in the field. The second, closely related, is the belief that the reliability of the technology is not affected by the possibility of human error. On anything where the odds are stated as being that long, the two things I always ask are:
The net will not be what we demand, but what we make it. Build it well.
Well, if they were conjoined twins then the left one is always the evil one. ...according to The Simpsons anyway
This really shouldn't come as that big a surprise to people - no more so than someone winning a lottery.
As the article mentions, there is a 1 in 37 million chance of this happening. Statistically this means that while it will not happen often, it will happen at some point.
I think the problem arises from the widespread belief that DNA testing is infallible and provides concrete proof of a person's guilt/innocence - it does not.
DNA evidence is just that, evidence, and should be regarded as such in court. If DNA testing along with corroborating evidence indicates the person is guilty, then they probably are - or vice versa. If there is evidence that points against the DNA results, one should not automatically assume that the DNA results are correct.
"They do not preach that their god will rouse them, a little before the Nuts work loose." Kipling, 'The Sons of Martha'
As Terry Pratchett says:
"Million to one chances happen nine times out of ten."
Did you read the article - they re-tested with 10 points of reference, which supposedly has a 1 in 1,000,000,000 chance of a mismatch, so it was more a case of not using the most reliable test they could. Also, apparently in the US they use 13 points of reference, which presumably has a stupidly large number for its mismatch chance. I guess it'll just change the procedure so they use the 1 in 37,000,000 and re-test with a higher level if it matches to confirm it.
Are there any figures for fingerprint testing? How truly unique is a single fingerprint, and what's the chance of mismatch with 2 fingerprints? DNA testing is still pretty accurate!
Right, 37 million to one is not very big odds when you're doing 245 billion independent tests.
If the probability of a false positive in any individual test is p, then the probability of conducting n tests without getting any false positives is (1-p)^n. As pointed out, this means that if enough tests are done you'll almost certainly convict an innocent person. If you have two crimes with DNA evidence that is only this reliable, then more than likely some innocent person in the UK would test guilty.
Actually, it's worse than this because people don't have independent DNA - they're likely to be distantly related. This makes false positives even more likely.
If there are n people and you want the probability that any of them test positive to be less than x then you need
1 - (1-p)^n < x, which for small p is nearly the same as p*n < x. So to be fairly sure that nobody in the world falsely tests positive you need p to be less than about 1 in 80 billion.
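A numerical check of that bound, as a sketch. It assumes n = 6 billion independent tests (the world-population figure used elsewhere in this thread) and uses `math.log1p` and `math.expm1` so that the very small p does not get lost to rounding; the function names are illustrative.

```python
import math


def probability_of_any_false_positive(p: float, n: int) -> float:
    """1 - (1 - p)^n, computed stably for very small p."""
    return -math.expm1(n * math.log1p(-p))


def max_per_test_probability(x: float, n: int) -> float:
    """Largest p for which 1 - (1 - p)^n stays below x."""
    return -math.expm1(math.log1p(-x) / n)


n = 6_000_000_000
p = 1 / 80_000_000_000

print(f"{probability_of_any_false_positive(p, n):.4f}")   # exact form
print(f"{p * n:.4f}")                                     # small-p approximation
print(f"1 in {1 / max_per_test_probability(0.05, n):.3g}")
```

Under these assumptions, p of 1 in 80 billion still leaves roughly a 7% chance that someone, somewhere, falsely tests positive.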
perl -e 'fork||print for split//,"hahahaha"'
The probability of a false positive match approaches 1 as the number of samples approaches oo.
P(false positive) = 1 - P(no false positives)
= 1 - (P(correct answer))^n
= 1 - (1-p)^n
-> 1 as n -> oo.
This is ignoring the probability of a false negative; this is very low since only one person can commit a crime!
perl -e 'fork||print for split//,"hahahaha"'
This is so basic, I can't even believe it! I can't believe people's lives are decided on such a weak mathematical basis!
If the chance of a match between two random DNA samples is 1/(37 x 10^6), and they have 660000 samples in their database, then the likelihood -- assuming their system doesn't give false positives, which I doubt -- of a database match is ... 1.78% !!! We don't know how many DNA tests they make each year, but it's probably well over a thousand, which leads to over 10 false positives a year!
Americans find that "mind blowing"? Mind-boggling stupidity, if you ask me.
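The yearly estimate can be sketched as below. The figure of 1,000 database searches per year is only the guess made in the comment, not a reported number, and the function name is made up for illustration.

```python
def expected_false_hits_per_year(searches_per_year: int,
                                 database_size: int,
                                 match_probability: float) -> float:
    """Average number of coincidental database hits in a year of searches."""
    per_search = database_size * match_probability
    return searches_per_year * per_search


hits = expected_false_hits_per_year(searches_per_year=1_000,   # assumed
                                    database_size=660_000,
                                    match_probability=1 / 37_000_000)
print(f"{hits:.1f}")
```

The count scales linearly with the number of searches, so doubling the assumed workload doubles the expected number of false hits.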