Slashdot Mirror


British DNA Database Mismatch

nahal writes "DNA evidence is extremely compelling to a jury at trial when trying to convict a suspect. In this article at USA Today, the world's largest DNA crime-solving machine, located in Great Britain, mistakenly matched a suspect to a crime in a 1-in-37 million chance. American experts have called it 'mind blowing'."

29 of 194 comments

  1. Re:The system is already flawed by sjames · · Score: 2

    This system can only be relied upon to "prove" guilt where every locus is tested.

    Even if all match, there is a very tiny but non-zero probability that the match is a false positive. The question then is how much doubt constitutes reasonable doubt? (Or the equivalent phrase in non-U.S. courts.)

  2. This is what happens... by sjames · · Score: 2

    ... When prosecutors abuse scientific evidence with pseudoscience. DNA evidence is exclusionary in nature, not inclusionary. In other words (assuming no procedural errors etc) no match = didn't do it, match = COULD have done it. Of course, prosecutors would have the jury believe the opposite. If science is to be used to convict, then scientific thinking MUST be involved if there is to be fairness. No proper scientist would consider a DNA match on 6, 10, or 16 loci as conclusive (but would consider it a VERY strong reason to investigate further).

    Consider the 1 in 37 million. If the database were complete for the world population (about 6 billion), that means that on average, any given DNA sample would appear to match 162 people. The 16 locus test that the FBI uses is better, but still is not damning in and of itself.
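The 162 figure is just population times match probability. A minimal sketch, assuming the comment's round numbers (6 billion people, a 1-in-37-million random match probability); the variable names are illustrative only:

```python
# Expected number of people worldwide who would match one profile by chance.
world_population = 6_000_000_000      # round figure used in the comment
match_probability = 1 / 37_000_000    # quoted 6-locus random match probability

expected_matches = world_population * match_probability
print(f"expected coincidental matches: {expected_matches:.0f}")
```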

    Now, add in procedural error and other bad thinking and you have (to me) reasonable doubt unless there is some other evidence.

    I am certainly not against convicting criminals, but I AM against deceiving juries into believing that a DNA match is damning evidence. Matching DNA evidence should be regarded as the beginning of an investigation, not the end.

  3. They reported problems with DNA testing long ago by Malc · · Score: 2

    I seem to remember watching a programme on the TV about this (can't remember which one, it was along the lines of an investigative news magazine programme, probably after 9.30, probably on the beeb, I think), and it was at least 4 - 5 years ago. I know it was in that time frame because I emigrated 3.5 years ago.

    Anyway, they made a claim that the current DNA testing at that time was flawed and often made matches that were incorrect, flying in the face of the astronomical odds. I think that there were two stages to the problem, one was cross-contamination, and two, the cloning process that makes the sample big enough for testing cloned the contaminating DNA too.

    Perhaps the labs were using the same containers for both the evidence DNA and the sample DNA without proper cleaning between tests? It only takes one fragment of DNA to screw the whole thing up. I think that there was serious concern about the use of cost-cutting independent labs who were bidding to do this work for the police at the lowest possible rate.

  4. How is this mind-blowing? by Millennium · · Score: 2

    They used their tests improperly, and they call it mind-blowing?

    Look. It's a 1:37-million chance if you're comparing one person's DNA to one sample (probably found at the crime scene). That's why you only use DNA testing to weed people who couldn't possibly have been involved from a very narrow range of subjects. You can't pick out one suspect from a huge list.

    This is the problem with archiving everyone's DNA. You know it'll be used for stuff like this, because law enforcement will get lazy.

    DNA testing is a Good Thing. It's a very safe, reliable way to identify suspects. But only if you use it properly. This is hardly a "proper" use of the tests, and I'm not at all surprised that this happened. It's a case of lazy law enforcement more than faulty testing.

  5. Re:Statistics and probability by PG13 · · Score: 2

    This is a very important point and should be moderated up.

    It makes the utmost difference whether the police have a suspect and then use DNA matching to see if he did the crime or if they use DNA matching to find a suspect. As this poster mentions it is then a much lower probability that you did in fact commit the crime.

    It is exactly the same as disease testing. If you have a large population which is uninfected (not guilty) a positive match even from a very reliable test is highly likely to in fact be an error.

    Of course if you up the test to some obscene number of points you can probably make the probability of error very small again. Of course this leaves the scary possibility that people are falsely convicted because they left a hair lying around...but there are always false convictions.

    --
    Marriage is the "pseudo-ethics" that cloaks the messy truth of sexuality in the raiment of propriety -- it's "Don't Ask,
  6. Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? by Nicolas+MONNET · · Score: 2

    Maybe I didn't express myself properly ... I'm talking about a random DNA sample matching a sample in the database (assuming that those are unique). In that case the likelihood of a false positive reaches 1 when the database has 37 million entries.
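A quick check of that limit, as a sketch that assumes every comparison is independent with the quoted 1-in-37-million probability: at 37 million entries it is the expected number of coincidental matches that reaches 1, while the probability of at least one match is closer to 1 - 1/e, about 63%.

```python
import math

p = 1 / 37_000_000   # quoted per-comparison match probability
n = 37_000_000       # hypothetical database size from the comment

expected_matches = n * p
# 1 - (1 - p)**n, computed via log1p/expm1 to avoid rounding trouble for tiny p.
p_at_least_one = -math.expm1(n * math.log1p(-p))

print(f"expected matches:      {expected_matches:.2f}")
print(f"P(at least one match): {p_at_least_one:.3f}")
```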

  7. More statistical problems by tilly · · Score: 2

    Everyone and their dog has shown that /.ers actually understand basic statistics. With a 1/37 million chance of a match between two people, and 660,000 people in the database, the odds of coming up with a false positive eventually become quite high.

    What not so many have pointed out is that the true odds are probably lower than 1/37 million. That figure is based on the contents of each locus being independently distributed. (With about a 1/18 chance of a match at each locus.) Well we know that is strictly not true - after all a sibling of yours will have 1/64 of getting the same loci from the same source that you did. But are there any larger effects?

    The answer is that there are. Suppose that some of the loci have different distributions in frequency between Anglo-Saxons, Celts, and East Indians. Then the chance of finding a match between 2 East Indians could be far higher than they estimate. For instance if that 1/18 figure was changed to around 1/9, the chance of matching 2 East Indians now becomes about 1/530,000. Even if your database has only 50,000 East Indians in it, if an East Indian committed the crime, the chance of a false positive is around 10%. Much higher than you would expect. (I am using East Indians because I understand that they are a disliked racial minority in England. Substitute your favorite group if you wish.)

    So the moral of the story? Not only is the technique going to inevitably produce false positives, but it is likely to do so in a racially biased manner!
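The per-locus numbers above can be reproduced in a few lines. This is only a sketch under the comment's own assumptions (6 independent loci, a 1-in-18 match chance per locus in general, a hypothetical 1-in-9 within one subgroup, 50,000 subgroup members in the database); none of these are measured frequencies.

```python
LOCI = 6

general_odds = 18 ** LOCI    # 1-in-18 per locus gives roughly 1 in 34 million
subgroup_odds = 9 ** LOCI    # 1-in-9 per locus gives 1 in 531,441

subgroup_in_database = 50_000  # hypothetical count from the comment
# Chance that at least one of those 50,000 profiles matches by coincidence.
p_false_positive = 1 - (1 - 1 / subgroup_odds) ** subgroup_in_database

print(f"general population: 1 in {general_odds:,}")
print(f"within subgroup:    1 in {subgroup_odds:,}")
print(f"P(false positive):  {p_false_positive:.1%}")
```

Halving the per-locus odds shrinks the overall odds by a factor of 2^6 = 64, which is why a modest difference in allele frequencies matters so much.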

    Regards,
    Ben

    --
    My usual seat in the cluetrain is at A HREF="http://pub4.ezboard.com/biwethey.ht
  8. Re:Statistics and probability by i · · Score: 2

    I don't know where you get YOUR math from but it's not relevant.

    1: The chance of a DNA match (in this 6-loci case) is 1 to 37000000.

    2: That means that ONE DNA-sample compared to ONE other DNA-sample has a chance of matching in 1 of 37000000.

    3: If You have TWO other DNA-samples to match against you have a chance of match in 2 (TWO!) of 37000000 !

    4: Any other circumstances have no impact on this if THEY HAVE NOTHING TO DO WITH THE DNA-CODE !

    5: In this case we have 660000 OTHER DNA-samples to match against ! The rest is obvious ...

    Thomas Berg

    Mundus Vult Decipi

    --
    Mundus Vult Decipi
  9. Birthday Paradox? by Detritus · · Score: 2

    Is this sort of false match inevitable when you are comparing large numbers of DNA fingerprints from unsolved crimes to a large database of DNA samples?

    --
    Mea navis aericumbens anguillis abundat
  10. The question I ask... by FallLine · · Score: 2

    Your points are all valid. However, none of this means that DNA testing is not extremely valuable. In fact, no matter how you figure the odds (within reason), the odds of a person who is WRONGLY suspected being cleared quickly by DNA are likely higher than they are of a person being wrongly convicted based on DNA. Knowing what we know through years of experience with DNA, we know that the odds of a false positive are still very slim, even if you factor in human error. If the odds of being wrongly convicted of a crime are a mere one in 36m (or whatever figure you might happen to quote), and DNA proves to be useful in solving a great many crimes, the question you should also ask is can we afford not to use it? Think of how many people have been cleared by DNA. Think of how many murderers have been convicted and/or arrested before they could kill again. Do you honestly believe that the number of people wrongly convicted (based on DNA) exceeds (or even remotely approaches) the number of people who've been saved? How many people have been wrongly convicted based on DNA? The closest thing, to my knowledge, is this ONE (out of how many million?) guy in the article here, and he was not even convicted. I suspect any lawyer worth his weight could have refuted that, especially if the odds (based on the agreed premises) are as high as most Slashdotters have just purported (e.g., 1:56).

    Sure, all things being equal, I would prefer there be no chance of anyone being wrongly convicted; however, the fact of the matter is that we don't live in a perfect world. We were no better off before DNA testing. All we've ever been able to guarantee in the courts is due process. There has always been (and likely always will be, to some degree) human error and prejudice involved in any trial. DNA, despite its flaws, brings us that much further away from those kinds of errors...

  11. Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? by orpheus · · Score: 2

    I'm sorry you felt a need to take such a strong tone in your title.

    The "1 in 37,000,000" figure is presented as a final probability of a match. Where did you see *anything* about there only being 37,000,000 possible permutations?

    If there were only 37M permutations of 6 loci, that would imply roughly 20 discrete possible values at each locus. Is that how you envisioned the underlying data?

    I don't know what test they use in the UK, but I'm assuming that it's the RFLP -- basically they use a highly specific enzyme to chop up the DNA, and place it on a polyacrylamide gel under an electric current/field to measure the size of the fragments. (Actually, nowadays, they probably use pre-synthesized n-nucleoside primers and PCR [polymerase chain reaction] to chop and selectively amplify the fragments, but the principle is the same)

    A single gel can easily measure fragments ranging from a few hundred base pairs to 10-400+ kbp with good resolution. The exact range varies according to current/field, gel composition, and other factors, but the bottom line is: it's easy to see bands that are a millimeter apart, so if you use a foot long gel, the range of possible values is close to 300. That creates:

    300^6= 7.29 x 10^14 possible permutations

    Actually, 0.5mm is a more realistic resolution limit, so the actual number of resolvable values is at least 600.

    (600 values) ^ (6 loci) =4.6x10^16 permutations

    These are just crude estimates, for the benefit of those who've never read an electrophoresis gel. In actuality, the range of allowable values might be limited by other factors (values that are too extreme may be eliminated as artifacts). But it does give a sense of the TRUE numbers involved.

    (with modern gels and automated readers, the resolution may be even higher, but my experience was with UV lamps, eyeballs and Polaroid prints way back in the 1900's... 1991 or so)

    Please run your analysis again using this range of possible permutations, and you'll see that 1:37M could well be a FINAL probability.
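For anyone rerunning the numbers, here is a rough sketch of the permutation counts above. It takes the comment's crude figures at face value (6 loci, about 300 or 600 resolvable band positions per locus) and also shows the per-locus value count that a 37-million total would imply if all values were equally likely, which is an assumption made only for illustration.

```python
LOCI = 6

# Resolvable values per locus, taken from the comment's gel-resolution estimates.
resolvable_values = {"1 mm resolution": 300, "0.5 mm resolution": 600}

permutations = {label: values ** LOCI for label, values in resolvable_values.items()}
for label, count in permutations.items():
    print(f"{label}: {count:.2e} possible permutations")

# If 37 million were the total number of permutations, each locus would
# need only about this many equally likely values.
implied_values_per_locus = 37_000_000 ** (1 / LOCI)
print(f"implied values per locus: {implied_values_per_locus:.1f}")
```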

    Actual experience counts for something. (And as someone who still likes to consider himself a Young Turk, I hate myself for saying that!)

    --

    If you can go to bed, knowing you did a valuable thing today, you're very lucky. If you can't... it's not bedtime

  12. Statistics and probability by AmirS · · Score: 2

    1 in 37 million ?

    I don't think so. Maybe only one person in 37 million would match that DNA, but they were searching from 660,000 people. That makes the probability 660,000 : 37,000,000 or more plainly,
    1:56.

    I bet that figure never came up at trial. This is blatantly a case of a misunderstanding of probability, from what I have read about the case. They have to use DNA to narrow the search from a few suspects, instead of using it to pick out a person from 660,000 previous convicts.

    1. Re:Statistics and probability by divec · · Score: 2

      > if you are in the database you have previously committed a crime

      Not quite. If you are in the database you have been *convicted* of a crime. You may not have actually been guilty. For this reason, your previous track record cannot legally be taken into account when deciding whether you are guilty.

      --

      perl -e 'fork||print for split//,"hahahaha"'

    2. Re:Statistics and probability by divec · · Score: 3

      No, the original poster was right. The chance of a false match on the file is

      1 - (1 - 1/37million) ^ 660,000

      which is nearly the same as

      660,000 / 37million = 1/56.
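The two expressions above can be compared numerically. A minimal sketch, assuming each of the 660,000 comparisons is independent with the quoted 1-in-37-million probability:

```python
import math

p = 1 / 37_000_000   # quoted per-comparison match probability
n = 660_000          # database size quoted in the thread

# Exact form 1 - (1 - p)**n, computed stably for very small p.
exact = -math.expm1(n * math.log1p(-p))
# Linear shortcut n * p, valid because n * p is much smaller than 1.
approx = n * p

print(f"exact:  {exact:.5f}")
print(f"approx: {approx:.5f}  (about 1 in {1 / approx:.0f})")
```

The shortcut slightly overstates the exact value, and the gap grows as n * p approaches 1.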

      --

      perl -e 'fork||print for split//,"hahahaha"'

  13. Re:Simple Probability by maroberts · · Score: 2

    This problem is similar to the so-called birthday problem, i.e. given a number of people n, what is the probability of two of them sharing the same birthday? If I remember my stats correctly, the result is surprisingly large...

    For n=2
    364 ways second person could have birthday without matching first
    For n=3
    363 ways third person can have birthday not matching other two
    p(match) = 1-365x364x363/(365^3)
    ....

    when this gets to about 20, p(match) is about 40%!!

    The chances of a DNA match amounts to a similar problem, so the stats rapidly build up to a high likelihood of a match after about 20-30 samples.
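The birthday calculation sketched above generalises to any number of equally likely "days". The function below is an illustrative helper, not anything from the article; treating DNA profiles as 37 million equally likely values is an assumption made purely for comparison.

```python
def p_shared(n, days=365):
    """Probability that at least two of n items share a value,
    assuming `days` equally likely values and independence."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

print(f"birthdays, n=20: {p_shared(20):.3f}")
print(f"birthdays, n=23: {p_shared(23):.3f}")
print(f"37M profiles, n=7200: {p_shared(7_200, days=37_000_000):.3f}")
```

Under that assumption the even-odds point scales with the square root of the number of possible values, so it sits at thousands of profiles rather than 20 to 30, which is still far smaller than a database of 660,000.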

    --

    Donte Alistair Anderson Roberts - hi son!
    Karma: Chameleon

  14. Re:Twins? by Bryan+Andersen · · Score: 2

    The environmental factors are acting all throughout the identical twins' lifetime to make their DNA different. They're known as viruses. Other mutagens will also cause even more differences over time.

    Then there is the testing method. The electrophoresis gel tests used have rather poor repeatability. Sure some things can be done to help make them better. I wouldn't accept a match when the samples are done on two different machines in different labs. Having two different gel suppliers also makes a huge difference. The test is really only telling you the length of strands between markers where the chemicals split the strands into segments.

  15. I knew DNA was all mythology by r2ravens · · Score: 2

    He's at it again, that damn trickster Loci. He makes trouble wherever he goes. And now the FBI has 13 of him... boy are they in trouble.

    --
    War is Peace. Freedom is Slavery. Ignorance is Strength. - George Orwell or George Bush?
  16. Welcome to the world of statistics . . . by himi · · Score: 2

    IANAS (in fact, I hate that branch of mathematics with a passion), but I do know enough to be able to say that this is inevitable.

    They say there was a one in 37 million chance of this false match occurring - so? There's a one in multi-millions chance of someone winning the lottery, and yet it generally happens (I realise they're not equivalent cases, but it does show my point) - whenever you talk about probabilities, you have to realise that they are only relevant over a statistically significant sample size. They say nothing about individual cases - anomalies happen, the one-in-a-million chance does happen, and almost certainly will happen if you take a large enough sample.

    The most important thing to understand is that this anomalous case does not invalidate DNA evidence - all it does is highlight the statistical nature of such evidence. DNA evidence (assuming the methodology of the tests is good) is exactly as useful now as it was before - that is, very useful - as long as it isn't abused. And generally speaking, the various police forces that use it are honest enough that they don't abuse it (witness the fact that they got a second opinion in this case).
    This is an interesting and eye-opening occurrence, but it isn't the end of DNA evidence in forensics.

    himi

    --

    My very own DeCSS mirror.
    1. Re:Welcome to the world of statistics . . . by jflynn · · Score: 2

      Truly.

      To be specific, if their database has 700000 entries in it, it has 700000*699999/2 pairs in it. That's 245 billion pairs. If the odds against any pair matching at random are 37 million to one, that means there are a *LOT* of matches in that database, probably about 7000 of them.

      This simply seems to be a case of scaling the database without scaling the identification key --with predictable results, non-unique keys.

      Anyone know how the probability of a bad match decreases with number of loci tested?
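The pair count above is easy to verify. A small sketch, assuming the comment's round 700,000 entries and treating every pair as an independent 1-in-37-million comparison (relatives in the database would break that assumption):

```python
from math import comb

entries = 700_000                     # round database size used in the comment
pairs = comb(entries, 2)              # number of distinct pairs of profiles
expected_matching_pairs = pairs / 37_000_000

print(f"pairs: {pairs:,}")
print(f"expected coincidental matching pairs: {expected_matching_pairs:,.0f}")
```

With these inputs the expected count comes out a little under the "about 7000" estimate, in the region of 6,600.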

  17. Re:Death Penalty (a bit offtopic) by gorilla · · Score: 2

    Illinois suspended executions after they realized that while they had executed 12 people since the reinstatement in 1977, 13 people had been freed from Death Row. Those are not good odds.

  18. Re:1:370000000 by cshotton · · Score: 2
    So why not just make all the tests 10 loci?

    This point is the only valid take-away from the whole article. The British database only captures a DNA fingerprint based on 6 loci and we've all seen the math on that. The vast majority of US states and all federal cases require DNA tests with more than 10 loci. The odds of this error cropping up in the States are significantly lower.

    p.s. This is an *old* story. It was reported at the beginning of the month in several British papers and ran on CNN on Tuesday. Granted, Saturday night is a slow time for Slashdot, but it'd be nice to hear stuff we didn't already know. :)

    --

    Shut up and eat your vegetables!!!
  19. Re:DON'T THEY KNOW ANYTHING ABOUT STATISTICS? by dsplat · · Score: 2
    This issue was mentioned in some circles (not the mainstream press, unfortunately) in the wake of the OJ Simpson trials. The problem is that the public, and most of the pundits, it appears, were not educated about the fact that the odds of a false negative and the odds of a false positive are related but not the same.

    This brings up two issues. The first is the tendency to believe that technology can put complex techniques within the capabilities of people without training in the field. The second, closely related, is the belief that the reliability of the technology is not affected by the possibility of human error. On anything where the odds are stated as being that long, the two things I always ask are:

    1. Do we understand the odds? Are we aware of all of the factors that might be significant when we are trying to get results to that many decimal places? There are any number of factors that can be ignored when you are looking for imprecise answers.
    2. What were the opportunities for human error or corruption? I would expect them to be fairly high relative to the long odds stated here.

    --
    The net will not be what we demand, but what we make it. Build it well.
  20. Re:Twins? by TummyX · · Score: 2

    Well, if they were conjoined twins then the left one is always the evil one. ...according to The Simpsons anyway

  21. DNA is evidence, not proof by Myriad · · Score: 2

    This really shouldn't come as that big a surprise to people - no more so than someone winning a lottery.
    As the article mentions, there is a 1 in 37 million chance of this happening. Statistically this means that while it will not happen often, it will happen at some point.
    I think the problem arises from the widespread belief that DNA testing is infallible and provides concrete proof of a person's guilt/innocence - it does not.
    DNA evidence is just that, evidence, and should be regarded as such in court. If DNA testing along with corroborating evidence indicates the person is guilty, then they probably are - or vice versa. If there is evidence that points against the DNA results, one should not automatically assume that the DNA results are correct.

    --
    "They do not preach that their god will rouse them, a little before the Nuts work loose." Kipling, 'The Sons of Martha'
  22. Not as unlikely as you might think... by Sfuerst · · Score: 2

    As Terry Pratchett says:
    "Million to one chances happen nine times out of ten."

    --
    "Would you like a cold drink with that Sir? Yes, yes, for the sake, of the future, of all mankind, I will have, a sm
  23. Re:juries by blane.bramble · · Score: 2

    Did you read the article? They re-tested with 10 points of reference, which supposedly has a 1 in 1,000,000,000 chance of a mismatch, so it was more a case of not using the most reliable test they could. Also, apparently in the US they use 13 points of reference, which presumably has a stupidly large number for its mismatch chance. I guess it'll just change the procedure so they use the 1 in 37,000,000 and re-test with a higher level if it matches to confirm it.

    Are there any figures for fingerprint testing? How truly unique is a single fingerprint, and what's the chance of mismatch with 2 fingerprints? DNA testing is still pretty accurate!

  24. Statistical problem can be overcome by divec · · Score: 3

    Right, 37 million to one is not very big odds when you're doing 245 billion independent tests.
    If the probability of a false positive in any individual test is p, then the probability of conducting n tests without getting any false positives is (1-p)^n. As pointed out, this means that if enough tests are done you'll almost certainly convict an innocent person. If you have two crimes with DNA evidence that is only this reliable, then more than likely some innocent person in the UK would test guilty.
    Actually, it's worse than this because people don't have independent DNA - they're likely to be distantly related. This makes false positives even more likely.
    If there are n people and you want the probability that any of them test positive to be less than x then you need
    1 - (1-p)^n < x, which is nearly the same as p*n < x. So to be fairly sure that nobody in the world falsely tests positive you need p to be less than about 1 in 80 billion.
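To see where a figure like 1 in 80 billion can come from, here is a sketch that solves 1 - (1-p)^n < x for p. The population of 6 billion is the figure used elsewhere in the thread; the tolerance x = 7.5% is purely an assumption chosen to illustrate "fairly sure", since the comment does not state one, and the function name is hypothetical.

```python
import math

def max_per_test_rate(n, x):
    """Largest per-test false-positive probability p with 1 - (1 - p)**n <= x."""
    return -math.expm1(math.log1p(-x) / n)

n = 6_000_000_000   # world population figure used elsewhere in the thread
x = 0.075           # ASSUMED tolerance; not stated in the comment

p_exact = max_per_test_rate(n, x)
p_linear = x / n    # small-p shortcut: 1 - (1 - p)**n is roughly p * n

print(f"exact:  1 in {1 / p_exact:,.0f}")
print(f"linear: 1 in {1 / p_linear:,.0f}")
```

A stricter tolerance pushes the required odds out proportionally: with x = 1% the linear shortcut gives 1 in 600 billion.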

    --

    perl -e 'fork||print for split//,"hahahaha"'

  25. P(false positive) -> 1 as n -> oo by divec · · Score: 3

    The probability of a false positive match approaches 1 as the number of samples approaches oo.

    P(false positive) = 1 - P(no false positives)
    = 1 - (P(correct answer))^n
    = 1 - (1-p)^n
    -> 1 as n -> oo.

    This is ignoring the probability of a false negative; this is very low since only one person can commit a crime!

    --

    perl -e 'fork||print for split//,"hahahaha"'

  26. DON'T THEY KNOW ANYTHING ABOUT STATISTICS? by Nicolas+MONNET · · Score: 4

    This is so basic, I can't even believe it! I can't believe people's lives are decided on such a weak mathematical basis!

    If the chance of a match between two random DNA samples is 1/(37 x 10^6), and they have 660000 samples in their database, then the likelihood -- assuming their system doesn't give false positives, which I doubt -- of a database match is ... 1.78% !!! We don't know how many DNA tests they make each year, but it's probably well over a thousand, which leads to over 10 false positives a year!

    Americans find that "mind blowing"? Mindboggling stupidity, if you ask me
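The yearly estimate above follows from multiplying the per-search figure by the number of searches. A minimal sketch; the 1,000 searches per year is the commenter's guess, not a reported number:

```python
match_probability = 1 / 37_000_000   # quoted per-comparison match probability
database_size = 660_000              # samples in the database
searches_per_year = 1_000            # commenter's guess, not a reported figure

# Approximate chance that a single search hits someone by coincidence.
per_search = database_size * match_probability
# Expected coincidental hits over a year of searches.
per_year = searches_per_year * per_search

print(f"per search: {per_search:.2%}")
print(f"per year:   {per_year:.1f} expected false positives")
```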