Scientists and Lawyers Argue For Open US DNA Database
chrb writes "New Scientist has an article questioning the uniqueness of DNA profiles. 41 scientists and lawyers recently published a high-profile Nature article (sub. required) arguing that the FBI should release its complete CODIS database. The request follows research on the already released Arizona state DNA database (a subset of CODIS) which showed a surprisingly large number of matches between the profiles of different individuals, including one between a white man and a black man. The group states that the assumption that a DNA profile represents a unique individual, with only a minuscule probability of a secondary match, has never been independently verified on a large sample of DNA profiles. The new requests follow the FBI's rejection of similar previous requests."
Before DNA tests are accepted as conclusive much better studies should be done, particularly for false positives.
I believe DNA tests should be used for finding someone innocent rather than guilty. Negatives aren't that big a problem. If there are discrepancies then obviously it's not the same DNA.
Positives are another issue: how many common features must there be to accept two DNA samples as coming from the same individual?
Just started watching Gattaca, was inspired to watch it from this CG short ( http://vimeo.com/7809605 ) as it used the sound track.
I have been concerned for years about this, because you often hear prosecutors and "expert" witness testimony to the effect that "the odds are billions to one against this being someone else".
Among other possible statistical mistakes, these unrealistically large numbers are based on the idea that each genetic location being compared is statistically independent. But in fact we know that to not be so. What we definitely do not know is how, or how often, many of these may actually depend on each other.
Let me give you a purely hypothetical example: what are the odds that a genetic profile from a random person contains a gene determining curly hair? In other words, what are the odds of finding this gene in a random sample?
You can answer this approximately by simply observing what percentage of the population has curly hair. Let's say 1/4 just for argument. So your odds are 1 in 4.
But here's the kicker question: what are the odds that a genetic profile includes a gene for curly hair, given that it also contains a gene for sickle cell anemia?
The odds are going to change drastically.
This is not a real example, of course, just illustrative. But one can easily see that the contents of genetic locations are NOT necessarily statistically independent, even if one of them does not directly cause the other.
We simply do not know enough to say that any two genetic locations are truly independent. Therefore these huge probabilities ("billions to one" for example) being spouted by prosecutors are completely specious.
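To put rough numbers on the curly hair example, here is a minimal sketch. Every frequency in it is invented for illustration (including the 1 in 10 and 4 in 5 figures, which are not from anywhere); the point is only that multiplying the two individual odds gives the wrong answer as soon as one location tells you something about the other.

```python
from fractions import Fraction

# Hypothetical frequencies, invented purely for illustration.
p_curly = Fraction(1, 4)               # P(curly-hair gene) in the whole population
p_sickle = Fraction(1, 10)             # P(sickle-cell gene) in the whole population
p_curly_given_sickle = Fraction(4, 5)  # P(curly | sickle): much higher than 1/4

# What you get if you wrongly assume the two locations are independent.
naive_joint = p_curly * p_sickle

# What the joint frequency actually is under the numbers above.
actual_joint = p_curly_given_sickle * p_sickle

print("assuming independence:", naive_joint)    # 1/40
print("with the dependence:  ", actual_joint)   # 2/25
print("naive figure is too rare by a factor of", actual_joint / naive_joint)  # 16/5
```

With only two locations the naive figure is already off by a factor of about three under these made-up numbers; any such error compounds when many locations are multiplied together.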
Anyone have the full text from New Sensationalist or Nature? NS is harping on the fact I used up three of their articles already and now I have to pay/register.
Here's hoping they fade into obscurity like Salon.com.
By the way, I should point out that there are at least several public and private DNA databases being developed in the U.S. alone. However, some of them are for special purposes (genealogy for example), and will test different locations than those used by forensics labs.
What are we talking about here? If this is a catalog of DNA of convicted criminals then it might be ok. But if it's also DNA samples from other people who gave a sample to clear their name, then I don't think it should be made public.
What about a release of the database minus any personally identifiable information? That should be sufficient to determine uniqueness, shouldn't it?
Letter is at http://www.bioforensics.com/articles/Krane_Science_letter_2009.pdf/
Even more worrying than the issue of statistical independence or veracity of the DNA testing process itself (which SHOULD be investigated) is the simple possibility of corruption, incompetence, or simple mistake. If a DNA testing lab simply accepts a bribe to give their expert testimony, makes a mistake and switches sample vials, etc, their expert court-testimonyer will still show up in court claiming "The chances are approximately eighty-three bazillion to one".
This giant number has the emotional effect of certainty, but that number is just the chances that the sample the DNA lab received corresponds to the DNA of the accused--IF NO MISTAKES WERE MADE and nobody is planting evidence or accepting bribes. It's not the chance that the accused is innocent. I'm sure this distinction is made in the verbal "fine print" but the jury will still be swayed. The giant odds numbers are nothing but powerful emotional hooks. The real possibility that the DNA evidence does not finger the accused breaks down like this:
1 in 1 billion: the DNA matches someone else due to a flaw in the statistics of DNA testing
PLUS (any one of these is enough, so the chances add up rather than multiply)
1 in $smallernumber: the DNA lab has accepted a bribe, has a mole, made a mistake, etc
PLUS
1 in $smallernumber: the DNA lab has honestly received a sample from the accused but the sample was planted at the scene by police, the real criminal, or really bad luck.
The jury won't be considering these factors when they hear the "1:1billion" number. It's nothing but sensationalism.
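A rough sketch of how those factors combine, assuming they are independent of each other. The rates below are placeholders I made up, not measured figures for any lab. Since any single failure is enough to point at the wrong person, the overall chance is one minus the chance that nothing goes wrong, which for small numbers is close to the sum of the individual chances.

```python
# Hypothetical per-case rates, made up for illustration only.
p_coincidental_match = 1e-9   # the "1 in a billion" the jury hears
p_lab_error = 1e-3            # mix-up, contamination, bribery, etc.
p_planted = 1e-4              # sample is genuine but was planted at the scene

def p_any(*probs):
    """Chance that at least one of several independent things goes wrong."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

p_wrong = p_any(p_coincidental_match, p_lab_error, p_planted)
print(f"chance the evidence points at the wrong person: about 1 in {1 / p_wrong:,.0f}")
```

Under these placeholder rates the answer is on the order of 1 in 900, and the "1 in a billion" term contributes almost nothing to it. The largest error source sets the real number.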
Scientists already know that the human genome (DNA) is not the complete blueprint for an organism. The human epigenome, which is far more complex, and contains more of the details about how to put those building blocks together, is no less important...and it seems likely that it contains more of what separates us as individuals.
Having the names of the people associated with each DNA analysis would be completely unnecessary. Just assign each person a unique, meaningless number in place of their name and the problem is solved. There are probably six other ways to solve the privacy problem and still make the data useful. If researchers find special cases where they need actual identities to better understand what's going on, make them sign NDAs and release the information to only them.
The FBI doesn't want to release this because they know there's a lot of partial or complete matches in the database. Suddenly having news stories about how there's 100 people in the FBI DNA database with the same 13 identifiers (flash to expert testimony claiming billions to one of such a match) would be a major disaster for the FBI. The FBI would then talk about how most of them are the same person using different names, and various other explanations, but the damage would be done (flash to news story about one side of a match being a 22 year old male from Alaska, and another a 76 year old female from Florida).
I understand why the FBI doesn't want to do this, but it's extremely important data about how valid this type of DNA testing is (especially within certain populations) (flash to news story about racism). Essentially the government holds evidence about the validity of DNA testing that's relevant to thousands of criminal cases that it refuses to release. That sounds like a strong constitutional issue to me.
The FBI's database only uses 15 markers, checking 15 sites in DNA. That's not good enough, and there are false matches. The problem is that they're using DNA technology from about 1990.
23andme, the commercial DNA analysis service, checks 580,000 sites in DNA. 23andme probably has enough data to validate the quality of the FBI's marker selection. That's a good way to check. Identical twins do match, even at the 23andme level of analysis.
It's similar to the birthday problem. Given a class of 35 students, the odds that one of them has the same birthday as you are roughly 35/365 = 9.6%. However, the probability that there are two students in the class who have the same birthday (not necessarily yours) is about 81% (check Birthday Problem on Wikipedia).
It's the same here. The probability of there being matches between different people in a large database of DNA is going to be a lot higher than the probability that there is a match to a given person or crime scene DNA.
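For anyone who wants to check the classroom numbers, a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years and twins. The function names are just mine.

```python
def p_matches_you(n, days=365):
    """Chance that at least one of n classmates shares YOUR birthday."""
    return 1.0 - ((days - 1) / days) ** n

def p_any_pair(n, days=365):
    """Chance that at least two of n people share a birthday with each other."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

print(f"someone matches you:  {p_matches_you(35):.1%}")
print(f"any two match at all: {p_any_pair(35):.1%}")
```

The first figure comes out a little over 9% (slightly below the simple 35/365 estimate, which double-counts cases where several classmates match) and the second at about 81%. A database search for "any two profiles that match each other" is the second kind of question, not the first.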
And the privacy rights of convicted criminals are different from "normal" citizens -- why?
Look. There's a reason society puts some people in jail. That's considered necessary for the protection of the others. But curtailing their rights in whatever other arbitrary ways is not OK.
There's this misconception that people lose their civil rights by becoming criminals. They don't.
Forgive me that I'm a layperson who didn't RTFA, but this story makes me wonder how they actually arrive at these astronomically low probabilities that the DNA profiles of two different people are accidentally identical? They wouldn't just include some random base pairs in the profiles and then calculate the probability as p=(1/4)^(number of base pairs), which would not account for the fact that 99.xxx percent of all base pairs are identical in all humans... would they? I was always assuming that, given that scientists who know what they're doing should have invented this test, there was some sophisticated process that would ensure that they would somehow only choose base pairs from the subset that was actually different in different individuals (and, more specifically, where each of A,C,G and T would have a 0.25 probability of occurring). I'm still relatively confident that something like this takes place, but sometimes you can be just astonished at how stupid people can be...
Assume several thousand matches are found in the database. A defense lawyer will argue that the odds are in the thousands that the defendant was falsely matched. This is wrong. It's much like the puzzle of how many people you need at a party to have two with the same birthday (23 for better-than-even odds). But the odds that two particular people have the same birthday are about 1 in 365, not 23/365 as would be falsely concluded using the same argument as above.
Assume odds are 1 in 10,000,000 that two people have the same DNA profile. Then the defense lawyer asks the expert witness:
"How many people would have to be in a stadium before the odds are greater than 50% that two have the same profile?"
Witness: "About 3,700."
Of course the readers of slashdot would be excused from the jury by the defense as they would not fall for this.
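A quick way to check the stadium figure, as a sketch only: it assumes every pair of people matches independently with the same probability, which is a simplification. The function name is hypothetical.

```python
import math

def people_needed(pair_match_prob, target=0.5):
    """Smallest crowd where P(at least one matching pair) exceeds target,
    ASSUMING every pair matches independently with pair_match_prob."""
    n = 2
    while True:
        pairs = n * (n - 1) // 2
        # P(no pair matches) = (1 - p) ** pairs, computed via logs for accuracy
        p_any = 1.0 - math.exp(pairs * math.log1p(-pair_match_prob))
        if p_any > target:
            return n
        n += 1

print(people_needed(1 / 10_000_000))  # crowd size for 1-in-10-million profiles
print(people_needed(1 / 365))         # sanity check against the birthday puzzle
```

For 1-in-10,000,000 odds this lands in the high 3,000s, and the same function gives the familiar 23 for birthdays. The crowd size grows with the square root of the odds, which is why a database of modest size can contain coincidental pairs even when any single comparison is very unlikely to match.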
I think a comprehensive review of methods is more than overdue, so yes, a review of how DNA is used and how reliable it is, anyway, plus how it deals with aforementioned people with multiple DNA signatures due to medical or other conditions is not merely a good idea, it is a necessity. Of course, there are important privacy and mission creep considerations, and even the "standard" anonymizing measures are likely to be insufficient. To that I can but say, alright, find a way, because we need to know how reliable that evidence is, and merely hiring expert witnesses is not enough. Solid scientific method and peer review are the way to go.
Which brings me to another scary thought: What large scale scientific studies have been done regarding fingerprints as legal evidence?
I think the parent poster takes issue with your near 100% reliance on subjective evidence, and treatment of objective evidence (as reported to you by imperfect humans) as no better.
You could at least see that you would never convict a mobster (who'd have heaps of false witnesses standing by). Or anyone from a closely-enough-knit group for that matter.
It'd also be relatively easy to get you to convict an innocent. If you based yourself on witnesses alone, obviously, you can see why you'd be liable to convict someone of witchcraft, for example. Even in this day and age. There is never any shortage of witnesses for idiotic accusations.
Objective evidence must take precedence over subjective evidence. Yes, DNA can be planted/faked, but if that's the case, more research will find inconsistencies. The solution is not less reliance on objective evidence, imho, but more, and better methods. Perhaps some of this "stimulus" money could be put into creating an agency to improve the collection of objective evidence (like DNA evidence). Say, paying a few people to plant fake evidence using a variety of methods without telling the researchers, and working that way toward improved detection methods.
Who reads New Scientist? After their ridiculous article advocating mutilating boy infants the other day!
excuse me for being a noob, but if two beings have the same DNA, how did one turn out black and the other white? unless, the samples were taken from m.j. at two different stages of his life...
99.9% of human DNA is identical to that of all other humans, no matter the race. In fact we even share a large fraction of our genes with a banana. They aren't looking at particular genes - obviously we are all going to have the same genes that code for the basic body plan and proteins needed for basic life.
When they do these DNA tests they are looking at short tandem repeats (microsatellites), which are almost always outside of the coding regions of genes. Here is where the differences rack up. The repeats are highly variable across the species and there can be any number of repeats in each one of these "groups". Mathematically you would only need to look at 15 or so to have enough data to be completely unique among the 6.7 billion people on the planet. Often in criminal cases they look at far more than this and so the probability that two people in the world have the exact same genome (excluding identical twins) is small enough to express certainty. The cases in which DNA has falsely accused someone are user error - the genes don't lie, some people, even though they may be scientists, are still idiots.
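As I understand it, the usual starting point for these figures is the "product rule": estimate how common the observed pair of repeat counts is at each location, then multiply across locations. Here is a minimal sketch with made-up allele frequencies (0.10 and 0.20 are placeholders, and 13 locations is just an example count). Note that the multiplication step is exactly where the independence assumption comes in.

```python
import math

def genotype_frequency(p, q=None):
    """Expected share of people with this allele pair at one location.
    p, q are allele frequencies; q=None means both copies are the same allele.
    Uses the textbook Hardy-Weinberg proportions: p*p or 2*p*q."""
    return p * p if q is None else 2 * p * q

def random_match_probability(loci):
    """Product rule: multiply per-location frequencies, ASSUMING independence."""
    rmp = 1.0
    for alleles in loci:
        rmp *= genotype_frequency(*alleles)
    return rmp

# Hypothetical profile: 13 locations, two different alleles at each one.
sample_loci = [(0.10, 0.20)] * 13

rmp = random_match_probability(sample_loci)
print(f"about 1 in {1 / rmp:.3g}")
```

Each location here is only about 1 in 25, yet the product over 13 of them is astronomically small. That is how modest per-location numbers turn into "billions to one", and why the result is only as good as the independence assumption and the frequency estimates behind it.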
Good find. The correct link is: Science Letters - Time for DNA Disclosure
It was only when a retired FBI metallurgist did testing by himself that he proved that the technique was useless. Then the National Academy of Sciences did a study and found the same result, and the FBI stopped using this test. http://www.fbi.gov/pressrel/pressrel05/bullet_lead_analysis.htm [fbi.gov]
Now the FBI has a secret data base that they use to claim that people are guilty. They will not release the data for independent verification of their results. Do you really think that they can be trusted one more time?
Why is Snark Required?
The method of fingerprint identification was more of a learned craft than based on rigorous scientific testing up to 30 years ago. What saved its butt was that identification was computerized in the 1970s. If the algorithms gave too many false matches, then the technique would have collapsed like a house of cards. But the algorithms appear to work reliably. I recall some defense lawyers attacking the fingerprint method at that time, much like the early years of DNA matching.
The argument from an article in Wired years back suggested that government security camera feeds be made available for realtime public viewing. That could then check abusive uses of this system when you have "the watched watching the watchers".
Ditto for open source software, like for computer security or voting. More eyes can spot more flaws.
Dear Feds. If you have nothing to hide, you have nothing to fear. Right?