This Impenetrable Program Is Transforming How Courts Treat DNA Evidence (wired.com)
mirandakatz writes: Probabilistic genotyping is a type of DNA testing that's becoming increasingly popular in courtrooms: It uses complex mathematical formulas to examine the statistical likelihood that a certain genotype comes from one individual over another, and it can work with the subtlest traces of DNA. At Backchannel, Jessica Pishko looks at one company that's caught criminal justice advocates' attention: Cybergenetics, which sells a probabilistic genotyping program called TrueAllele -- and that refuses to reveal its source code. As Pishko notes, some legal experts are arguing that TrueAllele revealing its source code 'is necessary in order to properly evaluate the technology. In fact, they say, justice from an unknown algorithm is no justice at all.'
about the code!
Just look at all those accurate global warming models that correctly predicted I would be 10 feet under ocean water by now.
Modern encryption is based on "zero-information proof".
Math says truth can be determined from the machine without knowing the source code. These morons just don't know math.
They are social engineers that discovered they can work for criminals for huge sums in order to keep them out of prison. The vast majority of "defense" is really about "evil". They are going to try and attack anything the average 5 year old can't instantly grasp, because that is the level they target. It makes their jobs easiest.
Bah humbug. Go claim there is no such thing as radar when your ballistic missile defense from North Korea depends on it, and try not to burn when the nukes fly.
I think it is very reasonable to ask for access, covered by NDA, to the source code when such code is used to produce results for criminal prosecution, unless they can show independent third-party validation of their tool.
We have seen issues with red light cameras, we have seen issues with labs doing drug testing on hair, we have seen child abuse panics from psychology "experts". Both methods and experts have to be open for independent, impartial validation. Otherwise they are no better than a duck test.
As Terry Pratchett wrote somewhere: "Evidence means 'that what is seen'". Nuff said.
A lot of expert witness testimony comes down to a judgement call -- "In your opinion, as someone who has been working in this field for 20 years, how confident are you that these signatures / bullet marks / fingerprints / DNA match?" That's the result of an algorithm that you can't examine either, and has at least as much opportunity for being corrupted by unconscious prejudice or outright bribery as a piece of software.
TCP: Why the Internet is full of SYN.
"justice from an unknown algorithm is no justice at all"
...
A successful conviction may be legitimately tipped by accurate checked evidence, in this case DNA
But justice is not a matter of technical facticity. It is withholding something from a party that they deserve.
The evidence may help identify discrepancies between the two, but it is a major conflation to substitute that with justice.
Jurors and judges need to know what the probabilities are. Remember, in a criminal trial, the standard for evidence is "beyond a reasonable doubt." Sending people to prison for life or even to death row based on flimsy evidence is unacceptable.
This isn't to say that it hasn't happened before -- Cameron Todd Willingham was executed in Texas on the testimony of an "arson expert" with no formal training in the field.
The code should be evaluated or the tool should be banned from court. The company doesn't like it? Too bad. They don't have to sell to the forensic lab/law enforcement market.
You have the right to face your accuser, which includes examining the evidence against you. This is secret evidence. It amounts to "because we say so", and should not be tolerated.
A software bug you're not permitted to look for could send you to jail. At least with a human expert witness you can cross-examine them.
One thing is having access to the source code and a completely different story is properly analysing it. When dealing with something as complex as (probabilistic!) DNA sequencing, it seems quite clear that the most sensible way to validate the program is actually using it. Set up a proper benchmark with a relevant number of samples and confirm whether this (+ any other) program works exactly as expected. This would also be an excellent way to objectively assess its accuracy.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Expert judgement can be countered by other experts. Here we are being presented with something as a "Fact". There is no way to dispute it and there is no way to verify it which is what people are having a problem with.
>One thing is having access to the source code and a completely different story is properly analysing it. When dealing with something as complex as (probabilistic!) DNA sequencing, it seems quite clear that the most sensible way to validate the program is actually using it. Set up a proper benchmark with a relevant number of samples and confirm whether this (+ any other) program works exactly as expected. This would also be an excellent way to objectively assess its accuracy.
Exactly this. My kingdom for modpoints.
You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
While I'm very pro-OSS, I'm anti forcing private companies to disclose their source code. It is their work, their intellectual property. It's up to the judge to admit the closed-source evidence and up to the jury to weigh it.
I'm not a complete idiot... Some parts are missing.
>Expert judgement can be countered by other experts. Here we are being presented with something as a "Fact". There is no way to dispute it and there is no way to verify it which is what people are having a problem with.
Questioning expert's qualifications is fair game in trials. If you can demonstrate that expert is not impartial, you can largely mitigate their testimony.
How do you question an algorithm like if (1) = Guilty; other than by code review?
Like our current justice system it probably just looks for black genes.
Well it shouldn't be accepted as fact. Ideally the courts would instruct the jury to treat the software's output as similar to a human being saying, "This is my expert opinion." You can submit your own software's "opinion" as evidence as much as you can get your own expert human to testify on your behalf.
It is true that you can't cross-examine it; but ideally, that should make the software less reliable. If you had an expert who, upon cross-examination, always responded, "I don't know, it just seems that way", then he wouldn't have much credibility. Ideally, software that can't justify its "opinion" should be treated the same way.
I have said "ideally" here several times, recognizing that it may well be the case that this isn't how people actually think. But I think a more constructive response to this misplaced trust is to help inform courts and defense lawyers more clearly (who should in turn inform the juries).
TCP: Why the Internet is full of SYN.
You get to ask an expert witness why their opinion is what it is, and if they answer "I'm not telling," their credibility is shot and there's a good chance their testimony will be thrown out. This software is an expert witness that nobody has any reason to believe giving testimony damning a person and then refusing to explain why but maintaining credibility. Analyzing whatever algorithm the software uses would be like questioning the witness, which is your right as a defendant in the USA, and keeping it hidden is literally denying you that right.
I've done some genomic work, during the Human Genome Project. I had to step away from the work due to my concerns about the lack of quality. The analysis software for the data, used to assemble longer genomic fragments for testing and verification, was so very, very poor that all the scientists learned to ignore the analysis and order longer sequences manually, eyeballing it based on their personal experience. It was hideously expensive to do this constantly, especially with the number of sequences to sample and test that came back "does not work". Part of the result was that, because they were probing in the dark, they got far more false positives that had to be tested later as part of an even longer or overlapping sequence, so that even *that* data was unreliable.
We have *had* crime labs falsify evidence, with cases like https://www.cbsnews.com/news/a... . Without the ability to verify the provenance of the data, of the results, and of the analysis tools, the DNA analysis can be far too easy to falsify. It should be as verifiable as the scales used to measure the weight of drugs, or the spectrographic analyzer and its software.
>I'm anti forcing private companies to disclose their source code
In some cases, seeing the source code might be required, but under the most likely conditions this is a pretty useless formality. It is very tough work, and it is very unlikely to produce more useful conclusions than testing.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
So how would you determine the code only works as it does when being tested? You could easily have a trigger in the code to give a guilty verdict when desired, and never use it during testing.
"Well of course they're guilty, our code has been tested!"
Only an expert in the field is going to have a chance of understanding the algorithm.
That expert is either going to work for this company or their competitors.
They should only need to publish their valuation tests and the results.
What you're describing is exactly why experts frequently have their credibility challenged and why they need to provide the means by which to verify their credentials. The problem here is that they're providing no means by which to establish or confirm the credibility of the algorithm, and they know that doing so doesn't harm them as it would with an expert witness.
Imagine if the prosecution put an "expert" on the stand who testified how the prosecution wanted, but when the defense attorney asked where the "expert" went to school, where they worked, or how long they had been practicing, the "expert" refused to answer those questions and instead asked the jury to simply trust their "expert" opinion. They'd be laughed out of the court room, since the jury wouldn't know whether the "expert" was actually an expert or just a guy off the street. And that's how it should be.
Unfortunately, refusing to provide a means by which the credibility of this algorithm can be ascertained doesn't elicit the same response. Machines are commonly viewed as unbiased, logical, and factual, so while a human's refusal to allow their credentials to be verified would be an immediate red flag, with a machine it doesn't mean much to most people. People are accustomed to thinking that algorithms produce factual results that can be taken at face value.
That's a problem when it comes to things that need to be verifiable, whether it's evidence in court or votes in an election.
>You could easily have a trigger in the code to give a guilty verdict when desired
The problem is that, in complex enough code, you might not be able to tell even by looking at the source code. Theoretically, you certainly could, but in practice nobody would spend all the effort required to gain a perfect understanding. Here, for example, we have a probabilistic DNA sequencing approach! I am tired just thinking about how intricate and obscure that code might be! The calculation engine might consist of walls of constants and complex formulae, which are extremely difficult to analyse and which might carry any faulty bit. Not to mention the alternative: what if the code analyst also wants to trick you? The most practical (and certainly the most widely used) approach is to properly test the piece of software in question and, eventually, take additional measures like getting to know the company that develops it.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Code for "facts" used in the courtroom hidden? Oh, you mean like how voting machine software and hardware design is often not available to the public for examination. All of it, anything on which democracy is contingent, needs to be published. No ifs, ands, or buts. Probably also applies to the code used in killer bots. The populace will need to know how a kill decision is made.
if ( isDarkie() || ( isPoor() && Math.random() > .5 ))
return "GUILTY";
else
return "NOT GUILTY";
It is different in that you can challenge an expert witness with your own witness. How can you challenge an algorithm that no one really knows? Considering that the FBI has used flawed statistics in DNA matching for a decade, this is not the first time that there are issues with how forensic science is done.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Breathalyzers are effectively closed source under trade secret protections and we've convicted lots of people with those.
As a programmer I can assure you that I am infallible and perfect. My superiority is the reason I am a programmer and most people are not.
“Common sense is not so common.” — Voltaire
DNA probabilistic methods like this can do three things, but can only be used to do one of them at a time. They can eliminate an accused, they can eliminate all but one person from a predetermined sample of people to find the guilty person, or they can give the police a potential list of suspects. They CANNOT be used to do both of the last two. If I have a small partial DNA sample, there will be multiple people in the world that it will match. If the police then just round up the first person they find who matches and say, oh, the probability of a match this close is one in 300 million: well, no. If there were 300 million permutations and you looked in a population of 300 million people, I would expect you to find a match (well, at least with probability 1 - 1/e, about 63%).
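The 1 - 1/e figure above can be checked numerically. This is a minimal sketch, assuming each of the 300 million people matches independently with probability 1 in 300 million; the function name is made up for illustration and real populations (relatives, substructure) would not be this tidy.

```python
import math

def prob_at_least_one_match(match_probability: float, population: int) -> float:
    """Chance that at least one of `population` unrelated people matches by coincidence."""
    # 1 - (1 - p)^N, computed via log1p/expm1 so a tiny p does not lose precision.
    return -math.expm1(population * math.log1p(-match_probability))

population = 300_000_000
p_any = prob_at_least_one_match(1 / population, population)
expected_matches = population * (1 / population)

print(f"P(at least one coincidental match) = {p_any:.4f}")
print(f"1 - 1/e                            = {1 - 1 / math.e:.4f}")
print(f"expected number of matches         = {expected_matches:.1f}")
```

So a "one in 300 million" match found by trawling 300 million people is close to what chance alone would produce, which is the point being made about not combining the search and the identification.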
>The problem is that, in a complex enough code, you might not be able to tell even by looking at the source code.
Even simple programs can be unreadable.
And malicious intent isn't announced with a comment of 'backdoor access here'.
>Arguably, the program can be evaluated without the source code.
At considerable expense. But even then it still is a problem because you would have to do it for every single case. Otherwise you have no way to know if something is different or wrong with the analysis in a case where no verification was conducted.
>Simply use known samples and examine the output. Do the results of the analysis match what was known about the samples?
You're talking about using controls and/or independent testing methods. Not really good enough because if there is a discrepancy you run into Segal's Law (a man with a watch knows the time and a man with two is never sure). You have no way to know which test (if either) is the correct one. You would need to do those sorts of independent verification but you still cannot really accept any analysis in a court of law where the defendant cannot evaluate the methodology used to accuse them.
You're asking how to question an algorithm that assigns the value held in 'Guilty' to the first (second?) element of the array 'if' (I assume the word 'if' is not a reserved word.)
>You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
Doing a black box analysis of software when the code should be available for review by a defendant is so wrong-headed I barely know where to start. There is NO place for secret code when it comes to convicting people of crimes. The defendant should be able to question any and all methods being used to accuse them of a crime.
>While I'm very pro-OSS, I'm anti forcing private companies to disclose their source code.
Tell me that when you are facing a life sentence and you aren't allowed to examine the code being used to send you to jail. If we're talking about a word processor, who cares, but when we're talking about felony convictions for crimes, I see no value to society in companies being allowed to keep such code private.
Actually, what spews out of these programs, and is presented in court as incontrovertible "mathemagical" evidence, is a statement like "The likelihood that this degree of match could occur by chance is 1 in 14 trillion."
Meanwhile, the reality is that there is not enough data to support any such claim, because the actual statistical distribution is unknown, and the claim is based on flimsy assumptions, assumptions made in the "theory" behind the possibly buggy code, the code that you can't inspect.
Gotta be able to face your accuser...
>Well it shouldn't be accepted as fact. Ideally the courts would instruct the jury to treat the software's output as similar to a human being saying, "This is my expert opinion." You can submit your own software's "opinion" as evidence as much as you can get your own expert human to testify on your behalf.
One of the requirements for presenting expert testimony is that you have to provide all of the materials that the expert used in forming their opinion. If the results of some software were treated as an expert opinion, the "materials relied upon" would almost certainly include the source code. It may even make the programmers, as the source of those materials, subject to being deposed about how they developed the software.
>I'm anti forcing private companies to disclose their source code.
They don't have to disclose their source code. They can choose instead to have it not be usable in court.
Freedom of choice does not mean freedom from consequences.
They are asking not only for the source code but also for the algorithms behind it. It is much easier to evaluate the code once you know how it is supposed to work. In fact the algorithm and the math behind the code is what should be examined. The problem of untangling a mixture of DNA samples with levels close to the detection limit, as is in the case they discuss, is exceedingly complex. At that level the tests are highly prone to amplify a contamination (there was a case when somebody contaminated the samples with their own DNA because the tubes were opened while they talked). At the detection limit of the assay you also have stochastic effects where random alleles that are present in the sample are not detected. This is just talking about "single source" samples. Now add unknown number of sources of DNA to the mixture, every one of them with different amount and state of decay. I can easily imagine mixtures that cannot be unambiguously solved under ideal conditions. The claim that they can do that on degraded samples from multiple contributors, some of which may be relatives, at the limit of detection is one that requires extraordinary proof. I also buy the argument that revealing the code will infringe on his right to protect his IP. He can easily use patent protection instead of trade secret, which would allow examination of the science while protecting his IP.
This doesn't account for edge cases or deliberate tampering.
What if one of the programmers was of a malicious type who hated his ex-wife to the point where he would code a special routine in if her DNA were found (or if his own were found)?
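A toy illustration of the worry above, and of why random black-box testing alone is unlikely to catch it. Everything here is hypothetical: the function names, the profile identifiers, and the scoring are invented for the sketch and say nothing about how TrueAllele or any real product behaves.

```python
import random

# Hypothetical profile identifier that the toy routine treats specially.
TRIGGER_PROFILE = "profile-deadbeefdeadbeef"

def opaque_match_score(profile_id: str, honest_score: float) -> float:
    """Stand-in for a closed analysis routine with one hidden special case."""
    if profile_id == TRIGGER_PROFILE:
        return 1e12  # rigged result for the one targeted profile
    return honest_score

def black_box_test(trials: int, seed: int = 0) -> bool:
    """Feed randomly generated profiles and check the output against the known answer."""
    rng = random.Random(seed)
    for _ in range(trials):
        profile_id = f"profile-{rng.getrandbits(64):016x}"
        honest = rng.uniform(0.0, 1.0)
        if opaque_match_score(profile_id, honest) != honest:
            return False
    return True

passed = black_box_test(10_000)
rigged = opaque_match_score(TRIGGER_PROFILE, 0.5)
print(passed, rigged)
```

The random test suite passes because it essentially never generates the one targeted profile, while the special case is a three-line branch that a reader of the source would at least have a chance of spotting. In real code such a branch could of course be far better disguised.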
Justice isn't about impartiality or facts. That's not its job; that's science's job. Justice's job is to regulate society so we can all get along. If some of us have to be sacrificed, so be it. Society needs a degree of certainty in order to function.
If we all agree on something--and if the courts say we agree, then we agree--then we have certainty. The case can be resolved, the guilty punished, and society can move on.
The courts do NOT, and have not for a very long time if ever, had the patience or resources to give everyone accused of a crime "a speedy and public trial, by an impartial jury of the State and district wherein the crime shall have been committed, which district shall have been previously ascertained by law, and to be informed of the nature and cause of the accusation; to be confronted with the witnesses against him; to have compulsory process for obtaining witnesses in his favor, and to have the Assistance of Counsel for his defence."
Do you seriously believe every dope dealer, thief, rapist, etc. is entitled to that? Yes, it was promised to you a long time ago, but you've been living in a cave if you really expect it when you show up.
Take the plea, do the time, pay the fine, and move on. You are guilty. If you drag it to a trial you've already pissed off all the other people in the room with the possible exception of your own lawyer. Don't look to them for help.
There are people who care about the quality of the facts that appear in court, but only in the abstract. Google "forensics on trial" and follow your nose. These people have about the same appeal to the process as any other scientist: Lawyers and the law are only interested in "facts" when they agree with theory; this is not a character defect; it's the nature of an adversarial legal system. It's supposed to be that way.
The concept that science should be valid in court is not important--only that it is *accepted* (by the court) and that it proves *my* point. Or at least gives a quick answer so we can all get outta here. (well, except for the guilty).
The "innocence" project is not called the Justice project. They're just as adversarial as any other legal organization.
Face it, folks--it's like a no-longer-mentionable comedian said:
"Mama doesn't want justice. Mama wants quiet!"
"Reality is that which, when you stop believing in it, doesn't go away." - Philip K. Dick
You're applying syntax rules to pseudocode.
"As far as we know, our computer has never had an undetected error."
>What if one of the programmers was of a malicious type who hated his ex-wife to the point where he would code a special routine in if her DNA were found (or if his own were found)?
Your chances of finding out about that via testing are way higher than via code analysis. Even with the source code in hand, it is very unlikely that such a complex piece of software would be properly analysed.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
>I'm anti forcing private companies to disclose their source code.
How do you feel about voting machines? ;)
Years ago I used to work for an independent testing lab -- we got to see (under NDA) the source code for (some) voting machines. Also the hardware. We went through that software line-by-line looking for things that were contrary to FEC standards. (The hardware was similarly evaluated.) We recompiled the source and validated it against the distributed binaries. We found a lot of questionable lines of code, although mostly trivial stuff about insufficient documentation of inputs and outputs, but occasionally stuff like use of uninitialized variables or questionable coding practices. (In languages including C, C++, PL/I, even COBOL. gagh!) Don't remember all the details, it was six or seven years ago.
That being said, we were looking for a specific list of issues. There could well have been crap in there that a sneaky-enough programmer could have made unobvious.
But the source code of something being used in law enforcement should undergo at least such inspection/testing. It's not disclosing source code (except under NDA), it's submitting it to independent verification.
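The "recompiled the source and validated it against the distributed binaries" step could look something like the sketch below. It assumes the build is reproducible byte-for-byte, which the comment above does not state, and the file names are placeholders created only for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def binaries_match(rebuilt: Path, distributed: Path) -> bool:
    """True when the binary rebuilt from reviewed source equals the shipped one."""
    return sha256_of(rebuilt) == sha256_of(distributed)

# Demonstration with throwaway files standing in for real binaries.
with tempfile.TemporaryDirectory() as tmp:
    rebuilt = Path(tmp, "rebuilt.bin")
    shipped = Path(tmp, "shipped.bin")
    tampered = Path(tmp, "tampered.bin")
    rebuilt.write_bytes(b"\x7fELF reviewed build")
    shipped.write_bytes(b"\x7fELF reviewed build")
    tampered.write_bytes(b"\x7fELF reviewed build plus extras")
    same = binaries_match(rebuilt, shipped)
    different = binaries_match(rebuilt, tampered)

print(same, different)
```

Without a check of this kind, a source review only covers the code that was handed over, not necessarily the code that is running in the lab.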
You brought up a good point.
So the counter would be to write a program that accepted the same physical evidence data and simply returned whatever answer the defense wants.
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
>Freedom of choice does not mean freedom from consequences.
That is what it means actually. Well, more specifically it's about being free from consequences that you don't wish to be subjected to. If you don't have freedom from consequences that you don't want then it's meaningless because you're really talking about free will, not societal freedom. "If you do drugs, we'll throw you in jail as a consequence! We support your freedom of choice in doing drugs!" That isn't useful.
>but also for the algorithms behind it.
With proper help, analysing the code is certainly easier but, if the original developers seriously wanted to hide something in such a complex piece of software, your chances of finding it via code analysis would be extremely low.
>In fact the algorithm and the math behind the code is what should be examined
The underlying theory and the provided documentation are the worst places to start looking for fishy bits. If they wanted to do something improper, they would hide it pretty well and, logically, wouldn't tell you about it.
>The problem of untangling a mixture of DNA samples with levels close to the detection limit, as is in the case they discuss, is exceedingly complex. At that level the tests are highly prone to amplify a contamination (there was a case when somebody contaminated the samples with their own DNA because the tubes were opened while they talked). At the detection limit of the assay you also have stochastic effects where random alleles that are present in the sample are not detected.
Are you saying that you cannot validate a DNA-analysing piece of software? How could that be true? Any piece of software can be validated. You have X input samples and Y expected results; if the program outputs the right results within a Z level of error, it is fine, otherwise it is not. It should be possible to account for contamination or any other additional factor; otherwise, how are you expecting to use such unreliable software or procedures in court?
>Now add unknown number of sources of DNA to the mixture, every one of them with different amount and state of decay. I can easily imagine mixtures that cannot be unambiguously solved under ideal conditions.
You have two options: either remove those cases from the tests or carefully analyse the output and determine whether it can be assumed correct. You have to know what answer you expect, either manually or by using an already-validated piece of software, and, within a certain confidence, determine whether the tested piece of software passes the test or not. If you do a proper test, with a big enough number of proper samples and a proper assessment methodology, the pieces of software that work fine should pass that test. Additionally, if the test results are so extremely difficult to validate, how are you expecting to deal with source code that is an order of magnitude more difficult to analyse? At least, assuming you plan to do it properly; just reading some basic ideas about its underlying algorithm would certainly be much easier.
>I also buy the argument that revealing the code will infringe on his right to protect his IP
Note that my point is based on pure pragmatism, rather than on privacy/IP aspects.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
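The "X input samples, Y expected results, Z level of error" validation described in the comment above can be sketched as a tiny harness. This is only an illustration under stated assumptions: the stand-in program, the sample values, the tolerance, and the 95% acceptance threshold are all made up, and a real validation study would need far more than a pass rate.

```python
from typing import Callable, Sequence

def pass_rate(program: Callable[[float], float],
              inputs: Sequence[float],
              expected: Sequence[float],
              tolerance: float) -> float:
    """Fraction of known samples whose output lands within `tolerance` of the expected value."""
    if not inputs or len(inputs) != len(expected):
        raise ValueError("need equally many inputs and expected results, and at least one")
    hits = sum(1 for x, y in zip(inputs, expected) if abs(program(x) - y) <= tolerance)
    return hits / len(inputs)

def stand_in_program(x: float) -> float:
    """Placeholder for the closed program under test."""
    return 2.0 * x

known_inputs = [1.0, 2.0, 3.0, 4.0]    # X: samples with known ground truth
known_outputs = [2.0, 4.0, 6.0, 8.5]   # Y: what a correct program should report
rate = pass_rate(stand_in_program, known_inputs, known_outputs, tolerance=0.1)  # Z
acceptable = rate >= 0.95              # hypothetical acceptance threshold

print(rate, acceptable)
```

Here the stand-in misses one of four known answers, so it fails the hypothetical threshold. Note that a harness like this only speaks to the inputs it was given, which is the limitation other commenters in the thread keep returning to.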
One of the first cases covered by Groklaw, and by Slashdot too at the time. The code finally got subpoenaed, and it was SO bad that IIRC the manufacturer went out of business as the result.
If some incompetent defence lawyers let it slide unchallenged, it absolutely does not let manufacturers of such crap off the hook. The precedent is there.
>You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
But you use the code to find interesting boundary cases that need additional scrutiny in testing!
To properly test software *requires* access to source. Otherwise all you're doing is poking it with a stick to find vulnerabilities.
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
We've had the "source code" (protocols) for a lot of forensic techniques that later turned out to be crap. We don't need the source code; nobody can tell from source code whether a forensic procedure actually works reliably. To determine that, you simply need to perform double blind testing and run a large number of control experiments with every forensic test. That's true for all forensic tests, not just TrueAllele.
Enough.
Hell, PBS Frontline did a special about the horrors of modern "forensics", titled 'The Real CSI'
It's an eye-opener.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
Not even close. It's called an idiom; it's allowed to be imprecise and is not intended to be evaluated. The same point remains without any regard to the content of the code.
DNA is just a social construct. It has no basis in reality.
>It uses complex mathematical formulas to examine the statistical likelihood that a certain genotype comes from one individual over another,
Complex? Nope. As a bioinformatician, I can say for sure this is a very simple thing done with very basic statistics.
So the article claims a false positive rate of 1 in 211 quintillion for a particular trial. To test that with a 95% confidence interval we would need at least 600 quintillion samples. Now we're a bit short on people on this planet. I don't think Earth could support this many people so we need to colonize other planets. To make things simple, let's assume the average planet can support 10 billion people. Therefore, we need to colonize roughly 60 billion planets and test everyone on those planets. I think we can do that without leaving the Milky Way galaxy, so we should be OK.
Chris Mesterharm
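The arithmetic in the comment above is consistent with the statistical "rule of three": if zero false positives turn up in n independent tests, the 95% upper bound on the rate is roughly 3/n. The sketch below works that out under that assumption, which is only one reading of the commenter's figure; the function name is invented for illustration.

```python
import math

def trials_needed(rate: float, confidence: float = 0.95) -> float:
    """Clean, independent trials needed to bound the false positive rate at `rate`.

    Solves (1 - rate)**n <= 1 - confidence for n, i.e. roughly 3 / rate at 95%.
    """
    return math.log(1 - confidence) / math.log1p(-rate)

claimed_rate = 1 / 211e18            # 1 in 211 quintillion
samples = trials_needed(claimed_rate)
people_per_planet = 10e9             # the comment's simplifying assumption
planets = samples / people_per_planet

print(f"{samples:.3g} samples, {planets:.3g} planets")
```

That lands a little above 600 quintillion samples and 60 billion planets, matching the rough numbers quoted. Whether every sample has to come from a different person is a separate question.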
As stated, seems useful for investigation/obtaining a warrant, and accuracy can be confirmed with black-box quality assurance. On the other hand, I would refrain from using it as "star evidence." That said, if I were in their shoes, I would get my lawyers to draft up a nice pair of NDAs, get a respected university to verify the science, and get a security company to review the coding to get a pair of gold stars. It might make DAs and investigators a bit more likely to take a look.
"If you do drugs, we'll throw you in jail as a consequence! We support your freedom of choice in doing drugs!" That isn't useful.
Tell that to Hawaii, where they just announced that, for people who use legalized (within Hawaii) medical marijuana, the state will enforce the Federal Law that prohibits anyone who uses an addictive and/or Federally-illegal drug from legally possessing a firearm, even though Hawaii will not enforce Federal Law regarding marijuana and immigration (sanctuary State). They also announced a 'grace period' for such people to turn in their guns to the government without prosecution. "Shall not be infringed" has been ruled to mean no decorative edging treatments are to be applied.
It's also an example of what can happen with firearm registries, and why I'll never register any of my large collection of military firearms.
Smart systems should be able to print a trace of their decision-making. If the code is not accessible, the particular instance of reasoning relevant to your case should at least be scrutinizable this way.
Ezekiel 23:20
>If I have a small partial DNA sample there will be multiple people in the world that it will match.
No way. Does that mean there are multiple evil twins in the world I've never met?
>So the article claims a false positive rate of 1 in 211 quintillion for a particular trial.
I didn't read the article, but neither that nor any other issue changes anything. If you aren't able to define accurate enough conditions to validate the piece of software in question, you would fail to do so either way. Testing is much more likely to be quicker and more efficient than the alternative approach of analysing the code. Or do you think that by having access to the code you can guess what the output might be under such extreme conditions? If that were so, what would have been the point of having a piece of software in the first place, if just by looking at the algorithm you can intuitively get the result?!
To test that with a 95% confidence interval we would need at least 600 quintillion samples.
No. This is not what the first statement means. And again, if you preferred to interpret it in that way and to analyse such a ridiculously big and completely unnecessary number of samples, I would recommend doing it by running that software rather than by manually analysing the algorithm.
I don't think Earth could support this many people so we need to colonize other planets. To make things simple, let's assume the average planet can support 10 billion people. Therefore, we need to colonize roughly 60 billion planets and test everyone on those planets. I think we can do that without leaving the Milky Way galaxy, so we should be OK.
Out of all your ridiculous statements so far, this is my favourite one. Are you telling me that if you had to (manually) test 600 quintillion DNA samples, you would get them from 600 quintillion different people?! LOOOOOOOOL. You this-can-be-solved-by-scaling-it-up guys are too much for me! So, let's sum up your masterpiece so far:
1. You have a situation about which you clearly don't have even a slight understanding (or perhaps you are being consciously dishonest/partial for whatever reason; because properly understanding all this doesn't seem that difficult for virtually anyone with any kind of background).
2. You take a random statement which seems appealing to you/to what you know ("quintillion" sounded nice to you, right?) from that description and interpret it in the most ridiculously wrong way possible.
3. You use that first stupid conclusion as an initial step to continue guessing increasingly stupid problems/solutions: if we need to do X DNA sample tests, we would take them from X different people; if we run out of people, we make more people; if we run out of space for those people, we go to other planets, etc. Everything is so easy for you, isn't it? You are a solver! LOL.
4. You aren't able even to finish all that nonsense properly, because I would have been able to come up with a much funnier ending part myself.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Shouldn't actual extensive testing provide enough data to determine the efficacy of such a platform?!
Regardless of what's "under the hood"?
If it demonstrably works at least as well as any alternative, then it seems a good tool, no matter how the process is done.
Self-importance and self-indulgence is the root of ALL evil.
US Constitution, sixth amendment: "In all criminal prosecutions, the accused shall enjoy the right...to be confronted with the witnesses against him; to have compulsory process for obtaining witnesses in his favor....". It seems to me that a device that announces something should have some humans, i.e. witnesses, testifying in its favor, but the courts may not agree.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
If my choices have no consequences, why bother? If my choices can have consequences I like, then they can have consequences I don't like, if only by comparison. This applies when discussing free will or societal freedom. Freedom from consequences I don't want is perforce ineffectuality.
That depends heavily on how the software is written. The software can be written to match the algorithm so it's verifiable. It usually isn't, of course, but it would be nice if that were required for forensic software. After all, if we're using this in a court of law, we should be sure past a reasonable doubt that it's valid. I'm a software developer, and I'm frequently not sure beyond a reasonable doubt about software I personally have written, let alone other people's software.
Actually, GP is correct if we're resorting to empirical testing. We would want about six hundred quintillion samples to test against to verify that. To say that the chance is one in 211 quintillion rather than one in 211 quadrillion, which is three orders of magnitude difference, we'd have to have enough testing to show that the error rate was less than one in 211 quadrillion, which means that we'd have to have enough samples so that the failures were significantly less than one in 211 quadrillion. That one we might manage to verify by testing samples from a mere half billion people against each of the other half billion. We leave the problem of getting that much blood out of each test subject as an exercise for the reader.
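For anyone wondering where "about six hundred quintillion" comes from, here is a rough sketch of the arithmetic. It assumes independent tests and that every single one comes back clean; the function name `samples_needed` is made up for illustration and is not anything from the article or the vendor.

```python
import math

def samples_needed(p: float, confidence: float = 0.95) -> float:
    """Number of independent tests that must ALL come back clean before
    we can say, at the given confidence, that the false positive rate
    is below p."""
    # If the true rate were p, the chance of n clean tests in a row is
    # (1 - p)**n. We need that chance to drop to 1 - confidence or less.
    # log1p(-p) is used because 1 - p rounds to exactly 1.0 for tiny p.
    return math.log(1.0 - confidence) / math.log1p(-p)

claimed = 1 / 211e18  # one in 211 quintillion
n = samples_needed(claimed)
print(f"{n:.3g}")  # roughly 6.3e20, i.e. about 630 quintillion tests
```

This is the usual "rule of three" shortcut (about 3/p tests for 95% confidence with zero failures), which is why the figure is roughly three times 211 quintillion.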
If the company wants to claim one in 211 quintillion, they need to provide a basis for that belief. To apply a mathematical model to get that number, we'd have to be able to verify the model to that accuracy, and we'd have to make sure all real-world possibilities are accounted for. If there's a one in a trillion chance that accidental contamination of a sample would make it return a false positive, the probability estimate is off by at least eight orders of magnitude.
tl;dr: That probability estimate is completely unfounded, and shows that the company doesn't care about science when it would stop them from throwing around impressive numbers.
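A quick back-of-the-envelope check of the orders-of-magnitude point, as a sketch only. The one-in-a-trillion contamination rate is the hypothetical from this comment, the 1 in 200 human error figure is the one quoted elsewhere in the thread, and `orders_of_magnitude_off` is just an illustrative helper name.

```python
import math

def orders_of_magnitude_off(claimed: float, other_failure: float) -> float:
    """How many orders of magnitude the claimed false positive rate is
    off by once a second, independent failure mode is included."""
    # Either failure alone produces a false positive, so take the union
    # of the two independent events.
    combined = claimed + other_failure - claimed * other_failure
    return math.log10(combined / claimed)

claimed = 1 / 211e18  # one in 211 quintillion

# Hypothetical one-in-a-trillion contamination rate from the comment.
print(f"{orders_of_magnitude_off(claimed, 1e-12):.1f}")   # about 8.3

# Human error rate of 1 in 200, as quoted elsewhere in the thread.
print(f"{orders_of_magnitude_off(claimed, 1 / 200):.1f}")  # about 18.0
```

Whatever the model itself says, the overall false positive rate can't be lower than the rate of the sloppiest step in the chain.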
This sounds similar to a program I used to work on that estimated haplotypes from incomplete DNA sequencing data. The technique is called Expectation Maximization. It is not easy to understand, and it is not easy to debug. I didn't write the code, but did have to fix it. You need a lot of domain knowledge about DNA to understand the code. The algorithm did converge on an answer even with the bug I eventually fixed, just slower.
I'm guessing the program could be more accurate if the parents' DNA were available. What would be the legal ramifications of asking them for a sample?
Sorry I couldn't help myself. I figured you didn't read the article, and the ridiculous claims TrueAllele made. Human error for DNA testing has been measured to be around 1 in 200, so these tiny probabilities are just dangerous theatrics. Still it's an interesting challenge to estimate extreme probability values. I was half hoping you'd shut me up with some nice technical way around the problem...
As for empirical testing, it makes sense as part of a larger system of evaluation. Looks like they have some papers to cover the theory. I don't know if code review would also help, but I see no reason not to allow the defense access.
Chris Mesterharm
I'm a software developer, and I'm frequently not sure beyond a reasonable doubt about software I personally have written, let alone other people's software.
I am also a software developer and I have no doubts while analysing the code I wrote, any other properly commented/structured code or even horrible code, but all this assuming that I can invest enough time/effort. This has been precisely my whole point since the start (is it seriously so difficult to just understand what is clearly written?): analysing code is a less efficient alternative than testing the corresponding program under the most common conditions, and certainly when dealing with such a complex piece of software as the one being referred to here. Hence the first title: "it makes more sense theoretically than practically".
Actually, GP is correct if we're resorting to empirical testing.
Not even in that scenario. Even if you carried those 211 quintillion tests out, it wouldn't represent a reliable validation of the claim "1 in 211 quintillion" because just one empirical confirmation isn't statistically significant (and this is, from the point of view of that claim, what performing the whole 211 quintillion tests once would mean). If you want to go down such a ridiculous, unnecessarily laborious path and you want to do it properly, you would have to rely on a much better methodology along the lines of repeating the process various times (at least 5 times?) and averaging the value. So, if you perform the 211 quintillion tests 5 times and each of these times you get only 1 error, then you would certainly be in a position to conclude that the original statement was, beyond any doubt, accurate. But nobody in their right mind would ever try to do such a thing to validate a meaningless piece of commercial nonsense.
We would want about six hundred quintillion samples to test against to verify that.
This is not what the intended verification was meant to be. And in any case, this isn't how you would even validate that claim. That quintillion reference is clearly an extrapolated estimation (= commercial language) which could be confirmed/dismissed by relying on equivalent means; that is, testing a much smaller number of samples and applying whatever "methodology" they used to come up with that number. But again, this isn't what releasing the source code or not is about; what we are discussing here is making sure that the piece of software works as expected and, eventually, accurately calculating its actual reliability according to whatever expectations the given court/governmental entity/legislation considers good enough; this isn't about confirming whatever random claim the company makes.
TL;DR: the ridiculous claim of that company is irrelevant from the software validation/source code release point of view; but, even if you decided to empirically validate such nonsense, the procedure proposed by the previous poster isn't reliable enough.
I was half hoping you'd shut me up with some nice technical way around the problem...
Impressive 180-degree change of attitude! Well, as I just answered another commenter, I am personally a fan of approaches along the lines of multiple attempts + averaging the results for proper empirical validation. For example, as a way to confirm/dismiss/improve that much more realistic 1 in 200 estimate, I would go with 10 sets of tests, each running up to either 200 tests or the second error. So, if in the first set you get the second error at the 150th attempt, you stop there; if in the second set you reach 200 without a second error, you stop there, etc. You average all these results and get your conclusion. Then, you should repeat that process quite a few more times under different conditions and keep averaging the results for increasingly better accuracy. But you should also make an extra effort not to mix up different conditions (or, at least, to weight them properly, although this is usually a more complicated alternative), which might inadvertently affect the reliability in a very relevant way. The whole system could also be systematically tuned further by replacing that initial 200 limit with the newly validated conclusions you keep getting. So, basically an iterative ad infinitum procedure whose accuracy is mostly conditioned by the time/effort you want to spend on it, but which can also deliver as many (reasonably good) intermediate conclusions as you want.
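The procedure described above could be simulated roughly like this. It is only a sketch: the comment doesn't say exactly what gets averaged, so the code assumes it is the per-set ratio of errors to tests run, and the names `run_set` and `estimate_error_rate` are invented for illustration.

```python
import random

def run_set(error_rate, max_tests=200, max_errors=2, rng=random):
    """Run one set: stop after max_tests tests or at the max_errors-th
    error, whichever comes first. Returns (tests_run, errors_seen)."""
    errors = 0
    for tests in range(1, max_tests + 1):
        if rng.random() < error_rate:  # this simulated test went wrong
            errors += 1
            if errors == max_errors:
                return tests, errors
    return max_tests, errors

def estimate_error_rate(error_rate, sets=10, rng=random):
    """Average the observed error ratio over several sets."""
    results = [run_set(error_rate, rng=rng) for _ in range(sets)]
    # ASSUMPTION: "average all these results" is read here as the mean
    # of errors / tests over the sets; the comment leaves this open.
    return sum(errors / tests for tests, errors in results) / sets

# Simulate a process whose true error rate is 1 in 200.
print(estimate_error_rate(1 / 200, rng=random.Random(42)))
```

With only 10 sets the estimate jumps around a lot from run to run, and stopping a set on an error can bias the per-set ratio, which is presumably why the comment suggests repeating the whole thing many times.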
Good idea. If we assume these are independent trials then it's much more feasible :) We can even do more than two people. An experiment could be that you've got the perp's DNA and a mix of 5 other samples. Now can you detect whether or not the perp is in the mix? Also I'm not worried about the amount of blood. Since we are assuming the trials are independent, we can tolerate some experimental death. I'm more worried about the time. Still it's probably doable with some robotic assistance and is much faster than colonizing the Milky Way. (In all fairness, colonizing the Milky Way has other benefits.)
Yes, it seems they have some papers, which, as you point out, is still worthless. Human error is going to completely dominate. My favorite claim is that they will allow the defense to look at their code if they are paid money at an hourly rate. These guys are some impressive assholes.
Chris Mesterharm
It's simple, if completely and totally impractical. There's a claim that a false positive will happen once in 211 quintillion times. In another Universe, we could run 211 quintillion tests, and if this were the case we'd be looking at a Poisson distribution with lambda of 1. Obviously, that's not good enough. We need many more tests. We can't potentially test enough to make sure the probability is exactly one in 211 quintillion, but 211 quintillion really means between 210.5 and 211.5 quintillion, and it's philosophically possible to run enough tests to have any desired confidence that the real probability is in that range. I'm not going to bother to compute how many.
There are no practical equivalent means. As you say, the estimate is extrapolation from a far smaller number of what we really hope are competently run tests. It is possible to dismiss the claim given those tests, but it's not possible to confirm it. This is the real world, and the real world is messy. Suppose the method was absolutely perfect and they ran a million tests. Now, consider that there may be a one in a billion chance that there would be some sort of unnoticed contamination of the sample, or an undetectable failure of the device, that would create a false positive. That one in a billion chance would be exceedingly unlikely to turn up in the million tests (this can be treated as a Poisson distribution with lambda of 0.001), and it would mean the company is off by eleven orders of magnitude. We know nothing about differences in human physiology with a confidence of 1 minus a 211-quintillionth, so we can't reason from that.
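To put numbers on the two Poisson cases mentioned above, here is a small sketch. The million tests and the one-in-a-billion contamination chance are the hypotheticals from this comment, and `poisson_pmf` is just the textbook formula written out, not anyone's actual validation code.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k events when lam are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Case 1: 211 quintillion tests at a true rate of 1 in 211 quintillion
# gives lambda = 1. Seeing zero false positives is about as likely as
# seeing exactly one, so a single run settles very little.
print(f"{poisson_pmf(0, 1.0):.3f} {poisson_pmf(1, 1.0):.3f}")

# Case 2: a million tests with a hypothetical one-in-a-billion chance
# of unnoticed contamination gives lambda = 1e6 * 1e-9 = 0.001.
lam = 1_000_000 * 1e-9
print(f"{poisson_pmf(0, lam):.4f}")  # chance the problem never shows up
```

In the second case the contamination problem stays invisible in roughly 999 out of 1000 such test campaigns, which is the sense in which the tests can dismiss the claim but not confirm it.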
If there's a documented algorithm (and there darn well should be) and the code is deliberately written to clearly implement the algorithm (which it probably isn't), code analysis could be useful as a way of verifying it. Otherwise, the only thing source code analysis can say is that it's unsuited for forensics.
If my choices have no consequences, why bother?
Depends what you mean by consequence. In the current discussion if we're talking about societal consequence rather than basic physics, then I think the answer is obvious -- there are lots of things you may wish to do privately that nobody ever need know about, thus avoiding the issue of societal consequence entirely.
If my choices can have consequences I like, then they can have consequences I don't like, if only by comparison.
They can. However you seem to be treating it as a binary choice. In reality I may freely accept some negative consequences, but not any negative consequences that can be imagined.
To me it seems pretty obvious that the degree to which you can claim a societal freedom is directly related to the degree to which you can avoid societal consequences if desired. If I can say most things without negative consequence, but not some things, then I mostly have freedom of speech. If I can say anything I like without negative consequence, then I have absolute freedom of speech. If I can only say certain things in certain situations, I have little freedom of speech. If I have the ability to easily make anonymous speech then I have more freedom of speech than if the tools of anonymous speech are prohibited.
I mean isn't that obvious? What useful definition of "societal freedom of speech" do you have that contradicts that?