This Impenetrable Program Is Transforming How Courts Treat DNA Evidence (wired.com)
mirandakatz writes: Probabilistic genotyping is a type of DNA testing that's becoming increasingly popular in courtrooms: It uses complex mathematical formulas to examine the statistical likelihood that a certain genotype comes from one individual over another, and it can work with the subtlest traces of DNA. At Backchannel, Jessica Pishko looks at one company that's caught criminal justice advocates' attention: Cybergenetics, which sells a probabilistic genotyping program called TrueAllele -- and that refuses to reveal its source code. As Pishko notes, some legal experts are arguing that TrueAllele revealing its source code 'is necessary in order to properly evaluate the technology. In fact, they say, justice from an unknown algorithm is no justice at all.'
about the code!
I think it is very reasonable to ask access, covered by NDA, to a source code when such code is used to produce results for criminal prosecution. Unless they can show independent third-party validation of their tool.
We have seen issues with red light cameras, we have seen issues with labs doing drug testing on hair, we have seen child abuse panics from psychology "experts". Both methods and experts have to be open for independent, impartial validation. Otherwise they are no better than a duck test.
Can you give an example of one such code?
As Terry Pratchett wrote somewhere: "Evidence means 'that what is seen'". Nuff said.
Paaia
A lot of expert witness testimony comes down to a judgement call -- "In your opinion, as someone who has been working in this field for 20 years, how confident are you that these signatures / bullet marks / fingerprints / DNA match?" That's the result of an algorithm that you can't examine either, and has at least as much opportunity for being corrupted by unconscious prejudice or outright bribery as a piece of software.
TCP: Why the Internet is full of SYN.
"justice from an unknown algorithm is no justice at all"
...
A successful conviction may be legitimately tipped by accurate checked evidence, in this case DNA
But justice is not a matter of technical facticity. It is withholding something from a party that they deserve.
The evidence may help identify discrepancies between the two, but it is a major conflation to substitute that with justice.
Jurors and judges need to know what the probabilities are. Remember, in a criminal trial, the standard for evidence is "beyond a reasonable doubt." Sending people to prison for life or even to death row based on flimsy evidence is unacceptable.
This isn't to say that it hasn't happened before -- Cameron Todd Willingham was executed in Texas on the testimony of an "arson expert" with no formal training in the field.
The code should be evaluated or the tool should be banned from court. The company doesn't like it? Too bad. They don't have to sell to the forensic lab/law enforcement market.
You have the right to face your accuser, which includes examining the evidence against you. This is secret evidence. It amounts to "because we say so", and should not be tolerated.
A software bug you're not permitted to look for could send you to jail. At least with a human expert witness you can cross-examine them.
One thing is having access to the source code and a completely different story is properly analysing it. When dealing with something as complex as (probabilistic!) DNA sequencing, it seems quite clear that the most sensible way to validate the program is actually using it. Set up a proper benchmark with a relevant number of samples and confirm whether this (+ any other) program works exactly as expected. This would also be an excellent way to objectively assess its accuracy.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Expert judgement can be countered by other experts. Here we are being presented with something as a "Fact". There is no way to dispute it and there is no way to verify it which is what people are having a problem with.
One thing is having access to the source code and a completely different story is properly analysing it. When dealing with something as complex as (probabilistic!) DNA sequencing, it seems quite clear that the most sensible way to validate the program is actually using it. Set up a proper benchmark with a relevant number of samples and confirm whether this (+ any other) program works exactly as expected. This would also be an excellent way to objectively assess its accuracy.
Exactly this. My kingdom for modpoints.
You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
While I'm very pro-OSS, I'm anti forcing private companies to disclose their source code. It is their work, their intellectual property. It's up to the judge to admit the closed-source evidence and up to the jury to weigh it.
I'm not a complete idiot... Some parts are missing.
Expert judgement can be countered by other experts. Here we are being presented with something as a "Fact". There is no way to dispute it and there is no way to verify it which is what people are having a problem with.
Questioning expert's qualifications is fair game in trials. If you can demonstrate that expert is not impartial, you can largely mitigate their testimony.
How do you question algorithm like if (1) = Guilty; other than code review?
Please point to this model and indicate where it shows your precise location will be under 10' of water by now. Either that or admit you are full of shit and a liar.
A lack of evidence of this "model" will indicate you are a liar whether you respond or not.
Like our current justice system it probably just looks for black genes.
Well it shouldn't be accepted as fact. Ideally the courts would instruct the jury to treat the software's output as similar to a human being saying, "This is my expert opinion." You can submit your own software's "opinion" as evidence as much as you can get your own expert human to testify on your behalf.
It is true that you can't cross-examine it; but ideally, that should make the software less reliable. If you had an expert who, upon cross-examination, always responded, "I don't know, it just seems that way", then he wouldn't have much credibility. Ideally, software that can't justify its "opinion" should be treated the same way.
I have said "ideally" here several times, recognizing that it may well be the case that this isn't how people actually think. But I think a more constructive response to this misplaced trust is to help inform courts and defense lawyers more clearly (who should in turn inform the juries).
TCP: Why the Internet is full of SYN.
You get to ask an expert witness why their opinion is what it is, and if they answer "I'm not telling," their credibility is shot and there's a good chance their testimony will be thrown out. This software is an expert witness that nobody has any reason to believe giving testimony damning a person and then refusing to explain why but maintaining credibility. Analyzing whatever algorithm the software uses would be like questioning the witness, which is your right as a defendant in the USA, and keeping it hidden is literally denying you that right.
I've done some genomic work, during the Human Genome Project. I had to step away from the work due to my concerns about the lack of quality. The analysis software for the data, used to assemble longer genetic fragments for testing and verification, was so very very poor that all the scientists learned to ignore the analysis and order longer sequences manually, by eyeballing it with their personal experience. It was hideously expensive to do this constantly, especially with the number of sequences to sample and test that came back "does not work". Part of the result was that, because they were probing in the dark, they got far more false positives that had to be tested later, as part of an even longer or overlapping sequence, so that even *that* data was unreliable.
We have *had* crime labs falsify evidence, with cases like https://www.cbsnews.com/news/a... . Without the ability to verify the provenance of the data, of the results, and of the analysis tools, the DNA analysis can be far too easy to falsify. It should be as verifiable as the scales used to measure the weight of drugs, or the spectrographic analyzer and its software.
I'm anti forcing private companies to disclose their source code
In some cases, seeing the source code might be required, but under the most likely conditions this is a pretty useless formality. Very tough work which is very unlikely to output worthier conclusions than testing.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
So how would you determine the code only works as it does when being tested? You could easily have a trigger in the code to give a guilty verdict when desired, and never use it during testing.
"Well of course they're guilty, our code has been tested!"
Only an expert in the field is going to have a chance of understanding the algorithm.
That expert is either going to work for this company or their competitors.
They should only need to publish their valuation tests and the results.
What you're describing is exactly why experts frequently have their credibility challenged and why they need to provide the means by which to verify their credentials. The problem here is that they're providing no means by which to establish or confirm the credibility of the algorithm, and they know that doing so doesn't harm them as it would with an expert witness.
Imagine if the prosecution put an "expert" on the stand who testified how the prosecution wanted, but when the defense attorney asked where the "expert" went to school, where they worked, or how long they had been practicing, the "expert" refused to answer those questions and instead asked the jury to simply trust their "expert" opinion. They'd be laughed out of the court room, since the jury wouldn't know whether the "expert" was actually an expert or just a guy off the street. And that's how it should be.
Unfortunately, refusing to provide a means by which the credibility of this algorithm can be ascertained doesn't elicit the same response. Machines are commonly viewed as unbiased, logical, and factual, so while a human's refusal to allow their credentials to be verified would be an immediate red flag, with a machine it doesn't mean much to most people. People are accustomed to thinking that algorithms produce factual results that can be taken at face value.
That's a problem when it comes to things that need to be verifiable, whether it's evidence in court or votes in an election.
You could easily have a trigger in the code to give a guilty verdict when desired
The problem is that, in a complex enough code, you might not be able to tell even by looking at the source code. Theoretically, you certainly could, but practically nobody would spend all the required effort to gain a perfect understanding. Here, for example, a probabilistic-based DNA sequencing approach! I am tired just thinking about how intricate and obscure that code might be! The calculation engine might be formed by walls of constants and complex formulae, which are extremely difficult to analyse and which might carry any faulty bit. Not to mention the alternative of "what if the code analyst also wants to trick you"? The most practical (and certainly used everywhere) approach is to properly test the corresponding piece of software and, eventually, take additional measures like having proper knowledge of the developing company.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
How is this for you?? Al Gore claimed that coastal cities would be 20 feet under water in the near future in his movie 'An Inconvenient Truth'
http://scienceline.org/2008/12/ask-rettner-sea-level-rise-al-gore-an-inconvenient-truth/
Code for "facts" used in the courtroom hidden? Oh, you mean like how voting machine software and hardware design is often not available to the public for examination. All of it, anything on which democracy is contingent, needs to be published. No ifs, ands, or buts. Probably also applies to the code used in killer bots. The populace will need to know how a kill decision is made.
if ( isDarkie() || ( isPoor() && Math.random() > .5 ))
return "GUILTY";
else
return "NOT GUILTY";
It is different in that you can challenge an expert witness with your own witness. How can you challenge an algorithm that no one really knows? Considering that the FBI has used flawed statistics in DNA matching for a decade, this is not the first time that there are issues with how forensic science is done.
Well, there's spam egg sausage and spam, that's not got much spam in it.
Breathalyzers are effectively closed source under trade secret protections and we've convicted lots of people with those.
As a programmer I can assure you that I am infallible and perfect. My superiority is the reason I am a programmer and most people are not.
“Common sense is not so common.” — Voltaire
DNA probabilistic methods like this can do 3 things but can only be used to do one of them at a time. They can eliminate an accused, they can eliminate all but one person from a predetermined sample of people to find the guilty person, or they can give the police a potential list of suspects. They CANNOT be used to do both of the last two. If I have a small partial DNA sample there will be multiple people in the world that it will match. Suppose the police then just round up the first person they find who matches and say, oh, the probability of a match this close is one in 300 million. Well, no: if there were 300 million permutations and you looked in a population of 300 million people, I would expect you to find a match (well, with probability at least 1 - 1/e).
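To put a number on the 1 - 1/e remark above, here is a minimal sketch. It assumes every person in the population independently has the same chance of matching by coincidence, which is a simplification (relatives, for one, break it). The function name is made up for illustration.

```python
import math

def prob_at_least_one_match(match_probability: float, population: int) -> float:
    """Chance that at least one of `population` people matches purely by coincidence.

    Assumes independent people with an identical match probability.
    Computes 1 - (1 - p)^N using log1p/expm1 so tiny p does not lose precision.
    """
    return -math.expm1(population * math.log1p(-match_probability))

n = 300_000_000
p_any = prob_at_least_one_match(1 / n, n)
print(f"P(at least one coincidental match) = {p_any:.4f}")
print(f"1 - 1/e                            = {1 - 1 / math.e:.4f}")
```

Under those assumptions a "one in 300 million" profile still turns up somewhere in a 300-million-person population more often than not, which is why the quoted match probability and the chance that a database hit is the true source are different questions.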
The problem is that, in a complex enough code, you might not be able to tell even by looking at the source code.
Even simple programs can be unreadable.
And malicious intent isn't announced with a comment of 'backdoor access here'.
Arguably, the program can be evaluated without the source code.
At considerable expense. But even then it still is a problem because you would have to do it for every single case. Otherwise you have no way to know if something is different or wrong with the analysis in a case where no verification was conducted.
Simply use known samples and examine the output. Do the results of the analysis match what was known about the samples?
You're talking about using controls and/or independent testing methods. Not really good enough because if there is a discrepancy you run into Segal's Law (a man with a watch knows the time and a man with two is never sure). You have no way to know which test (if either) is the correct one. You would need to do those sorts of independent verification but you still cannot really accept any analysis in a court of law where the defendant cannot evaluate the methodology used to accuse them.
You're asking how to question an algorithm that assigns the value held in 'Guilty' to the first (second?) element of the array 'if' (I assume the word 'if' is not a reserved word.)
You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
Doing a black box analysis of software when the code should be available for review by a defendant is so wrongheaded I barely know where to start. There is NO place for secret code when it comes to convicting people of crimes. The defendant should be able to question any and all methods being used to accuse them of a crime.
While I'm very pro-OSS, I'm anti forcing private companies to disclose their source code.
Tell me that when you are facing a life sentence and you aren't allowed to examine the code being used to send you to jail. If we're talking about a word processor, who cares but when we're talking about felony convictions for crimes I see no value to society in companies being allowed to keep such code private.
Actually, what spews out of these programs, and is presented in court as incontrovertible "mathemagical" evidence, is a statement like "The likelihood that this degree of match could occur by chance is 1 in 14 trillion."
Meanwhile, the reality is that there is not enough data to support any such claim, because the actual statistical distribution is unknown, and the claim is based on flimsy assumptions, assumptions made in the "theory" behind the possibly buggy code, the code that you can't inspect.
Gotta be able to face your accuser...
Well, it's only been ~10 years since An Inconvenient Truth. I'm not sure what "near future" means: whether that means 1 year to you or 50 years to the rest of us. (I think his film was aiming at the year 2100, but I don't recall exactly).
Here's an example, some islands are completely covered by water at high tide. http://theconversation.com/sea...
The most up to date information has projections range from 0.2 meters to 2.0 meters (0.66 to 6.6 feet) of sea level rise in the next 100 years. [Melillo et al., 2014]. And that's the thing about science, you'll find that it is never 100% accurate and if you look back to previous theories and predictions can be embarrassingly inaccurate. But the scientific method generally leads to better answers through many iterations of models, research and theories.
Al Gore's 20 feet rise greatly exceeds the most conservative models, as you've already noted. On the other hand if all the ice covering Antarctica, Greenland, and in mountain glaciers around the world were to melt, sea level would rise about 70 meters (230 feet). That's the far extreme of what could be done with the matter available on Earth, it's not at all likely. (maybe if the Earth's axis tilted to expose the poles? Or maybe if 10's of thousands of years went by and we acquired an atmosphere like Venus that make air temperature nearly uniform across the planet, including the poles?)
“Common sense is not so common.” — Voltaire
Well it shouldn't be accepted as fact. Ideally the courts would instruct the jury to treat the software's output as similar to a human being saying, "This is my expert opinion." You can submit your own software's "opinion" as evidence as much as you can get your own expert human to testify on your behalf.
One of the requirements for presenting expert testimony is that you have to provide all of the materials that the expert used in forming their opinion. If the results of some software were treated as an expert opinion, the "materials relied upon" would almost certainly include the source code. It may even make the programmers, as the source of those materials, subject to being deposed about how they developed the software.
I'm anti forcing private companies to disclose their source code.
They don't have to disclose their source code. They can choose instead to have it not be usable in court.
Freedom of choice does not mean freedom from consequences.
They are asking not only for the source code but also for the algorithms behind it. It is much easier to evaluate the code once you know how it is supposed to work. In fact the algorithm and the math behind the code is what should be examined. The problem of untangling a mixture of DNA samples with levels close to the detection limit, as is in the case they discuss, is exceedingly complex. At that level the tests are highly prone to amplify a contamination (there was a case when somebody contaminated the samples with their own DNA because the tubes were opened while they talked). At the detection limit of the assay you also have stochastic effects where random alleles that are present in the sample are not detected. This is just talking about "single source" samples. Now add unknown number of sources of DNA to the mixture, every one of them with different amount and state of decay. I can easily imagine mixtures that cannot be unambiguously solved under ideal conditions. The claim that they can do that on degraded samples from multiple contributors, some of which may be relatives, at the limit of detection is one that requires extraordinary proof. I also buy the argument that revealing the code will infringe on his right to protect his IP. He can easily use patent protection instead of trade secret, which would allow examination of the science while protecting his IP.
This doesn't account for edge cases or deliberate tampering.
What if one of the programmers was of a malicious type who hated his ex-wife to the point where he would code a special routine in if her DNA were found (or if his own were found)?
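A toy illustration of this worry, and only that: the profiles, the scoring rule, and the hard-coded target below are all invented, and nothing here reflects how TrueAllele or any real genotyping software works. It shows a scorer with one hidden special case next to an honest one, compared on randomly generated test inputs.

```python
import random

ALLELES = "ACGT"           # toy alphabet; real forensic profiles look nothing like this
TARGET = "GATTACAGATTACA"  # hypothetical profile the hidden routine singles out

def honest_score(sample: str, suspect: str) -> float:
    """Fraction of positions where the two toy profiles agree."""
    return sum(a == b for a, b in zip(sample, suspect)) / len(suspect)

def rigged_score(sample: str, suspect: str) -> float:
    """Identical to honest_score except for one hard-coded suspect profile."""
    if suspect == TARGET:
        return 1.0  # hidden trigger: always report a perfect match
    return honest_score(sample, suspect)

def random_profile(rng: random.Random, length: int = len(TARGET)) -> str:
    return "".join(rng.choice(ALLELES) for _ in range(length))

# Black-box comparison on random inputs: count cases where the two scorers differ.
rng = random.Random(0)
trials = [(random_profile(rng), random_profile(rng)) for _ in range(10_000)]
disagreements = sum(honest_score(s, t) != rigged_score(s, t) for s, t in trials)
print("disagreements on random tests:", disagreements)

# The trigger only shows up when the specific profile is actually supplied.
print(honest_score("A" * len(TARGET), TARGET), rigged_score("A" * len(TARGET), TARGET))
```

With 4^14 possible toy profiles, a random test set is very unlikely to ever present the one suspect profile that flips the branch, so the two scorers look the same from outside unless the tester already knows what to feed them.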
How is this for you?? Al Gore claimed that coastal cities would be 20 feet under water in the near future in his movie 'An Inconvenient Truth'
A politician spouting off his opinion is not a "scientific model".
Can you point to any climate model, peer reviewed and published, that predicted a 10 foot ocean rise by 2017? No? How about a one foot rise by 2020? No? Anything?
Exactly. In 'An Inconvenient Truth' he showed the effects of a 20 foot sea level rise on various bits of the UK
http://www.global-warming-trut...
Impact of 20 Foot Rise in Sea Level
In 1992 they measured this amount of melting in Greenland. 10 years later this is what happened. And here is the melting from 2005. Tony Blair's scientific advisor has said that because of what is happening in Greenland right now, the map of the world will have to be redrawn. If Greenland broke up and melted, or if half of Greenland and half of West Antarctica broke up and melted, this is what would happen to the sea level in Florida.
Global Warming induced sea rise effect on Florida
This is what would happen in the San Francisco Bay.
A lot of people live in these areas. The Netherlands, the low-countries: absolutely devastating.
https://www.ipcc.ch/publicatio...
The instrumental record of modern sea level change shows evidence for onset of sea level rise during the 19th century. Estimates for the 20th century show that global average sea level rose at a rate of about 1.7 mm/yr.
Now at 1.7 mm per year, 20 feet, or around 6000 mm, of sea level rise would take about 3500 years! Not to mention he's being disingenuous with the Netherlands. The Netherlands isn't just 'low lying', big chunks of it are actually below sea level. They've built protective earthworks and sea walls to stop the sea coming in. If the sea level rises by 1.7 mm per year they'll just need to plan to raise the height of the sea walls by on average that much plus some safety factor.
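For anyone who wants to check the arithmetic behind the 3500-year figure, a quick sketch. It only reproduces the calculation under the comment's own assumption that the 20th-century average rate stays constant; it says nothing about whether that assumption holds.

```python
MM_PER_FOOT = 304.8        # exact, by definition of the international foot
RATE_MM_PER_YEAR = 1.7     # 20th-century average rate quoted above

rise_mm = 20 * MM_PER_FOOT             # 20 feet expressed in millimetres
years = rise_mm / RATE_MM_PER_YEAR     # time needed at a constant rate

print(f"20 ft = {rise_mm:.0f} mm, about {years:.0f} years at {RATE_MM_PER_YEAR} mm/yr")
```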
Al Gore is a lot of things, but he's not an idiot. He must know that showing Google Maps of NYC now mostly flooded by a 20 foot sea level rise when that rise will happen over 3500 years is dishonest. Presumably he thinks being dishonest about this is morally justified because it will get people to make changes he believes they need to make anyway. Still his motives are not pure. He bet big time by investing in a bunch of companies who'd benefit from things like emissions trading. If it doesn't happen, those companies will disappear. He'll still be richer than Crassus of course, but not as rich as if people followed his policy recommendations.
The NYT is pretty pro Democrat but even they pointed out that he has a conflict of interest
http://www.nytimes.com/2009/11...
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
Justice isn't about impartiality or facts. That's not its job; that's science's job. Justice's job is to regulate society so we can all get along. If some of us have to be sacrificed, so be it. Society needs a degree of certainty in order to function.
If we all agree on something--and if the courts say we agree, then we agree--then we have certainty. The case can be resolved, the guilty punished, and society can move on.
The courts do NOT, and have not for a very long time if ever, had the patience or resources to give everyone accused of a crime "a speedy and public trial, by an impartial jury of the State and district wherein the crime shall have been committed, which district shall have been previously ascertained by law, and to be informed of the nature and cause of the accusation; to be confronted with the witnesses against him; to have compulsory process for obtaining witnesses in his favor, and to have the Assistance of Counsel for his defence."
Do you seriously believe every dope dealer, thief, rapist, etc. is entitled to that? Yes, it was promised to you a long time ago, but you've been living in a cave if you really expect it when you show up.
Take the plea, do the time, pay the fine, and move on. You are guilty. If you drag it to a trial you've already pissed off all the other people in the room with the possible exception of your own lawyer. Don't look to them for help.
There are people who care about the quality of the facts that appear in court, but only in the abstract. Google "forensics on trial" and follow your nose. These people have about the same appeal to the process as any other scientist: Lawyers and the law are only interested in "facts" when they agree with theory; this is not a character defect; it's the nature of an adversarial legal system. It's supposed to be that way.
The concept that science should be valid in court is not important--only that it is *accepted* (by the court) and that it proves *my* point. Or at least gives a quick answer so we can all get outta here. (well, except for the guilty).
The "innocence" project is not called the Justice project. They're just as adversarial as any other legal organization.
Face it, folks--it's like a no-longer-mentionable comedian said:
"Mama doesn't want justice. Mama wants quiet!"
"Reality is that which, when you stop believing in it, doesn't go away." - Philip K. Dick
The NYT is pretty pro Democrat but even they pointed out that he has a conflict of interest
What are you talking about the New York Times for? Why not go straight to the source? Al Gore has stated it outright that he has invested with the express knowledge and belief as to what is most profitable for him, based on what he thinks will happen.
Now if he were to travel to the Hubble Space Telescope in an attempt to melt the ice caps, or installed a giant sun-blocking shield over Springfield, then you might have something, but he's straight-out spouting that it's a profit-motive for him.
Which is especially odd when the criticism comes from the Capitalists Apologists who defend their own with claims of "It's the most profitable way" and "Greed is good" and "It's your fault for letting yourself be fooled" among other mantras.
You're applying syntax rules to pseudocode.
... Whooosssshhhhhhhh....
"As far as we know, our computer has never had an undetected error."
What if one of the programmers was of a malicious type who hated his ex-wife to the point where he would code a special routine in if her DNA were found (or if his own were found)?
Your chances of finding out about that via testing are way higher than via code analysis. Even when you have the source code, it is very unlikely that such a complex piece of software gets properly analysed.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
I'm anti forcing private companies to disclose their source code.
How do you feel about voting machines? ;)
Years ago I used to work for an independent testing lab -- we got to see (under NDA) the source code for (some) voting machines. Also the hardware. We went through that software line-by-line looking for things that were contrary to FEC standards. (The hardware was similarly evaluated.) We recompiled the source and validated it against the distributed binaries. We found a lot of questionable lines of code, although mostly trivial stuff about insufficient documentation of inputs and outputs, but occasionally stuff like use of uninitialized variables or questionable coding practices. (In languages including C, C++, PL/I, even COBOL. gagh!) Don't remember all the details, it was six or seven years ago.
That being said, we were looking for a specific list of issues. There could well have been crap in there that a sneaky-enough programmer could have made unobvious.
But the source code of something being used in law enforcement should undergo at least such inspection/testing. It's not disclosing source code (except under NDA), it's submitting it to independent verification.
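The recompile-and-compare step mentioned above can be sketched roughly like this. It is not the lab's actual procedure, which isn't described here; the function names are made up, and a byte-for-byte match is only expected when the build is reproducible (same compiler, same flags, no embedded timestamps).

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def binaries_match(rebuilt: Path, distributed: Path) -> bool:
    """True when the binary rebuilt from reviewed source equals the shipped one."""
    return sha256_of(rebuilt) == sha256_of(distributed)
```

A mismatch does not by itself prove tampering, but it does mean the reviewed source is not demonstrably what is running in the field.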
Gore is a rent seeking scumbag who disguises his rent seeking as environmentalism. You can be a capitalist and dislike rent seeking, and think the solution to that is to reduce the number of areas the government regulates and hence the opportunities for rent seeking. That's essentially the free market position.
https://en.wikipedia.org/wiki/...
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
You brought up a good point.
So the counter would be to write a program that accepted the same physical evidence data and simply returned whatever answer the defense wants.
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
Freedom of choice does not mean freedom from consequences.
That is what it means actually. Well, more specifically it's about being free from consequences that you don't wish to be subjected to. If you don't have freedom from consequences that you don't want then it's meaningless because you're really talking about free will, not societal freedom. "If you do drugs, we'll throw you in jail as a consequence! We support your freedom of choice in doing drugs!" That isn't useful.
but also for the algorithms behind it.
With proper help, analysing the code is certainly easier but, if the original developers seriously wanted to hide something in such a complex piece of software, your chances of finding it via code analysis would be extremely low.
In fact the algorithm and the math behind the code is what should be examined
The underlying theory and the provided documentation are the worst parts to start looking for fishy bits. If they want to do something not too correct, they would hide it pretty well and, logically, don't tell you about it.
The problem of untangling a mixture of DNA samples with levels close to the detection limit, as is in the case they discuss, is exceedingly complex. At that level the tests are highly prone to amplify a contamination (there was a case when somebody contaminated the samples with their own DNA because the tubes were opened while they talked). At the detection limit of the assay you also have stochastic effects where random alleles that are present in the sample are not detected.
Are you saying that you cannot validate a DNA-analysing piece of software? How could that be true? Any piece of software can be validated. You have X input samples and Y expected results; if the program outputs the right results within a Z level of error, it is fine, otherwise it is not. It should be possible to account for contamination or any other such aspect; otherwise, how are you expecting to use such unreliable software/procedure in court?
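A bare-bones sketch of the X samples / Y expected results / Z error level idea described above. Everything here is hypothetical: `program` stands in for whatever tool is under test, and a real forensic validation would compare likelihood ratios against ground-truth mixtures rather than simple labels.

```python
from typing import Callable, Sequence

def passes_benchmark(
    program: Callable[[str], str],
    samples: Sequence[str],
    expected: Sequence[str],
    max_error_rate: float,
) -> bool:
    """Run `program` on known samples and accept it only if it errs rarely enough."""
    if not samples or len(samples) != len(expected):
        raise ValueError("need one expected result per sample, and at least one sample")
    errors = sum(program(s) != e for s, e in zip(samples, expected))
    return errors / len(samples) <= max_error_rate

# Toy stand-in for the program under test: reports whether a marker is present.
def toy_program(sample: str) -> str:
    return "match" if "GATTACA" in sample else "no match"

known_samples = ["xxGATTACAxx", "CCCC", "GATTACA", "TTTT"]
known_answers = ["match", "no match", "match", "match"]  # last answer disagrees with the toy

print(passes_benchmark(toy_program, known_samples, known_answers, max_error_rate=0.25))
print(passes_benchmark(toy_program, known_samples, known_answers, max_error_rate=0.10))
```

The threshold Z is the part a court would actually have to argue about; the harness itself is the easy bit.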
Now add unknown number of sources of DNA to the mixture, every one of them with different amount and state of decay. I can easily imagine mixtures that cannot be unambiguously solved under ideal conditions.
You have two options: either remove those cases from the tests or carefully analyse whatever output you get and determine whether it can be assumed correct. You have to be able to know what answer you expect, either manually or by using an already-validated piece of software, and, within a certain confidence, determine whether the tested piece of software passes the test or not. If you do a proper test, with a big enough number of proper samples and a proper assessment methodology, the pieces of software working fine should pass that test. Additionally, if the test results are so extremely difficult to validate, how are you expecting to deal with the source code, which is an order of magnitude more difficult to analyse? At least, assuming that you plan to do it properly; just reading some basic ideas about its underlying algorithm would certainly be much easier.
I also buy the argument that revealing the code will infringe on his right to protect his IP
Note that my point is based on pure pragmatism, rather than on privacy/IP aspects.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
One of the first cases covered by Groklaw, and by Slashdot too at the time. The code finally got subpoenaed, and it was SO bad that IIRC the manufacturer went out of business as a result.
If some incompetent defence lawyers let it slide unchallenged, it absolutely does not let manufacturers of such crap off the hook. The precedent is there.
You don't test software by looking at the code. You test the software by testing it. If it ain't broken, you're not testing hard enough.
But you use the code to find interesting boundary cases that need additional scrutiny in testing!
To properly test software *requires* access to source. Otherwise all you're doing is poking it with a stick to find vulnerabilities.
whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
We've had the "source code" (protocols) for a lot of forensic techniques that later turned out to be crap. We don't need the source code; nobody can tell from source code whether a forensic procedure actually works reliably. To determine that, you simply need to perform double blind testing and run a large number of control experiments with every forensic test. That's true for all forensic tests, not just TrueAllele.
Enough.
Hell, PBS Frontline did a special about the horrors of modern "forensics", titled 'The Real CSI'
It's an eye-opener.
An enigma, wrapped in a riddle, shrouded in bacon and cheese
Not even close. It's called an idiom; it's allowed to be imprecise and is not intended to be evaluated. The same point remains without any regard to the content of the code.
I bet you are a blast at parties.
Gore is a rent seeking scumbag who disguises his rent seeking as environmentalism. You can be a capitalist and dislike rent seeking, and think the solution to that is to reduce the number of areas the government regulates and hence the opportunities for rent seeking. That's essentially the free market position.
https://en.wikipedia.org/wiki/...
This means global warming is fake, correct?
DNA is just a social construct. It has no basis in reality.
No. Personally I think global warming is happening, but it's not all that serious, i.e. I'm a lukewarmer like Matt Ridley
https://www.thegwpf.org/matt-r...
These days there is a legion of well paid climate spin doctors. Their job is to keep the debate binary: either you believe climate change is real and dangerous or you're a denier who thinks it's a hoax.
But there's a third possibility they refuse to acknowledge: that it's real but not dangerous. That's what I mean by lukewarming, and I think it is by far the most likely prognosis.
I am not claiming that carbon dioxide is not a greenhouse gas; it is.
I am not saying that its concentration in the atmosphere is not increasing; it is.
I am not saying the main cause of that increase is not the burning of fossil fuels; it is.
I am not saying the climate does not change; it does.
I am not saying that the atmosphere is not warmer today than it was 50 or 100 years ago; it is.
And I am not saying that carbon dioxide emissions are not likely to have caused some (probably more than half) of the warming since 1950.
I agree with the consensus on all these points.
I am not in any sense a "denier", that unpleasant, modern term of abuse for blasphemers against the climate dogma, though the Guardian and New Scientist never let the facts get in the way of their prejudices on such matters. I am a lukewarmer.
Being a lukewarmer is perfectly consistent with the consensus. Ironically people saying that we'll get 20 feet of sea rise in our lifetimes are saying something inconsistent with the consensus. They're the deniers, not the lukewarmers. And actually if you look at experimental measurements of temperature models, they show warming happening slower than the IPCC's models.
https://imgur.com/a/WWeun
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
And that's the thing about science, you'll find that it is never 100% accurate and if you look back to previous theories and predictions can be embarrassingly inaccurate.
The point is, when someone makes 99 consecutive "embarrassingly inaccurate" predictions, you would be a fool for believing the 100th one, particularly if that prediction has a timeframe of 100 years and has a predictive window of one order of magnitude: "0.2 meters to 2.0 meters (0.66 to 6.6 feet) of sea level rise in the next 100 years". Even politicians don't make promises as slippery as that.
Odd how some scientists are taken for their intent but their critics are held at their word.
The truth is that scientific models that fail in their predictions aren't scientific. 73 models that fail in their predictions indicate an unscientific field.
>It uses complex mathematical formulas to examine the statistical likelihood that a certain genotype comes from one individual over another,
Complex? Nope. As a bioinformatician, I can say for sure this is a very simple thing done with very basic statistics.
So the article claims a false positive rate of 1 in 211 quintillion for a particular trial. To test that with a 95% confidence interval we would need at least 600 quintillion samples. Now we're a bit short on people on this planet. I don't think Earth could support this many people so we need to colonize other planets. To make things simple, let's assume the average planet can support 10 billion people. Therefore, we need to colonize roughly 60 billion planets and test everyone on those planets. I think we can do that without leaving the Milky Way galaxy, so we should be OK.
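For anyone wondering where "600 quintillion" comes from, here is a rough sketch of the arithmetic. It assumes independent trials and the usual "rule of three" reading: you need enough tests that, if the true rate really were 1 in 211 quintillion, at least one false positive would show up with 95% probability. The function name is made up for illustration.

```python
import math


def samples_needed(p: float, confidence: float = 0.95) -> float:
    """Smallest n with (1 - p)**n <= 1 - confidence, assuming independent trials.

    log1p keeps precision when p is far below floating-point epsilon.
    """
    return math.log(1.0 - confidence) / math.log1p(-p)


p = 1 / 211e18                     # claimed false positive rate
n = samples_needed(p)              # roughly 3 / p, about 6.3e20
planets = n / 10e9                 # at 10 billion people per planet

print(f"{n:.3e} samples, {planets:.3e} planets")
```

That works out to about 630 quintillion samples and a bit over 60 billion planets, if each person contributes exactly one sample.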
Chris Mesterharm
As stated, it seems useful for investigation or obtaining a warrant, and accuracy can be confirmed with black-box quality assurance. On the other hand, I would refrain from using it as "star evidence." That said, if I were in their shoes, I would get my lawyers to draft up a nice pair of NDAs, get a respected university to verify the science, and get a security company to review the code, to earn a pair of gold stars. It might make DAs and investigators a bit more likely to take a look.
"If you do drugs, we'll throw you in jail as a consequence! We support your freedom of choice in doing drugs!" That isn't useful.
Tell that to Hawaii, where they just announced that, although Hawaii will not enforce Federal Law regarding marijuana and immigration (sanctuary State), it will enforce the Federal Law that prohibits people who use an addictive and/or Federally-illegal drug from legally possessing a firearm, and that this applies to people who use medical marijuana that is legal within Hawaii. They also announced a 'grace period' for such people to turn in their guns to the government without prosecution. "Shall not be infringed" has been ruled to mean no decorative edging treatments are to be applied.
It's also an example of what can happen with firearm registries, and why I'll never register any of my large collection of military firearms.
Smart systems should be able to print a trace of their decision-making. If the code is not accessible, the particular instance of reasoning relevant to your case should at least be open to scrutiny this way.
Ezekiel 23:20
The truth
So every one of your "73 predictions" used 1977 as a baseline? Either you found a whole lot of 40 year old "models", or you are full of bullcrap. It is hard to tell since of the 73, exactly this many are actually named or cited: 0.
If I have a small partial DNA sample there will be multiple people in the world that it will match.
No way. Does that mean there are multiple evil twins in the world I've never met?
Guilty til proven innocent, that's the system we need!
So the article claims a false positive rate of 1 in 211 quintillion for a particular trial.
I didn't read the article, but that or any other issue doesn't change anything. If you aren't able to define accurate enough conditions to validate the corresponding piece of software, you would fail to do so anyway. Testing is much more likely to be quicker and more efficient than the alternative approach of analysing the code. Or do you think that by having access to the code you can guess what the output might be under such extreme conditions? If that were so, what would have been the point of having a piece of software in the first place, if just by looking at the algorithm you can intuitively get the result?!
To test that with a 95% confidence interval we would need at least 600 quintillion samples.
No. This is not what the first statement means. And again, if you preferred to interpret it in that way and to analyse such a ridiculously big and completely unnecessary number of samples, I would recommend that you do it by running that software rather than by manually analysing the algorithm.
I don't think Earth could support this many people so we need to colonize other planets. To make things simple, let's assume the average planet can support 10 billion people. Therefore, we need to colonize roughly 60 billion planets and test everyone on those planets. I think we can do that without leaving the Milky Way galaxy, so we should be OK.
Out of all your ridiculous statements so far, this is my favourite one. Are you telling me that if you had to (manually) test 600 quintillion DNA samples, you would get them from 600 quintillion different people?! LOOOOOOOOL. You, this-can-be-solved-by-scaling-it-up guys, are too much for me! So, let's sum up your masterpiece so far:
1. You have a situation about which you clearly don't have even a slight understanding (or perhaps you are being consciously dishonest/partial for whatever reason; because properly understanding all this doesn't seem that difficult for virtually anyone with any kind of background).
2. You take a random statement which seems appealing to you/to what you know ("quintillion" sounded nice to you, right?) from that description and interpret it in the most ridiculously wrong way possible.
3. You use that first stupid conclusion as an initial step to continue guessing increasingly stupid problems/solutions: if we need to do X DNA sample tests, we would take them from X different people; if we run out of people, we make more people; if we run out of space for those people, we go to other planets, etc. Everything is so easy for you, isn't it? You are a solver! LOL.
4. You aren't able even to finish all that nonsense properly, because I would have been able to come up with a much funnier ending part myself.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Shouldn't actual extensive testing provide enough data to determine the efficacy of such a platform?!
Regardless of what's "under the hood"?
If it absolutely works at least as well as any alternative, then it seems a good tool, no matter how the process is done.
Self-importance and self-indulgence is the root of ALL evil.
Scientific models are scientific. If they turn out to be wrong, that's important information that will be reflected in the next version. Remember that all models are wrong, but some are useful.
Scientists are judged on what they say, since we don't get enough information to discern their intent.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
You are aware, I presume, that the rate of sea level rise in the future could be different from 1.7mm/year? There are good reasons to think that it will increase. Nobody knows exactly by how much, of course. Current models don't predict anywhere near that rise in that time frame, but it's easy to come up with possible ways for it to happen.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
US Constitution, sixth amendment: "In all criminal prosecutions, the accused shall enjoy the right...to be confronted with the witnesses against him; to have compulsory process for obtaining witnesses in his favor....". It seems to me that a device that announces something should have some humans, i.e. witnesses, testifying in its favor, but the courts may not agree.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Even if the current rate doubled or tripled it wouldn't make much difference. If you look at the UK the government decides what the worst case forecast is, and people maintaining coastal defences plan to deal with that. Last I checked they decided on 3mm per year. Still, people are measuring this sort of thing and those measurements, plus a safety factor, become the future government decision.
Unless you're either building sea walls or trying to forecast, it doesn't matter what the rate is.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
If my choices have no consequences, why bother? If my choices can have consequences I like, then they can have consequences I don't like, if only by comparison. This applies when discussing free will or societal freedom. Freedom from consequences I don't want is perforce ineffectuality.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
That depends heavily on how the software is written. The software can be written to match the algorithm so it's verifiable. It usually isn't, of course, but it would be nice if that were required for forensic software. After all, if we're using this in a court of law, we should be sure past a reasonable doubt that it's valid. I'm a software developer, and I'm frequently not sure beyond a reasonable doubt about software I personally have written, let alone other people's software.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Actually, GP is correct if we're resorting to empirical testing. We would want about six hundred quintillion samples to test against to verify that. To say that the chance is one in 211 quintillion rather than one in 211 quadrillion, which is three orders of magnitude difference, we'd have to have enough testing to show that the error rate was less than one in 211 quadrillion, which means that we'd have to have enough samples so that the failures were significantly less than one in 211 quadrillion. That one we might manage to verify by testing samples from a mere half billion people against each of the other half billion. We leave the problem of getting that much blood out of each test subject as an exercise for the reader.
If the company wants to claim one in 211 quintillion, they need to provide a basis for that belief. To apply a mathematical model to get that number, we'd have to be able to verify the model to that accuracy, and we'd have to make sure all real-world possibilities are accounted for. If there's a one in a trillion chance that accidental contamination of a sample would make it return a false positive, the probability estimate is off by at least eight orders of magnitude.
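A quick sketch of the "eight orders of magnitude" arithmetic. The one-in-a-trillion contamination rate is the hypothetical figure from the comment, not a measured value, and the two error sources are assumed to be independent; the function name is made up.

```python
import math

claimed = 1 / 211e18      # the company's claimed false positive rate
contamination = 1e-12     # hypothetical rate of contamination-driven false positives


def combined_rate(*rates: float) -> float:
    """Probability that at least one independent error source fires.

    Computed as 1 - prod(1 - r) via log1p/expm1, so that tiny rates
    are not lost to floating-point rounding.
    """
    return -math.expm1(sum(math.log1p(-r) for r in rates))


real = combined_rate(claimed, contamination)
orders_off = math.log10(real / claimed)

print(f"combined rate {real:.3e}, off by about {orders_off:.1f} orders of magnitude")
```

The combined rate is dominated entirely by the larger term, so the model's own tiny number barely matters once any real-world failure mode is more likely than it.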
tl;dr: That probability estimate is completely unfounded, and shows that the company doesn't care about science when it would stop them from throwing around impressive numbers.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
This sounds similar to a program I used to work on that estimated haplotypes from incomplete DNA sequencing data. The technique is called Expectation Maximization. It is not easy to understand, and it is not easy to debug. I didn't write the code, but did have to fix it. You need a lot of domain knowledge about DNA to understand the code. The algorithm did converge on an answer even with the bug I eventually fixed, just slower.
I'm guessing the program could be more accurate if the parents' DNA were available. What would be the legal ramifications of asking them for a sample?
Sorry I couldn't help myself. I figured you didn't read the article, and the ridiculous claims TrueAllele made. Human error for DNA testing has been measured to be around 1 in 200, so these tiny probabilities are just dangerous theatrics. Still it's an interesting challenge to estimate extreme probability values. I was half hoping you'd shut me up with some nice technical way around the problem...
As for empirical testing, it makes sense as part of a larger system of evaluation. Looks like they have some papers to cover the theory. I don't know if code review would also help, but I see no reason not to allow the defense access.
Chris Mesterharm
I'm a software developer, and I'm frequently not sure beyond a reasonable doubt about software I personally have written, let alone other people's software.
I am also a software developer and I have no doubts while analysing code I wrote, any other properly commented/structured code or even horrible code, but all this assumes that I can invest enough time/effort. This has been precisely my whole point since the start (is it seriously so difficult to just understand what is clearly written?): analysing code is a less efficient alternative than testing the corresponding program under the most common conditions, and certainly so when dealing with such a complex piece of software as the one being discussed here. Hence the first title: "it makes more sense theoretically than practically".
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Actually, GP is correct if we're resorting to empirical testing.
Not even in that scenario. Even if you carried out those 211 quintillion tests, it wouldn't represent a reliable validation of the claim "1 in 211 quintillion", because just one empirical confirmation isn't statistically significant (and this is, from the point of view of that claim, what performing the whole set of 211 quintillion tests once would mean). If you want to go down such a ridiculous, unnecessarily laborious path and you want to do it properly, you would have to rely on a much better methodology along the lines of repeating the process various times (at least 5 times?) and averaging the value. So, if you perform the 211 quintillion tests 5 times and each of these times you get only 1 error, then you would certainly be in a position to conclude that the original statement was, beyond any doubt, accurate. But nobody in their right mind would ever try to do such a nonsensical thing to validate meaningless commercial nonsense.
We would want about six hundred quintillion samples to test against to verify that.
This is not what the intended verification was meant to be. And in any case, this isn't even how you would validate that claim. That quintillion reference is clearly an extrapolated estimate (= commercial language) which could be confirmed/dismissed by relying on equivalent means; that is, testing a much smaller number of samples and applying whatever "methodology" they used to come up with that number. But again, this isn't what releasing the source code or not is about; what we are discussing here is making sure that the piece of software works as expected and, eventually, accurately calculating its actual reliability according to whatever expectations the given court/governmental entity/legislation considers good enough; this isn't about confirming whatever random claim the company makes.
TL;DR: the ridiculous claim of that company is irrelevant from the software validation/source code release point of view; but, even if you decided to empirically validate such nonsense, the procedure proposed by the previous poster isn't reliable enough.
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
I was half hoping you'd shut me up with some nice technical way around the problem...
Impressive 180-degree change of attitude! Well, as I just answered to another commenter, I am personally a fan of approaches along the lines of multiple attempts + averaging the results for proper empirical validation. For example, as a way to confirm/dismiss/improve that much more realistic 1 in 200 estimate, I would go with 10 sets of tests, each running up to either 200 tests or the second error. So, if in the first set you get the second error at the 150th attempt, you stop there; if in the second set you reach 200 without a second error, you stop there, etc. You average all these results and get your conclusion. Then, you should repeat that process quite a few more times under different conditions and keep averaging the results for increasingly better accuracy. But you should also make an extra effort not to mix up different conditions (or, at least, to weight them properly; although this is usually a more complicated alternative), which might inadvertently affect the reliability in a very relevant way. The whole system could also be systematically tuned further by replacing that initial 200 limit with the newly validated conclusions you keep getting. So, basically an iterative ad infinitum procedure whose accuracy is mostly conditioned by the time/effort you want to spend on it, but which can also deliver as many (reasonably good) intermediate conclusions as you want.
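Here is a rough simulation of the "10 sets, stop at 200 tests or at the second error" idea. It is only one possible reading: the comment does not say exactly what gets averaged, so this sketch pools all errors over all tests, and the "true" error rate fed to the simulation is a made-up input. Note that stopping early on the second error can bias a pooled estimate somewhat.

```python
import random
from typing import Tuple


def run_set(true_rate: float, rng: random.Random,
            max_tests: int = 200, max_errors: int = 2) -> Tuple[int, int]:
    """Run one set; stop at `max_tests` tests or at the `max_errors`-th error.

    Returns (tests_run, errors_seen).
    """
    errors = 0
    for tests in range(1, max_tests + 1):
        if rng.random() < true_rate:      # simulated erroneous test
            errors += 1
            if errors == max_errors:
                return tests, errors
    return max_tests, errors


def estimate_rate(true_rate: float, n_sets: int = 10, seed: int = 0) -> float:
    """Pool the sets: total errors divided by total tests (an assumed estimator)."""
    rng = random.Random(seed)
    results = [run_set(true_rate, rng) for _ in range(n_sets)]
    total_tests = sum(t for t, _ in results)
    total_errors = sum(e for _, e in results)
    return total_errors / total_tests


print(estimate_rate(1 / 200))
```

Running it with different seeds gives a feel for how noisy ten sets are when the true rate is around 1 in 200.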
Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
Good idea. If we assume these are independent trials then it's much more feasible :) We can even do more than two people. An experiment could be: you have the perp's DNA and a mix of 5 other samples. Now, can you detect whether or not the perp is in the mix? Also, I'm not worried about the amount of blood. Since we are assuming the trials are independent, we can tolerate some experimental death. I'm more worried about the time. Still, it's probably doable with some robotic assistance and is much faster than colonizing the Milky Way. (In all fairness, colonizing the Milky Way has other benefits.)
Yes, it seems they have some papers, which, as you point out, are still worthless. Human error is going to completely dominate. My favorite claim is that they will allow the defense to look at their code if they are paid money at an hourly rate. These guys are some impressive assholes.
Chris Mesterharm
It's simple, if completely and totally impractical. There's a claim that a false positive will happen once in 211 quintillion times. In another Universe, we could run 211 quintillion tests, and if this were the case we'd be looking at a Poisson distribution with lambda of 1. Obviously, that's not good enough. We need many more tests. We can't potentially test enough to make sure the probability is exactly one in 211 quintillion, but 211 quintillion really means between 210.5 and 211.5 quintillion, and it's philosophically possible to run enough tests to have any desired confidence that the real probability is in that range. I'm not going to bother to compute how many.
There are no practical equivalent means. As you say, the estimate is an extrapolation from a far smaller number of what we really hope are competently run tests. It is possible to dismiss the claim given those tests, but it's not possible to confirm it. This is the real world, and the real world is messy. Suppose the method was absolutely perfect and they ran a million tests. Now, consider that there may be a one in a billion chance that there would be some sort of unnoticed contamination of the sample, or an undetectable failure of the device, that would create a false positive. That one in a billion chance would be exceedingly unlikely to turn up in the million tests (this can be treated as a Poisson distribution with lambda of 0.001), and it would mean the company is off by eleven orders of magnitude. We know nothing about differences in human physiology with a confidence of 1 minus a 211-quintillionth, so we can't reason from that.
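The Poisson numbers above can be checked with a few lines. The million tests and the one-in-a-billion hidden failure rate are the hypothetical figures from the comment; nothing here is measured data.

```python
import math

claimed = 1 / 211e18        # claimed false positive rate
tests = 1_000_000           # hypothetical size of the validation run
hidden_rate = 1e-9          # hypothetical rate of an unnoticed failure mode

# With lambda = 1 (211 quintillion tests at the claimed rate), even a
# correct claim yields zero false positives about 37% of the time.
p_zero_at_lambda_1 = math.exp(-1.0)

# Expected number of hidden failures in the validation run, and the
# Poisson probability of seeing at least one of them.
lam = tests * hidden_rate
p_seen = -math.expm1(-lam)          # 1 - exp(-lam), computed stably

# How far off the claimed rate would be if the hidden failure were real.
orders_off = math.log10(hidden_rate / claimed)

print(f"lambda = {lam:g}, P(at least one seen) = {p_seen:.6f}")
print(f"claim off by about {orders_off:.1f} orders of magnitude")
```

So the failure mode would almost certainly go unseen in testing (roughly a 0.1% chance of showing up) while still swamping the claimed rate.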
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
If there's a documented algorithm (and there darn well should be) and the code is deliberately written to clearly implement the algorithm (which it probably isn't), code analysis could be useful as a way of verifying it. Otherwise, the only thing source code analysis can say is that it's unsuited for forensics.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
The point is, when someone makes 99 consecutive "embarrassingly inaccurate" predictions, you would be a fool for believing the 100th one
Agreed, but that has not happened.
particularly if that prediction has a timeframe of 100 years and has a predictive window of one magnitude; "0.2 meters to 2.0 meters (0.66 to 6.6 feet) of sea level rise in the next 100 years".
I fail to see how 0.2 m to 2.0 m is not a reasonable range. If it is backed by models that others can evaluate and reproduce, that makes it science.
If policy makers are expected to act on every research paper that flows out of the science rags, we'd probably never get anything done. But if there are multiple confirmations on a general trend and scientists are at the point of debating the details then we can seriously consider making policy. We're at that point now, and have been beyond that point for 5-10 years depending on who you ask.
What's amusing to me is that policy makers use science frequently. Even if the research is incomplete. Take a look at the court system's use of DNA and fingerprinting. We continue to find problems with how this is done, but the law has ruled that it is fact even if the science now says it is not quite so black and white.
I think policy makers disregard some science and embrace other science out of political convenience. I would recommend you apply a heavy dose of scepticism to anything that right-leaning or left-leaning politically charged talk show hosts have to say on the subject of science.
“Common sense is not so common.” — Voltaire
If my choices have no consequences, why bother?
Depends what you mean by consequence. In the current discussion if we're talking about societal consequence rather than basic physics, then I think the answer is obvious -- there are lots of things you may wish to do privately that nobody ever need know about, thus avoiding the issue of societal consequence entirely.
If my choices can have consequences I like, then they can have consequences I don't like, if only by comparison.
They can. However you seem to be treating it as a binary choice. In reality I may freely accept some negative consequences, but not any negative consequences that can be imagined.
To me it seems pretty obvious that the degree to which you can claim a societal freedom is directly related to the degree to which you can avoid societal consequences if desired. If I can say most things without negative consequence, but not some things, then I mostly have freedom of speech. If I can say anything I like without negative consequence, then I have absolute freedom of speech. If I can only say certain things in certain situations, I have little freedom of speech. If I have the ability to easily make anonymous speech then I have more freedom of speech than if the tools of anonymous speech are prohibited.
I mean isn't that obvious? What useful definition of "societal freedom of speech" do you have that contradicts that?