The Emerging Science of DNA Cryptography
KentuckyFC writes "Since the mid-90s, researchers have been using DNA to carry out massively parallel calculations which threaten encryption schemes such as DES. Now one researcher says that if DNA can be used to attack encryption schemes, it can also be used to protect data. His idea is to exploit the way information is processed inside a cell to encrypt it. The information that DNA holds is processed in two stages in a cell. In the first stage, called transcription, a DNA segment that constitutes a gene is converted into messenger RNA (mRNA) which floats out of the nucleus and into the body of the cell. Crucially, this happens only after the noncoding parts of the gene have been removed and the remaining sequences spliced back together." (More below.)
KentuckyFC continues: "In the second stage, called translation, molecular computers called ribosomes read the information that mRNA carries and use it to assemble amino acids into proteins. The key point is that this is a one-way process. Information can be transferred from the DNA to the protein but not back again, because during the process various details are lost, such as the places where the noncoding sequences have been removed. The new idea behind DNA cryptography is to exploit this to encrypt a message. The message is encoded in the sequence of bases in the DNA (A for 00, C for 01, G for 10, T for 11, for example) and then processed. The resulting protein is then made public. The key, which is kept private, is the information necessary to reassemble the DNA from the protein, such as the position of the noncoding regions (abstract)."
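For anyone who wants to see the encoding step concretely, here is a minimal Python sketch of the two-bits-per-base mapping the summary gives as an example (A for 00, C for 01, G for 10, T for 11). The function names and the choice to read each byte most-significant bit pair first are my assumptions, not something the summary specifies.

```python
# Mapping from the summary: A=00, C=01, G=10, T=11.
# The index of each letter in this string is its 2-bit value.
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first (assumed order)."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )

def dna_to_bytes(dna: str) -> bytes:
    """Invert bytes_to_dna; expects a length that is a multiple of four."""
    if len(dna) % 4:
        raise ValueError("DNA length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)
```

Note that this step alone is a fixed, public substitution, so it adds no secrecy by itself.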
Every link related to this is apparently owned by this group/person arxiv. The details are far too sparse to make much sense of, but as far as I can tell, the approach is:
I have to assume some additional manipulation of the transcribed message so you aren't just giving Eve large segments of your message for free, but even then, it seems like a hell of a lot of work to disguise yet another scheme to protect data via the magic transmission of additional secret data.
Anyone see where I misread this? Even if we assume that the "DNA" is the key and not the message, I'm still not seeing how you avoid the "magic" step.
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
The suggestion to "encrypt" things in proteins, on the grounds that they're a one-way code, is absurd. We've been able to sequence proteins since the 1950s by Edman degradation, and from a protein sequence you can relatively easily back out the possible DNA sequences. Enumerating the possible mRNAs leading to a given protein sequence is a trivial task for any Perl programmer with three minutes to spare. Either the people who came up with this scheme know nothing about cryptography, or nothing about biology. As for the "massively parallel" computing DNA allows: true, it does, but since you're dealing with physical systems, it quickly becomes impractical. If you have to synthesize and mix bathtub-sized quantities of DNA in order to perform even modest calculations (that you can likely do faster and more easily on a desktop computer anyway), this method becomes expensive and cumbersome long before you reach the point where you can actually crack keys that are interesting.
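To illustrate the back-translation point, here is a rough Python sketch (rather than Perl) that enumerates the mRNAs that could encode a given protein under the standard genetic code. The function names are mine, and this only covers the codon lookup; it says nothing about where introns were removed. The number of candidates grows multiplicatively with protein length, so full enumeration is only sensible for short sequences, while counting is cheap at any length.

```python
from itertools import product
from math import prod

# Standard genetic code, codons ordered U, C, A, G at each position.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Reverse lookup: amino acid letter -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
):
    CODONS_FOR.setdefault(aa, []).append(codon)

def count_mrnas(protein: str) -> int:
    """Number of distinct mRNA sequences that translate to this protein."""
    return prod(len(CODONS_FOR[aa]) for aa in protein)

def enumerate_mrnas(protein: str):
    """Lazily yield every mRNA sequence that translates to this protein."""
    for combo in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(combo)
```

For example, `count_mrnas("MW")` is 1 because methionine and tryptophan each have a single codon, whereas every leucine, serine, or arginine in the sequence multiplies the count by six.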
You would not need to solve the protein folding problem in order to crack this form of cryptography. It is not as though data is encoded in protein conformation using this technique. In fact, this technique would be unlikely to generate well-formed proteins at all. According to the paper, the method does not actually use real nucleic acids or proteins, or even very accurately simulate their properties in biological transcription or translation. The paper is even titled "A Pseudo DNA Cryptography Method." The author is using transcription and translation as a model for the general data flow present in this scheme, but the author points out that strictly hewing to the biological splicing scheme would introduce extra vulnerabilities, since it would be possible to identify from the final protein sequence places where splicing occurred.
On the subject of vulnerabilities, this method, as admitted in the paper, is a symmetric substitution cipher. You still need a secure channel to perform key exchange (the key here contains the locations and lengths of the spliced-out introns). If an eavesdropper gets hold of the protein (the ciphertext), a simple lookup of codons gets the eavesdropper back to the post-spliced RNA. The unique challenge of this cipher is to determine where splicing occurred in order to get back to the pre-spliced RNA (which is a simple complement of the DNA sequence, which in turn is an easy substitution cipher away from the plaintext). While it is a clever way to implement the idea, the intron splicing in this method is really no different from the mechanisms used to confuse plaintexts in block ciphers like DES, and it is subject to the same vulnerabilities, such as differential attacks.
"FDA staff reviewers expressed concern about the number of patients who were left out of the study because they died."
Choice example quote:
If the key size is already proportional to the ciphertext size, then why not simply use a one-time pad (OTP)? That already gives provable information-theoretic security. Then you don't need any extra privacy provided by the "DNA Encryption". All you need to do is transmit the ciphertext. The proposed scheme is at best a steganographic technique. Calling it encryption is downright false.
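For comparison, a one-time pad is only a few lines. This is a minimal Python sketch with function names of my own choosing; it assumes the pad is truly random, as long as the message, kept secret, and never reused, which is exactly the key-distribution burden being discussed here.

```python
import secrets

def otp_encrypt(plaintext: bytes):
    """Return (ciphertext, pad); the pad is as long as the message."""
    pad = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, pad))
    return ciphertext, pad

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad to recover the plaintext."""
    if len(ciphertext) != len(pad):
        raise ValueError("pad must be exactly as long as the ciphertext")
    return bytes(c ^ k for c, k in zip(ciphertext, pad))
```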
The author basically proposes the following code. Write down your message as a bit string. Translate the bit string from binary to base 4 (i.e. interpret it as DNA). Remove random chunks at random positions from it (i.e. the introns), and express the remaining DNA as a protein. The encryption "key" is the introns together with their positions.
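The splicing step as described here can be sketched in Python as follows. This is my own toy reading of the description, not the paper's actual algorithm: the function names, the sequential way chunks are cut, and the key layout (a list of position and intron pairs) are all assumptions, and the final translation to a protein is left out.

```python
import random

def splice(dna: str, n_introns: int, max_len: int, rng: random.Random):
    """Cut up to n_introns random chunks out of dna, one after another.

    Returns (spliced_dna, key). Each key entry is (start, intron), where
    start is the position in the string as it was just before that cut.
    """
    key = []
    for _ in range(n_introns):
        if len(dna) < 2:
            break  # always leave at least one base behind
        length = rng.randint(1, min(max_len, len(dna) - 1))
        start = rng.randint(0, len(dna) - length)
        key.append((start, dna[start:start + length]))
        dna = dna[:start] + dna[start + length:]
    return dna, key

def unsplice(spliced: str, key) -> str:
    """Reinsert the introns in reverse order of removal to rebuild the DNA."""
    for start, intron in reversed(key):
        spliced = spliced[:start] + intron + spliced[start:]
    return spliced
```

Something like `splice("CAGACGGCCAAC" * 3, 4, 5, random.Random(0))` followed by `unsplice` on the result gives back the original string. Notice that the key literally contains every removed base, so the more of the message you hide in introns, the more of the message travels inside the key.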
Sounds pretty much like BS to me.