The Emerging Science of DNA Cryptography
KentuckyFC writes "Since the mid-90s, researchers have been using DNA to carry out massively parallel calculations which threaten encryption schemes such as DES. Now one researcher says that if DNA can be used to attack encryption schemes, it can also protect data. His idea is to exploit the way information is processed inside a cell to encrypt it. The information that DNA holds is processed in two stages in a cell. In the first stage, called transcription, a DNA segment that constitutes a gene is converted into messenger RNA (mRNA) which floats out of the nucleus and into the body of the cell. Crucially, this happens only after the noncoding parts of the gene have been removed and the remaining sequences spliced back together." (More below.)
KentuckyFC continues: "In the second stage, called translation, molecular computers called ribosomes read the information that mRNA carries and use it to assemble amino acids into proteins. The key point is that this is a one way process. Information can be transferred from the DNA to the protein but not back again because during the process various details are lost, such as the places where the noncoding sequences have been removed. The new idea behind DNA cryptography is to exploit this to encrypt a message. The message is encoded in the sequence of bases in the DNA (A for 00, C for 01, G for 10, T for 11, for example) and then processed. The resulting protein is then made public. The key, which is kept private, is the information necessary to reassemble the DNA from the protein, such as the position of the noncoding regions (abstract)."
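To make the encoding step concrete, here is a minimal Python sketch of the 2-bit mapping quoted in the summary (A for 00, C for 01, G for 10, T for 11). The function names and the choice of UTF-8 bytes as input are assumptions for illustration, not something taken from the paper.

```python
# Hypothetical helper names; only the 2-bit mapping comes from the summary.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def text_to_dna(message: str) -> str:
    """Encode UTF-8 bytes as bases, two bits per base (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna: bases back to bits, bits back to bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    encoded = text_to_dna("Hi")
    print(encoded)               # CAGACGGC
    print(dna_to_text(encoded))  # Hi
```

Note that this step alone is a fixed, public substitution and hides nothing; any secrecy has to come from the later processing and the private key.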
Couple notes for people who haven't read the paper:
1. Their scheme is not in-vivo (they're not actually working with DNA and proteins). It's a computational process that is based on the information transformations that occur inside a cell.
2. It's kind of cute and nifty, but not particularly applicable. They discuss the scheme's weaknesses against attack, but in a pretty handwavey way. The core problem is that their "encrypted text" will include their entire plain text, just split up into pieces. Secondly, it doesn't seem to offer anything particularly new when compared to traditional block ciphers.
3. Mathematically, this has nothing to do with biology. It's just loosely based on biological processes, and it's not really clear that these biological processes have anything particular to contribute to the development of encryption. Transcription is just a mapping (from genomic DNA to mRNA), and translation is just a lossy mapping (from 3-tuples of mRNA to peptides). Mathematicians and cryptographers have been aware of generalized versions of these functions for a long time (homomorphisms and reverse homomorphisms). There's not much new being introduced here.
-Laxitive
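The "lossy mapping" point in note 3 can be sketched in a few lines of Python. The table below is deliberately partial: it holds only a handful of entries from the standard genetic code, and the function names are made up for illustration. It shows translation in the biological sense, not necessarily the mapping the paper uses.

```python
# Partial codon table: a few well-known entries from the standard genetic code.
CODON_TO_AMINO = {
    "AUG": "M",  # methionine (single codon)
    "UGG": "W",  # tryptophan (single codon)
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L", "UUA": "L", "UUG": "L",  # leucine
}


def translate(mrna: str) -> str:
    """Map each complete 3-base codon to its one-letter amino acid code."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_AMINO[codon] for codon in codons)


def preimage_count(peptide: str) -> int:
    """Count how many mRNA strings (within this table) translate to the peptide."""
    count = 1
    for amino in peptide:
        count *= sum(1 for a in CODON_TO_AMINO.values() if a == amino)
    return count


if __name__ == "__main__":
    print(translate("AUGGGUCUU"))  # MGL
    print(translate("AUGGGCUUA"))  # MGL (different mRNA, same peptide)
    print(preimage_count("MGL"))   # 24 = 1 * 4 * 6
```

Two different inputs give the same output, so the peptide alone does not determine the mRNA. That is all "lossy" means here: a many-to-one function, nothing specific to biology.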
You would not need to solve the protein folding problem in order to crack this form of cryptography. It is not as though data is encoded in protein conformation using this technique. In fact, this technique would be unlikely to generate well-formed proteins at all. According to the paper, the method does not actually use real nucleic acids or proteins, or even very accurately simulate their properties in biological transcription or translation. The paper is even titled "A Pseudo DNA Cryptography Method." The author is using transcription and translation as a model for the general data flow present in this scheme, but the author points out that strictly hewing to the biological splicing scheme would introduce extra vulnerabilities, since it would be possible to identify from the final protein sequence places where splicing occurred.
On the subject of vulnerabilities, this method, as admitted in the paper, is a symmetric substitution cipher. You still need a secure channel to perform key exchange (the key here contains the locations and lengths of spliced-out introns). If an eavesdropper gets hold of the protein (ciphertext), a simple lookup of codons gets the eavesdropper back to the post-spliced RNA. The unique challenge of this cipher is to determine where splicing occurred in order to get back to the pre-spliced RNA (which is a simple complement of the DNA sequence, which in turn is an easy substitution cipher away from the plaintext). While it is a clever way to implement it, the intron splicing in this method is really no different from the mechanisms used to confuse plaintexts in block ciphers like DES, and it is subject to the same vulnerabilities, such as differential attacks.
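For anyone who wants to see what a key made of intron locations amounts to, here is a rough Python sketch of splicing and its inverse. It assumes the key stores each removed segment together with its start position (which implies its length); the paper's actual key format may differ, and `splice` and `unsplice` are hypothetical names.

```python
from typing import List, Tuple

# Assumed key format: (start index in the original sequence, removed segment).
Key = List[Tuple[int, str]]


def splice(sequence: str, introns: List[Tuple[int, int]]) -> Tuple[str, Key]:
    """Remove (start, length) regions; return the spliced sequence and the key."""
    kept, key, cursor = [], [], 0
    for start, length in sorted(introns):
        if start < cursor or start + length > len(sequence):
            raise ValueError("intron regions must be in range and non-overlapping")
        kept.append(sequence[cursor:start])
        key.append((start, sequence[start:start + length]))
        cursor = start + length
    kept.append(sequence[cursor:])
    return "".join(kept), key


def unsplice(spliced: str, key: Key) -> str:
    """Reinsert removed segments at their original positions, in ascending order."""
    result = spliced
    for start, segment in key:
        # Earlier reinsertions restore the original indexing for later ones.
        result = result[:start] + segment + result[start:]
    return result


if __name__ == "__main__":
    spliced, key = splice("CAGACGGC", [(2, 2), (5, 1)])
    print(spliced)                 # CACGC
    print(key)                     # [(2, 'GA'), (5, 'G')]
    print(unsplice(spliced, key))  # CAGACGGC
```

Without the key, an attacker who has recovered the spliced sequence still has to guess where the gaps were and what filled them, which is the only part of the scheme doing real work.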