Slashdot Mirror


DNA-Based Steganography Wins Intel Education Award

to'c wrote: "17-year-old Viviana Risca wins US$100,000 from Intel for her work in 'DNA-based Steganography.' Talk about combining hot technologies! With a bit of gene-splicing, that next pigeon you clone wouldn't need to carry a message. It would be the message! Full story here." Interesting test message she chose, too.

8 of 246 comments

  1. give those kids a break!! by Anonymous Coward · · Score: 4

    Honestly,

    here we have (some of) the most outstanding and promising kids in high schools in the US (or did I misunderstand the meaning of this award?). They will probably even be among the best in their year at Harvard/MIT or wherever. And they did some excellent and truly impressive work.
    They deserve credit and appreciation, not bitching about this or that detail of their work or whining "If I had had these toys to play with, I would have done what he/she did". Pure envy... If you're capable of doing cool stuff, nobody at your local university will leave you standing outside.
    Seeing stuff like this makes me profoundly happy, and say what you want about Intel, but this is a service to society (and their PR dept. :) ).
    The winners and most probably a lot of non-winners have shown how cool doing research (or hacking in the true sense of the word, which is essentially the same) is, and they should get all the support to pursue whatever they're capable of doing.

    Nevertheless: critical and rational analysis of their work is a much appreciated way to show respect (in my experience).

    So sit back, relax and
    #define BITCHMODE 0
    for once.

    Roland

  2. Re:Cool Lab Work - but Bad Crypto! by billstewart · · Score: 4
    The "only a few billion" comment is good, and some of the other poster's comments about it being much easier to find the sequence you're looking for by searching for the start token rather than having to sequence the whole mess are bang on as well.


    The solution for securing steganography is straightforward - it's to say "it's not crypto, it's just stego, but that can still be pretty effective" rather than saying "there's a trillion trillion possible sequences in this billion starting points, so nobody'd ever find it". So rather than hiding a plaintext message, which somebody might find, you encrypt your message with a real crypto algorithm, producing something that looks like random noise, and then if the underlying substrate you're hiding it in (whether it's pictures, sounds, or DNA) looks enough like random bits, you're done; otherwise you make a model of the substrate and transform your cyphertext into that space. (Peter Wayner's paper on Mimic Functions has a really good discussion of this.) For an application like this, just getting the right ratio of nucleotides may be enough, or one or two levels of Markov chain beyond it. (Plus make sure the DNA isn't from a really popular mouse clone or whatever that somebody might have already sequenced :-)
    Then it does become much harder to find the cyphertext, which makes cracking it much much harder.


    Dirtside said:
    Because the pair of primers provides a trillion trillion options, she concludes that the code is essentially unbreakable.
    Only in the same way that public-key encryption is unbreakable, in that you can't brute-force it in any reasonable amount of time.

    and randombit said something similar.

    No, it's much different than that. Public-key encryption is exponentially hard, while this is just linear in the length of the chains. Computer-Crunching through a billion starting points looking for English-like sequences is a few minutes' work, though the chemical work in sequencing the whole mess is much slower. By contrast, it's easy to make a factoring job taking longer than the current age of the universe, just by making the keys a few hundred bits longer.
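
    A back-of-the-envelope sketch of that gap, in Python. The rates used here (ten million candidate windows checked per second, a trillion key guesses per second) are assumptions picked for illustration, and plain keyspace enumeration stands in for factoring, which is a different (sub-exponential) problem, so read the numbers as orders of magnitude only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate figure


def linear_scan_seconds(starting_points: float, checks_per_second: float) -> float:
    """Time to test every starting point once: grows linearly with the data."""
    return starting_points / checks_per_second


def keyspace_years(key_bits: int, guesses_per_second: float) -> float:
    """Time to enumerate every key of a given length: doubles with each added bit."""
    return (2 ** key_bits) / guesses_per_second / SECONDS_PER_YEAR


scan = linear_scan_seconds(1e9, 1e7)   # assumed rate: 1e7 windows per second
brute = keyspace_years(128, 1e12)      # assumed rate: 1e12 guesses per second

print(f"scan a billion starting points: {scan:.0f} s")
print(f"enumerate a 128-bit keyspace:   {brute:.2e} years")
```

    Doubling the amount of DNA doubles the scan time; adding one bit to a key doubles the enumeration time, which is why the two cases are not comparable.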

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  3. Hidden in Plain View; Public Key Biosteganography by Effugas · · Score: 5

    Surprisingly common form of steganography, really. There is *absolutely* no obfuscation actually hidden in the data itself--it's literally plaintext encoded as simple entries in the DNA sequence. The security comes from the fact that it's surrounded by a significant amount of non-secret information that is difficult to search (without knowledge of the correct primers).

    Essentially, you're talking about a symmetric "location" secret protecting unencrypted content within a significant amount of data.

    Such techniques are actually used quite commonly as countermeasures against legally mandated discovery proceedings--a large corporation (Microsoft or tobacco companies in particular) is sued for its memo records; tens of thousands of boxes of unrelated material are delivered to the suing party on the presumption that they will hide the one "smoking gun" memo that will seriously damage the corporation.

    In the inevitable arms race that follows, the entire mass of data gets OCR'd and searched for critical keywords. That solves the legal issues, but without an efficient "OCR" method that can quickly sequence a chromosome into its underlying data, this student's steganographic method is extraordinarily effective.

    However, should such a technology be created, the size of the "keyspace" becomes drastically shortened: Apparently, the entire human genome will fit into six hundred megabytes--this is quite a bit of data, but it's not "trillions and trillions" of possibilities. A simple statistical analysis tool will reveal *any* non-natural data, as nCipher revealed when they showed that a cryptographic private key will stick out even within 2GB of fluff data--it's *TOO* random.
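
    A minimal sketch of that kind of statistical scan, in Python: slide a window over the data and flag windows whose byte entropy is suspiciously high. The window size, step, threshold, and the made-up "fluff" text are all assumptions for illustration; this is not the tool nCipher used, and real DNA would need a model of its own base statistics rather than byte entropy.

```python
import math
import random
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution in chunk, in bits per byte."""
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_windows(data: bytes, window: int = 256, step: int = 64,
                         threshold: float = 6.5) -> list:
    """Return the start offsets of windows whose entropy meets the threshold."""
    hits = []
    for start in range(0, len(data) - window + 1, step):
        if shannon_entropy(data[start:start + window]) >= threshold:
            hits.append(start)
    return hits


# Hypothetical test data: repetitive text with 512 random bytes buried in it.
rng = random.Random(0)
fluff = b"quarterly memo: nothing to see here. " * 400
key = bytes(rng.getrandbits(8) for _ in range(512))
blob = fluff[:8000] + key + fluff[:8000]

hits = high_entropy_windows(blob)
print("suspicious offsets:", hits)
```

    Every flagged offset overlaps the buried random bytes; the ordinary text never comes close to the threshold.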

    What'd really blow me away is if Viviana were able to follow up this fascinating research with an implementation of Public Key Steganography. There was a paper referenced on Counterpane that talked about this; essentially it hides data in such a manner that the ensteganographer (and thus, anyone other than the recipient of the hidden message) cannot determine the exact location of their own message. The way I'd imagine it working, you'd mutate a virus such that it delivered a given message to a location dependent not upon the data being delivered but upon some publicly available key. That key would essentially be a one-way hash of bioreceptors that the virus should attach itself to, and you'd essentially have a restriction that the virus would not infect any cell that did not possess those specific bioreceptors. An attacker would need to sequence not only the global DNA sequence for changes but each possible type of cell that could have been modified to contain the secret, whereas the message reader would know exactly what types of cells to search--voila, your asymmetric primitive. Maybe you'd only find a link to the appropriate primer, or possibly your entire message, but you'd have your public key steganography implemented with biological methods.

    Funky.

    Yours Truly,

    Dan Kaminsky
    DoxPara Research
    http://www.doxpara.com

  4. Re:What about her education by mreece · · Score: 5

    Just to expand on the previous comment...

    I did not do my work for the Intel STS (I was the 6th place winner) in a laboratory, but I have worked in a university lab in the past. There are many high school students who do research in well-equipped labs. It isn't an unfair advantage - these opportunities are available to most people who have enough initiative and intelligence to pursue them.

    This year there were, if I remember correctly, 4 Intel finalists who participated in RSI (Feng Zhang, Viviana, Sasha Schwartz, and Elizabeth Williams). A few other finalists were in other summer research programs.

    Also, I recall Viviana saying that she ordered the DNA from a lab somewhere that will manufacture DNA with a given base-pair sequence... Apparently it isn't too expensive.

    I hope this clarifies some things...

    --
    Matt Reece
  5. Intel STS by mreece · · Score: 5

    I was one of the participants in this competition - I finished in 6th place with a project on adaptive wavelet methods for fluid dynamics problems. (I'm Matt Reece from Louisville, Kentucky).

    First of all, I would like to say that if anyone reading this is a high school student considering entering this competition, do it. It is very much worth the time you spend on your research if you can become a finalist. All 40 finalists get $5000, a laptop (650 MHz Pentium III), and a trip to D.C. where Intel pays for everything - very nice expensive dinners, meetings with Nobel laureates... it's an incredible program. The best part was definitely meeting the other finalists, though. They were all wonderful people and I have had a great week... Don't think these people are just science nerds (not that that's a bad thing, mind you). They're very well-rounded. Many speak foreign languages, play musical instruments, sports, etc.

    Also, the week is not all science - Intel provided a web center in the hotel with lots of nice computers equipped with Quake 3, so we could have big multiplayer deathmatches over the LAN. I also played cards more in the past week than I have in months, and generally just spent a lot of time hanging out with the other finalists.

    Anyway, to get on to some of the comments the rest of you have made about Viviana's project. First, I will say that I'm not as familiar with her work as I am with some of the other projects.

    She does attend a U.S. school - I think it's a public one but I'll have to look that up later. Personally, I attend a public magnet school (duPont Manual High School) and I know many of the other finalists do attend public schools.

    It would probably be best if Viviana responded to your comments about DNA steganography, as I'm not an expert in the area. Still, the project did seem to be very well done and she did an excellent job of presenting it to the public.

    As far as your comment about open source programmers... If an open source project involved a new algorithm or some other method that could be applied to science, then it would certainly stand a chance in the Intel competition. My wavelet code is open source, although at this point I haven't implemented enough features to make it very useful.

    Also, you might be interested to know that the judging is not solely based on the research. The first stages are based on a research paper - out of about 1500 applicants, 300 semifinalists were chosen and then from those 300, forty were chosen as finalists.

    The finalist judging is based on three 15-minute interviews in which judges ask questions related to science in general. Some questions are straightforward tests of scientific knowledge, others are more open-ended questions meant to see how well you can think. Some of the questions are things that no one knows...

    These judging interviews took place on Thursday and Friday (the 9th and 10th). The next two days, March 11th and 12th, involved the public presentations, where we set up display boards at the National Academy of Sciences and talked about our research with judges, scientists, and anyone else who showed up. The judges talked to students on Saturday, and from what I understand had made all their decisions just before the dinner at Mr. K's (great Chinese restaurant) Saturday night. The winners were announced Monday evening.

    So anyway, judging is based initially on the research, but the final awards are also based on general scientific knowledge and also ability to communicate that knowledge to others. The emphasis on communication is also evident in the Seaborg award, given to the student who best displays an excitement about science and a willingness to share that excitement - that award went to Eugene Simuni, who finished 5th. His work was all the more amazing because he's only lived in the U.S. for two years (he came here from Russia) and yet he's better at communicating science to the general public (in English, a language that he more or less taught himself) than most or maybe all of the rest of us who have been speaking English our whole lives.

    Well, there is much more I could say, but I just wanted to give you a better idea of what this competition is all about. It's a great program, and I would recommend it to anyone. If you have any questions about the Intel STS, feel free to ask me.

    --
    Matt Reece
  6. Cool Lab Work - but Bad Crypto! by billstewart · · Score: 5
    I don't know how much of this is the reporting, either by the judges or the press, vs. how much is the winner's understanding of the technology involved (it sounds like it's her mistake, and the judges didn't understand it.) The idea of stashing messages in DNA is cool, and doing the actual work to build it is definitely cool stuff for a high-school student. But the crypto isn't correct.


    Steganography is the art of hiding messages in things, where they aren't likely to be noticed, either because nobody'd think to look there, or because there's too much other junk for your message to stand out, or because you've done the work to make your message look similar to the background noise. The classic example is hiding a message in the low-order bits of a digitized photo image or a sound file, where they don't affect the output much, though they're usually visible if anybody looks.
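
    The low-order-bit trick can be sketched in a few lines of Python. This is a toy, assuming the cover is a flat run of 8-bit samples (pixel or audio values) and that the receiver already knows the message length; the function names are made up for this example.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytes:
    """Overwrite the lowest bit of each cover sample with one message bit."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, message_length: int) -> bytes:
    """Read message_length bytes back out of the lowest bits."""
    message = bytearray()
    for byte_index in range(message_length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[byte_index * 8 + bit_index] & 1)
        message.append(value)
    return bytes(message)


cover = bytes(range(256))            # stand-in for image samples
stego = embed_lsb(cover, b"hello")
print(extract_lsb(stego, 5))
```

    No sample changes by more than 1, so the picture looks the same, but anyone who reads out the low bits sees the message directly.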

    Stashing a secret message in a bunch of DNA has a good chance of "they wouldn't look there", but if they *did* decide to look in the bunch of DNA, a message like "JUNE6_INVASION: NORMANDY" probably has different enough statistics from the rest of the DNA around it that it might stand out. Sure, it's much more obvious to the intended recipient, who's looking for the specific start and end "primer" sequences, and it's also much more obvious to someone who knows the alphabet of nucleotides she's using to represent letters (as opposed to having to guess from entropy, where there'd be too many false positives.) But the conclusion "Because the pair of primers provides a trillion trillion options, she concludes that the code is essentially unbreakable" is insupportable - if you encode your message in a way that has similar statistics to the background signals/noise, you can hide it pretty well, but she's implying that straight plaintext is also unfindable there, and it's not, any more than hiding it in the low order bits of a picture is.
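
    To see why plaintext statistics leak, here is a Python sketch using a made-up alphabet: two bits per base (A=00, C=01, G=10, T=11), four bases per ASCII character. This mapping is an assumption for illustration only and is not the encoding used in the winning project.

```python
from collections import Counter

BASES = "ACGT"  # hypothetical mapping: A=00, C=01, G=10, T=11


def text_to_dna(text: str) -> str:
    """Encode ASCII text as four bases per character, high bits first."""
    dna = []
    for byte in text.encode("ascii"):
        for shift in (6, 4, 2, 0):
            dna.append(BASES[(byte >> shift) & 0b11])
    return "".join(dna)


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna."""
    if len(dna) % 4:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(dna), 4):
        value = 0
        for base in dna[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return out.decode("ascii")


encoded = text_to_dna("JUNE6_INVASION: NORMANDY")
print(encoded)
print(Counter(encoded))          # base composition of the hidden message
print(dna_to_text(encoded))
```

    Under this mapping every uppercase letter starts with a C, because its top two bits are always 01, so the base counts and the period-4 pattern look nothing like the surrounding DNA.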


    Nice work anyway, and it lets people make lots of entertaining comments about "Computer Viruses" :-)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  7. Pretty cool, but check out the second place winner by Darby · · Score: 5

    He came up with an extension to a mathematical theorem by Ramanujan. This is pretty impressive since most people have no idea what that guy was talking about ;-)

    If you don't know, Ramanujan was an Indian mathematician from a Brahmin family.
    He was poor, sickly and almost totally uneducated.
    He recreated a large portion of modern mathematics independently. He wrote to a British mathematician, whose name escapes me, and included a lot of his work. At first glance it looked like a list of previously proven theorems, so the mathematician disregarded it and threw it away. Then he started thinking about it and realized there were many novel approaches and new ideas, so he brought Ramanujan over to England and set him up at the university. Ramanujan died a few years later, due basically to poor health attributable to a really shitty life, but his work blew open doors into mathematical realms we are still trying to probe.

    So, in long ;-) , this kid might be a candidate for a Fields medal in the future.
    ---CONFLICT!!---

  8. MODERATORS! LEARN SOME BIOLOGY BEFORE MODERATING by yuriwho · · Score: 5

    JunkDNA's post nails the issue. There are too many high-rated posts criticising the cryptography used here. Cryptography has nothing to do with it. The message is easy to read but it is hidden in a large volume of DNA sequence. The human genome project (a worldwide effort) has been working for years to sequence the entire genome...still unfinished. She proposes to bury the message in the genome of an organism. To try and use your sophisticated cryptography-breaking algorithms to "break the code" you first would have to sequence all the DNA present in your suspect message DNA. Given that coded DNA could be stored anywhere on a spy (in a stain on a dress for example) you would have to be able to sequence the human genome thousands of times over (once for every stain/suspect location) to have the data to apply encryption-cracking algorithms to. With the wonderful invention of PCR (polymerase chain reaction), the code (two primers of defined sequence, ~ 18 base pairs in length) and the location of the stain are all that is needed to read the message. This idea is brilliant. It's not based on crypto but on the unreadability of the data, yet it provides a method for the intended receiver to find the message with very little info. The beauty is that the decoding message is very small, simple and easily crypto'd into a conversation.
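
    A software analogy for the primer lookup, sketched in Python. It treats the DNA as one string and the two primers as literal flanking markers on the same strand; real PCR works on double-stranded DNA, with the second primer binding the reverse complement, so this is a simplification. The payload, strand lengths, and seed are arbitrary choices for the example.

```python
import random


def random_dna(length: int, rng: random.Random) -> str:
    """Generate filler sequence standing in for genomic background."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def extract_between_primers(strand: str, forward: str, reverse: str):
    """Return the sequence flanked by the two primers, or None if not found."""
    start = strand.find(forward)
    if start == -1:
        return None
    payload_start = start + len(forward)
    end = strand.find(reverse, payload_start)
    if end == -1:
        return None
    return strand[payload_start:end]


rng = random.Random(42)
forward = random_dna(18, rng)        # ~18-base primers, as in the comment
reverse = random_dna(18, rng)
payload = "GATTACAGATTACA"           # arbitrary stand-in for the coded message
strand = (random_dna(50_000, rng) + forward + payload + reverse
          + random_dna(50_000, rng))

print(extract_between_primers(strand, forward, reverse))
print("possible primer pairs:", 4 ** (2 * 18))
```

    With the primers, the receiver jumps straight to the payload. Without them, the number of possible primer pairs is enormous, but an attacker who can read the whole strand does not need to guess primers at all, which is the point the crypto critics are making.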

    This idea is so simple and elegant that I'm sure the intelligence agencies around the world will use it now, if they are not already.

    --
    no sig.