DNA-Based Steganography Wins Intel Education Award
to'c wrote: "17-year-old Viviana Risca wins US$100,000 from Intel for her work in 'DNA-based Steganography.' Talk about combining hot technologies! With a bit of gene-splicing, that next pigeon you clone wouldn't need to carry a message. It would be the message! Full story here." Interesting test message she chose, too.
Definite proof that the medium is the message!!!!
--
Just to correct a few details - Ramanujan was not an "untouchable". He was a Brahmin - that is, from the highest caste. He was also in reasonably good health while in India. But he was poor and he was uneducated. He became ill after going to Cambridge due to his strict vegetarian diet and the cold weather.
The British mathematician was G.H. Hardy. For more about Ramanujan and a non-technical description of his work on partitions check out Robert Kanigel's book The Man Who Knew Infinity. A more technical introduction is Hardy's Twelve Lectures on Ramanujan or The Collected Works of Ramanujan.
My hat off to you and the other winners, just reading the summarised list of achievements with your project information on that article indicates not only that you are very gifted, but that you have the determination to utilise those gifts, a rarer thing than it might appear. I look forward to seeing what you all do in the future, and I hope it still manages to balance well enough that you all have fun too :)
You can't win a fight.
To respond to both this message and the sibling message. It doesn't matter if the statistics are normal. In a human, there are a few billion base-pairs in DNA. If the secret is encoded at some unknown position, it might be hard to extract without the primers, but there are ONLY a few billion positions it could be in. So this looks like cryptography with a 32-bit key.
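A rough sketch of that estimate, assuming a round figure of 3 billion base pairs (the post only says "a few billion"):

```python
import math

# Assumed round figure for the human genome; not a number from the article.
GENOME_BASE_PAIRS = 3_000_000_000

# If the only secret is where the message starts, the "key" is the start position,
# so the key size in bits is log2 of the number of possible positions.
key_bits = math.log2(GENOME_BASE_PAIRS)
print(f"about {key_bits:.1f} bits of key")
```

That lands a little under 32 bits, which is where the "32-bit key" comparison comes from.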
This is much like the 'secret' cypher where you encode each word of some plaintext message as a list of page, line, and word numbers in some arbitrary book. 12-3-5 (page 12, line 3, word 5). The book itself acts like the key. Unfortunately, this isn't secure as there aren't so many books out there. I can just try each one till I find one that gives a reasonable message, say a 20-bit key.
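For anyone who hasn't seen one, here is a minimal sketch of such a book cipher. The two-page "book" and the function names are made up for illustration:

```python
def build_index(book_pages):
    """Map each word to the first (page, line, word) position where it occurs (1-based)."""
    index = {}
    for p, page in enumerate(book_pages, start=1):
        for l, line in enumerate(page, start=1):
            for w, word in enumerate(line.split(), start=1):
                index.setdefault(word.lower(), (p, l, w))
    return index

def encode(message, book_pages):
    """Replace each word with its position in the book; raises KeyError if a word is absent."""
    index = build_index(book_pages)
    return [index[word.lower()] for word in message.split()]

def decode(triples, book_pages):
    """Look each (page, line, word) triple back up in the book."""
    return " ".join(book_pages[p - 1][l - 1].split()[w - 1] for p, l, w in triples)

# Hypothetical book: a list of pages, each page a list of lines.
BOOK = [
    ["the quick brown fox", "jumps over the lazy dog"],
    ["we attack at dawn", "bring the fox home"],
]

code = encode("attack the dog at dawn", BOOK)
print(code)
print(decode(code, BOOK))
```

The attack described above is just calling decode with every candidate book and keeping the outputs that read like language.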
On the other hand, this is a news report; the story might have just 'skipped over' this issue, and Viviana may have thought it over and have a solution. Or maybe not. Don't forget that good steganography is damned hard. I ask you, how would you try to 'hide' some secret message so that somebody couldn't even detect it?
Only in the same way that public-key encryption is unbreakable, in that you can't brute-force it in any reasonable amount of time. However this doesn't rule out any weaknesses in the method itself, such as being able to statistically detect the desired data segment, etc.
Also note that steganography in general relies on obscurity; in other words, "Secrets are best kept when no one knows that secrets are being kept." (Nigel Calder, Einstein's Universe) If everyone knows that there's DNA in that there pigeon, it makes it a lot easier to find than if they don't even know that you're transmitting DNA via rabies-infected fowl.
Another favorite steganographic method of mine is to encode data into graphic images, for example, taking a bitmapped image and using a key to encode data onto each pixel, say by incrementing the red RGB value of each pixel by 1 where appropriate. It would be exceedingly difficult to detect that an image even contained data, let alone extract it without the key.
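A minimal sketch of the idea, with two simplifications that are assumptions rather than the method described above: it overwrites the lowest bit of the red value instead of incrementing it, and it uses no key (a key could, for instance, choose which pixels carry bits). The flat 8x8 "image" is made up.

```python
def embed(pixels, message: bytes):
    """Hide the message bits in the least significant bit of each pixel's red value."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    out = list(pixels)
    for i, bit in enumerate(bits):
        r, g, b = out[i]
        out[i] = ((r & ~1) | bit, g, b)  # clear the low bit, then set it to the message bit
    return out

def extract(pixels, n_bytes: int) -> bytes:
    """Read n_bytes back out of the red-channel low bits."""
    bits = [r & 1 for r, _, _ in pixels[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

cover = [(200, 100, 50)] * 64      # hypothetical 8x8 image as (R, G, B) tuples
stego = embed(cover, b"hi")
print(extract(stego, 2))
```

No red value moves by more than 1, so the picture looks the same, but anyone who knows to read the low bits in order gets the message straight back.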
"Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
"Also, the week is not all science - Intel provided a web center in the hotel with lots of nice computers equipped with Quake 3, so we could have big multiplayer deathmatches over the LAN."
See? The government organizations were right! Playing Quake and other violent video games does make you become violent, neurotic, and make you want to blow up your...
Oh wait, these kids won what award? How prestigious was it? Intel says they'll be the nation's leaders and innovators?
------------
"Okay, who taught the cat how to type ctrl alt delete?"
I don't know about that. Paul D. Schreiber High School is part of the Port Washington Union Free School District, according to the 1999 profile. I can't tell for sure, but that kind of sounds like a public school to me...
I'm a bit confused here. The article seems to suggest that the DNA encoding was actually executed, instead of merely being theoretically described/proposed. Um, the school I was in when I was 17 most definitely did not have DNA-handling equipment. Does this mean that (a) the prize was awarded to somebody who already had access to nonstandard equipment (giving the prize a bit of an elitist ring), or (b) DNA juggling is already commonplace enough that high schools carry the stuff as basic equipment? Both options seem a bit of food for thought to me...
Honestly, here we have (some of) the most outstanding and promising kids in high schools in the US (or didn't I get the meaning of this award?). They will probably even be among the best in their year at Harvard/MIT or whatever. And they did some excellent and truly impressive work.
They deserve credit and appreciation instead of bitching about this or that detail of their work or whining "If I had had these toys to play with, I would have done what he/she did". Pure envy... If you're capable of doing cool stuff, nobody at your local university will leave you standing outside.
Seeing stuff like this makes me profoundly happy, and say about Intel what you want, but this is a service to society (and their PR dept. :) ).
The winners, and most probably a lot of non-winners, have shown how cool doing research (or hacking in the true sense of the word, which is essentially the same) is, and they should get all the support to pursue whatever they're capable of doing.
Nevertheless: critical and rational analysis of their work is a much appreciated way to show respect (in my experience).
So sit back, relax and
#define BITCHMODE 0
for once.
Roland
The solution for securing steganography is straightforward - it's to say "it's not crypto, it's just stego, but that can still be pretty effective" rather than saying "there's a trillion trillion possible sequences in this billion starting points, so nobody'd ever find it". So rather than hiding a plaintext message, which somebody might find, you encrypt your message with a real crypto algorithm, producing something that looks like random noise, and then if the underlying substrate you're hiding it in (whether it's pictures, sounds, or DNA) looks enough like random bits, you're done; otherwise you make a model of the substrate and transform your cyphertext into that space. (Peter Wayner's paper on Mimic Functions has a really good discussion of this.) For an application like this, just getting the right ratio of nucleotides may be enough, or one or two levels of Markov chain beyond it. (Plus make sure the DNA isn't from a really popular mouse clone or whatever that somebody might have already sequenced.)
Then it does become much harder to find the cyphertext, which makes cracking it much much harder.
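A minimal sketch of the encrypt-first idea, under loud assumptions: the XOR with a random pad is only a stand-in for "a real crypto algorithm", the 2-bits-per-base mapping is an arbitrary choice, and nothing here models real nucleotide ratios or Markov statistics.

```python
import secrets
from collections import Counter

BASES = "ACGT"  # arbitrary 2-bits-per-base mapping, chosen only for illustration

def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """Stand-in cipher: XOR with a random pad of the same length."""
    return bytes(d ^ p for d, p in zip(data, pad))

def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))

def bases_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_bases; expects a length that is a multiple of 4."""
    vals = [BASES.index(c) for c in seq]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )

message = b"JUNE6_INVASION: NORMANDY"
pad = secrets.token_bytes(len(message))      # shared out of band with the recipient
plain_dna = bytes_to_bases(message)          # plaintext straight into bases
cipher_dna = bytes_to_bases(xor_bytes(message, pad))
recovered = xor_bytes(bases_to_bytes(cipher_dna), pad)

print("plaintext base counts: ", Counter(plain_dna))
print("ciphertext base counts:", Counter(cipher_dna))
print(recovered)
```

Comparing the two base counts gives a feel for how much more the plaintext encoding deviates from an even mix than the encrypted one does; matching a real organism's statistics would take the extra modelling step described above.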
Dirtside said:
Because the pair of primers provides a trillion trillion options, she concludes that the code is essentially unbreakable.
Only in the same way that public-key encryption is unbreakable, in that you can't brute-force it in any reasonable amount of time.
and randombit said something similar.
No, it's much different than that. Public-key encryption is exponentially hard, while this is just linear in the length of the chains. Crunching through a billion starting points by computer, looking for English-like sequences, is a few minutes' work, though the chemical work of sequencing the whole mess is much slower. By contrast, it's easy to make a factoring job take longer than the current age of the universe, just by making the keys a few hundred bits longer.
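The linear scan can be sketched like this, assuming the strand has already been sequenced into a string and that the attacker has guessed the encoding (here an arbitrary 2-bits-per-base mapping). The strand, the message, and the "looks like English" test are all made up for illustration.

```python
import random

BASES = "ACGT"
ALLOWED = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ ")  # crude stand-in for "English-like"

def to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))

def decode_at(strand: str, start: int, n_bytes: int) -> bytes:
    """Decode n_bytes from the strand, beginning at base offset start."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for c in strand[start + 4 * i : start + 4 * i + 4]:
            value = (value << 2) | BASES.index(c)
        out.append(value)
    return bytes(out)

def scan(strand: str, n_bytes: int):
    """Try every starting offset; keep those that decode entirely to allowed characters."""
    hits = []
    for start in range(len(strand) - 4 * n_bytes + 1):
        if all(b in ALLOWED for b in decode_at(strand, start, n_bytes)):
            hits.append(start)
    return hits

rng = random.Random(0)

def junk(n: int) -> str:
    """Random filler bases standing in for the surrounding genome."""
    return "".join(rng.choice(BASES) for _ in range(n))

secret = b"ATTACK AT DAWN"
strand = junk(5000) + to_bases(secret) + junk(5000)
hits = scan(strand, len(secret))
print([(h, decode_at(strand, h, len(secret))) for h in hits])
```

The work grows in proportion to the strand length, which is the point: doubling the genome doubles the search, whereas adding bits to a crypto key multiplies it.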
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Surprisingly common form of steganography, really. There is *absolutely* no obfuscation actually hidden in the data itself--it's literally plaintext encoded as simple entries in the DNA sequence. The security comes from the fact that it's surrounded by a significant amount of non-secret information that is difficult to search (without knowledge of the correct primers).
Essentially, you're talking about a symmetric "location" secret protecting unencrypted content within a significant amount of data.
Such techniques are actually used quite commonly as countermeasures against legally mandated discovery proceedings--a large corporation (Microsoft or tobacco companies in particular) is sued for its memo records; tens of thousands of boxes of unrelated material are delivered to the suing party on the presumption that they will hide the one "smoking gun" memo that will seriously damage the corporation.
In the inevitable arms race that follows, the entire mass of data gets OCR'd and searched for critical keywords. That solves the legal issues, but without an efficient "OCR" method that can quickly sequence a chromosome into its underlying data, this student's steganographic method is extraordinarily effective.
However, should such a technology be created, the size of the "keyspace" becomes drastically shortened: Apparently, the entire human genome will fit into six hundred megabytes--this is quite a bit of data, but it's not "trillions and trillions" of possibilities. A simple statistical analysis tool will reveal *any* non-natural data, as nCipher revealed when they showed that a cryptographic private key will stick out even within 2GB of fluff data--it's *TOO* random.
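A sketch of the kind of "too random" check being described, on bytes rather than DNA. The filler text, the window size, and the stand-in "key" are assumptions for illustration; this is not nCipher's actual tool.

```python
import math
import random
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte values in chunk, in bits per byte."""
    counts = Counter(chunk)
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def most_random_window(data: bytes, window: int = 64, step: int = 16):
    """Slide a window over the data and return (offset, entropy) of the highest-entropy one."""
    best_offset, best_entropy = 0, -1.0
    for off in range(0, len(data) - window + 1, step):
        h = shannon_entropy(data[off:off + window])
        if h > best_entropy:
            best_offset, best_entropy = off, h
    return best_offset, best_entropy

rng = random.Random(1)
fluff = b"quarterly sales memo, nothing to see here. " * 200   # made-up filler
key = bytes(rng.randrange(256) for _ in range(128))            # stand-in for a private key
data = fluff[:4000] + key + fluff[4000:]

offset, entropy = most_random_window(data)
print(offset, round(entropy, 2))
```

The same windowed statistic could be run over a base sequence; whether a message encoded in DNA would stand out depends on the encoding and on the background, which the article does not describe.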
What'd really blow me away is if Viviana was able to follow up this fascinating research with an implementation of Public Key Steganography. There was a paper referenced on Counterpane that talked about this; essentially it hides data in such a manner that the ensteganographer (and thus, anyone other than the recipient of the hidden message) cannot determine the exact location of their own message. The way I'd imagine it working, you'd mutate a virus such that it delivered a given message to a location dependent upon not the data being delivered but some publicly available key. That key would essentially be a one way hash of bioreceptors that the virus should attach itself to, and you'd essentially have a restriction that the virus would not infect any cell that did not possess those specific bioreceptors. An attacker would need to sequence not only the global DNA sequence for changes but each possible type of cell that could have been modified to contain the secret, whereas the message reader would know exactly what types of cells to search--voilà, your asymmetric primitive. Maybe you'd only find a link to the appropriate primer, or possibly your entire message, but you'd have your public key steganography implemented with biological methods.
Funky.
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
Just to expand on the previous comment...
I did not do my work for the Intel STS (I was the 6th place winner) in a laboratory, but I have worked in a university lab in the past. There are many high school students who do research in well-equipped labs. It isn't an unfair advantage - these opportunities are available to most people who have enough initiative and intelligence to pursue them.
This year (if I remember correctly) there were 4 Intel finalists who participated in RSI: Feng Zhang, Viviana, Sasha Schwartz, and Elizabeth Williams. A few other finalists were in other summer research programs.
Also, I recall Viviana saying that she ordered the DNA from a lab somewhere that will manufacture DNA with a given base-pair sequence... Apparently it isn't too expensive.
I hope this clarifies some things...
Matt Reece
I was one of the participants in this competition - I finished in 6th place with a project on adaptive wavelet methods for fluid dynamics problems. (I'm Matt Reece from Louisville, Kentucky).
First of all, I would like to say that if anyone reading this is a high school student considering entering this competition, do it. It is very much worth the time you spend on your research if you can become a finalist. All 40 finalists get $5000, a laptop (650 MHz Pentium III), and a trip to D.C. where Intel pays for everything - very nice expensive dinners, meetings with Nobel laureates... it's an incredible program. The best part was definitely meeting the other finalists, though. They were all wonderful people and I have had a great week... Don't think these people are just science nerds (not that that's a bad thing, mind you). They're very well-rounded. Many speak foreign languages, play musical instruments, sports, etc.
Also, the week is not all science - Intel provided a web center in the hotel with lots of nice computers equipped with Quake 3, so we could have big multiplayer deathmatches over the LAN. I also played cards more in the past week than I have in months, and generally just spent a lot of time hanging out with the other finalists.
Anyway, to get on to some of the comments the rest of you have made about Viviana's project. First, I will say that I'm not as familiar with her work as I am with some of the other projects.
She does attend a U.S. school - I think it's a public one but I'll have to look that up later. Personally, I attend a public magnet school (duPont Manual High School) and I know many of the other finalists do attend public schools.
It would probably be best if Viviana responded to your comments about DNA steganography, as I'm not an expert in the area. Still, the project did seem to be very well done and she did an excellent job of presenting it to the public.
As far as your comment about open source programmers... If an open source project involved a new algorithm or some other method that could be applied to science, then it would certainly stand a chance in the Intel competition. My wavelet code is open source, although at this point I haven't implemented enough features to make it very useful.
Also, you might be interested to know that the judging is not solely based on the research. The first stages are based on a research paper - out of about 1500 applicants, 300 semifinalists were chosen and then from those 300, forty were chosen as finalists.
The finalist judging is based on three 15-minute interviews in which judges ask questions related to science in general. Some questions are straightforward tests of scientific knowledge, others are more open-ended questions meant to see how well you can think. Some of the questions are things that no one knows...
These judging interviews took place on Thursday and Friday (the 9th and 10th). The next two days, March 11th and 12th, involved the public presentations, where we set up display boards at the National Academy of Sciences and talked about our research with judges, scientists, and anyone else who showed up. The judges talked to students on Saturday, and from what I understand had made all their decisions just before the dinner at Mr. K's (great Chinese restaurant) Saturday night. The winners were announced Monday evening.
So anyway, judging is based initially on the research, but the final awards are also based on general scientific knowledge and also ability to communicate that knowledge to others. The emphasis on communication is also evident in the Seaborg award, given to the student who best displays an excitement about science and a willingness to share that excitement - that award went to Eugene Simuni, who finished 5th. His work was all the more amazing because he's only lived in the U.S. for two years (he came here from Russia) and yet he's better at communicating science to the general public (in English, a language that he more or less taught himself) than most or maybe all of the rest of us who have been speaking English our whole lives.
Well, there is much more I could say, but I just wanted to give you a better idea of what this competition is all about. It's a great program, and I would recommend it to anyone. If you have any questions about the Intel STS, feel free to ask me.
Matt Reece
Steganography is the art of hiding messages in things, where they aren't likely to be noticed, either because nobody'd think to look there, or because there's too much other junk for your message to stand out, or because you've done the work to make your message look similar to the background noise. The classic example is hiding a message in the low-order bits of a digitized photo image or a sound file, where they don't affect the output much, though they're usually visible if anybody looks.
Stashing a secret message in a bunch of DNA has a good chance of "they wouldn't look there", but if they *did* decide to look in the bunch of DNA, a message like "JUNE6_INVASION: NORMANDY" probably has different enough statistics from the rest of the DNA around it that it might stand out. Sure, it's much more obvious to the intended recipient, who's looking for the specific start and end "primer" sequences, and it's also much more obvious to someone who knows the alphabet of nucleotides she's using to represent letters (as opposed to having to guess from entropy, where there'd be too many false positives.) But the conclusion "Because the pair of primers provides a trillion trillion options, she concludes that the code is essentially unbreakable" is insupportable. If you encode your message in a way that has similar statistics to the background signals/noise, you can hide it pretty well, but she's implying that straight plaintext is also unfindable there, and it's not, any more than hiding it in the low order bits of a picture is.
Nice work anyway, and it lets people make lots of entertaining comments about "Computer Viruses"
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
He came up with an extension to a mathematical theorem by Ramanujan. This is pretty impressive since most people have no idea what that guy was talking about ;-)
If you don't know, Ramanujan was an Indian mathematician who was born an "untouchable".
He was poor, sickly and almost totally uneducated.
He recreated a large portion of modern mathematics independently. He wrote to a British mathematician whose name escapes me with a lot of his work included. At first glance it looked like all previously proven theorems, so the mathematician disregarded it and threw it away. Then he started thinking about it and realized there were many novel approaches and new ideas, so he brought Ramanujan over to England and set him up at the university. Ramanujan died a few years later due basically to poor health attributable to a really shitty life, but his work blew open doors into mathematical realms we are still trying to probe.
So, in long ;-) , this kid might be a candidate for a Fields medal in the future.
---CONFLICT!!---
JunkDNA's post nails the issue. There are too many high-rated posts criticising the cryptography used here. Cryptography has nothing to do with it. The message is easy to read but it is hidden in a large volume of DNA sequence. The human genome project (a worldwide effort) has been working for years to sequence the entire genome...still unfinished. She proposes to bury the message in the genome of an organism. To try and use your sophisticated cryptography-breaking algorithms to "break the code" you first would have to sequence all the DNA present in your suspect message DNA. Given that coded DNA could be stored anywhere on a spy (in a stain on a dress for example) you would have to be able to sequence the human genome thousands of times over (once for every stain/suspect location) to have the data to apply encryption-cracking algorithms to. With the wonderful invention of PCR (polymerase chain reaction), the code (two primers of defined sequence, ~ 18 base pairs in length) and the location of the stain are all that is needed to read the message. This idea is brilliant. It's not based on crypto but on the unreadability of the data, yet it provides a method for the intended receiver to find the message with very little info. The beauty is that the decoding message is very small, simple and easily crypto'd into a conversation.
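As a toy model of what the receiver does, here is a sketch that treats PCR as a plain string search between two known primers. Real PCR works on complementary strands and amplifies chemically, so this is only an analogy; the 2-bits-per-base mapping, the random strand, and the function names are assumptions, and the primer length of 18 follows the figure given above.

```python
import random

BASES = "ACGT"

def to_bases(data: bytes) -> str:
    """Encode each byte as four bases (arbitrary mapping, for illustration only)."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))

def from_bases(seq: str) -> bytes:
    """Inverse of to_bases; expects a length that is a multiple of 4."""
    vals = [BASES.index(c) for c in seq]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )

def extract(strand: str, fwd: str, rev: str):
    """Return the bases between the two primers, or None if either is missing."""
    start = strand.find(fwd)
    if start < 0:
        return None
    start += len(fwd)
    end = strand.find(rev, start)
    if end < 0:
        return None
    return strand[start:end]

rng = random.Random(42)

def rand_seq(n: int) -> str:
    """Random bases standing in for the surrounding genome and for the primers."""
    return "".join(rng.choice(BASES) for _ in range(n))

fwd_primer, rev_primer = rand_seq(18), rand_seq(18)
message = b"JUNE6_INVASION: NORMANDY"
strand = rand_seq(3000) + fwd_primer + to_bases(message) + rev_primer + rand_seq(3000)

payload = extract(strand, fwd_primer, rev_primer)
print(from_bases(payload))
```

The receiver needs only the two short primers, while someone without them has to read the whole strand first and then search it.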
This idea is so simple and elegant that I'm sure the intelligence agencies around the world will use it now, if they are not already.
no sig.