The truly optimal encoding isn't Huffman coding, although Huffman coding is inspirational. Arithmetic coding has beautiful properties that allow codes to spend a fractional number of bits per symbol.
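Here is a minimal sketch of why fractional bits matter. It assumes a hypothetical two-symbol source with P(a) = 0.9 and P(b) = 0.1. Huffman coding has to spend at least one whole bit per symbol on this source. The interval-narrowing step of arithmetic coding needs only about -log2 P(message) bits for the whole message. The code computes the interval exactly with fractions and applies the standard ceil(-log2 width) + 1 length bound. It is not a complete coder: there is no bit emission, renormalization, or decoder.

```python
from fractions import Fraction
import math

# Hypothetical skewed source: Huffman must still use 1 bit per symbol here.
PROBS = {"a": Fraction(9, 10), "b": Fraction(1, 10)}

def cumulative(probs):
    """Map each symbol to its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym in sorted(probs):
        ranges[sym] = (low, low + probs[sym])
        low += probs[sym]
    return ranges

def arithmetic_interval(message, probs):
    """Narrow [0, 1) once per symbol; return the final (low, width)."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        low += width * s_low          # shift into the symbol's slice
        width *= s_high - s_low       # shrink by the symbol's probability
    return low, width

def arithmetic_bits(message, probs):
    """Upper bound on bits needed to identify the final interval."""
    _, width = arithmetic_interval(message, probs)
    return math.ceil(-math.log2(float(width))) + 1

def entropy(probs):
    """Shannon entropy of the source, in bits per symbol."""
    return -sum(float(p) * math.log2(float(p)) for p in probs.values())
```

For the message "aaaaaaaaab", `arithmetic_bits` gives 6 bits. Huffman's one-bit-per-symbol code needs 10 bits for the same message, and the source entropy is about 0.469 bits per symbol.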
Yes, exactly the problem. With this sort of distributed app, the bottleneck is usually just moving the data around, not the computational analysis. Distributed.net is based on a centralized server feeding chunks of data to thousands of clients. A distributed search engine is the reverse: clients crawl somewhat independently around the web, analyze the data, and then send summarized information up the hierarchy. See dizz.net
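Here is a rough sketch of the "summarize, then send up the hierarchy" idea. It assumes that a client reduces each page it crawls to that page's few most frequent words. A parent node then merges the child summaries into an inverted index. The function names, the top-k word summary, and the example URLs are all hypothetical. A real system would need stop-word removal, stemming, and much richer summaries.

```python
from collections import Counter

def summarize(pages, top_k=3):
    """Client side (hypothetical): reduce each fetched page to its top_k most frequent words."""
    return {
        url: [word for word, _ in Counter(text.lower().split()).most_common(top_k)]
        for url, text in pages.items()
    }

def merge_up(summaries):
    """Parent side (hypothetical): merge child summaries into word -> set of URLs."""
    index = {}
    for summary in summaries:
        for url, words in summary.items():
            for word in words:
                index.setdefault(word, set()).add(url)
    return index

pages = {
    "http://example.org/a": "dna dna dna crypto crypto sequence",
    "http://example.org/b": "crawl crawl index index index search",
}
index = merge_up([summarize(pages, top_k=2)])
```

The parent only ever receives a handful of words per URL, never the page text itself. That is the point about data movement being the bottleneck.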
Hey, I've been thinking about this very problem for quite some time, and some fellow nerds and I have been kicking around ways to do it. How about we start a mailing list to discuss this further as an open source initiative?
a distributed app has the potential to be much more "fresh" than other search services
a network protocol needs to be designed carefully: you don't want to be shipping the entire web haphazardly around the Internet every day. clients might be assigned to monitor nearby sites. there are some cool opportunities to use this system just to map the internet.
searching is a different beast from crawling. parallel searching -- like FAST and others -- requires major resources which an open source project couldn't manage.
full text vs topic searching: should the clients that fetch documents index every word, or just summarize them? Topic searching is probably more appropriate for distributed searching, but full text is often more desirable.
interesting security issues come up, like how to keep clients from polluting the database.
etc...
-david.
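Here is one way to approach both the site-assignment point and the pollution point, sketched under assumptions. Hosts are spread over clients with rendezvous (highest-random-weight) hashing, not by network proximity. Each host goes to several clients. A summary is accepted only when enough of those clients independently report the same thing. The function names, replica count, and quorum are hypothetical choices for illustration. This scheme does nothing against colluding clients.

```python
import hashlib
from collections import Counter

def _weight(client, host):
    """Deterministic pseudo-random weight for a (client, host) pair."""
    return hashlib.sha256(f"{client}|{host}".encode()).hexdigest()

def assign(host, clients, replicas=3):
    """Rendezvous hashing (hypothetical policy): the same `replicas` clients always get `host`,
    and removing an unrelated client does not reshuffle the assignment."""
    return sorted(clients, key=lambda c: _weight(c, host), reverse=True)[:replicas]

def accept(reports, quorum):
    """Accept a reported summary only if at least `quorum` clients agree on it.

    `reports` maps client name -> summary (e.g. a digest of the summary)."""
    if not reports:
        return None
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes >= quorum else None
```

Redundant assignment costs extra crawling. In exchange, a single misbehaving client cannot push its own data into the index by itself.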
Re:Nothing earth-shattering here (on DNA Encryption)
That's just my point!
bioinformatics and cryptography (on DNA Encryption)
I agree with your piece regarding the hype, except for the last bit. Cryptography has much to do with information theory, and DNA biosequence analysis uses statistical techniques from machine learning to try to identify features in the DNA (e.g. exon finding or promoter recognition).
DNA statistical models could be used in cleverer ways than in the BBC article to encode messages that look like other DNA. This is not difficult and could be done today.
Even better would be to exploit the biological machinery by creating a message in a synthetic gene that is expressed in the presence of some regulatory element, perhaps a synthetic small molecule.
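Here is a toy sketch of the statistical-model idea: encoding a message as DNA that follows a model of other DNA. It assumes a first-order Markov model trained on some cover sequence, and the training string below is made up. Each message bit picks one of the two most likely next bases, given the previous base. The output therefore follows the model's common transitions. A receiver holding the same model reads the bits back by checking which of the two bases was chosen. This is not the BBC article's scheme. It also carries only one bit per base. Because it never emits the less likely bases, a careful statistical test could still flag it, so a stronger version would sample bases in proportion to the model.

```python
from collections import Counter, defaultdict

BASES = "ACGT"

def train(sequence):
    """First-order Markov model: counts of each next base given the previous base."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def top_two(model, prev):
    """Two most likely successors of `prev`; ties broken alphabetically so both ends agree."""
    ranked = sorted(BASES, key=lambda b: (-model[prev][b], b))
    return ranked[0], ranked[1]

def encode(bits, model, start="A"):
    """Each bit (0 or 1) picks the most or second most likely next base."""
    seq = start
    for bit in bits:
        seq += top_two(model, seq[-1])[bit]
    return seq

def decode(seq, model):
    """Recover each bit from which of the two likely bases was used."""
    return [top_two(model, prev).index(nxt) for prev, nxt in zip(seq, seq[1:])]

cover = "ATGCGCGATATCGGCTAAGCTTGCA"   # hypothetical training DNA
model = train(cover)
```

For example, `decode(encode([1, 0, 1, 1, 0], model), model)` round-trips the bits, provided sender and receiver share the same trained model.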
Re:Nothing earth-shattering here (on DNA Encryption)
I believe the added difficulty here is that the DNA must be sequenced. Sequencing 30 billion strands of DNA takes a very long time, and this would be required before applying computational search techniques (distributed or otherwise). But if the marker is known before sequencing, the interesting DNA can be filtered out in the lab.
Sounds like a great book, but Slashdot shouldn't be promoting books (via links to Amazon) that it is supposedly reviewing objectively. Or there should be a disclaimer.
Also notable is Condor (similar to Mosix).
http://www.cs.wisc.edu/condor/
I suspect you just want lots of general-purpose compute power available to many users.
It's definitely easier to teach (the necessary, relevant) biology to a coder than to teach a biologist to code!
I just created http://www.egroups.com/group/dizz-net/ as an email discussion list. You can subscribe by sending email to dizz-net-subscribe@egroups.com. There are a lot of interesting issues, many of them already mentioned here.
-david.
(I didn't get it at first because I kept trying to move my mouse directly to the letter of interest and back again. duh!)
This is very cool. I might buy a PDA now.