Naming All Lifeforms On Earth With Hash Functions
First time accepted submitter ssasa writes "A Virginia Tech researcher is proposing a new naming system for all life on earth [based on each organism's] genetic fingerprint — basically something like a hash function of an organism. Hash functions are in common use in software development. Hopefully it will pass some time before we see a hash collision between a cat and some dinosaur."
For those who want to read the actual journal article:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089142
The word hash is never mentioned either :)
I think I'll go hunt some af7caaf1e73a2d24924371a370b4ef9b so I can feed my 362842c5bb3847ec3fbdecb7a84a8692 and have a nice quiet evening with my 34b46c8cf192431e84ea81109660367b, chatting about the difficulty of talking about a474fb23f886eeaa16223eba872e53b1 that some socially inept scientist decided to name with a hash function.
Not so sure this will take off, since they have applied for a patent and want users to pay a license fee to use it.
- "Every demand is a prison, and wisdom is only free when it asks nothing." Sir Bertrand Russell
Last month, at ShmooCon a talk was given about spatial analysis of malware samples. The technique is borrowed directly from bioinformatics. This is a great example of techniques from Biology being used effectively in the IT security realm.
I hope that the researcher involved in naming organisms based on hash algorithms chooses context triggered piecewise hashes (CTPH), AKA fuzzy hashing, or a similarity hash algorithm rather than an algorithm like SHA512. Google's simhash, or at least the ideas behind this type of algorithm, would lend themselves much better to the naming of organisms.
FYI: a FOSS implementation of fuzzy hashing is called ssdeep. The project site is here. This is an implementation that is widely used in open source malware analysis tools like Cuckoo Sandbox.
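To make the difference concrete, here is a rough simhash-style sketch in Python. It is not the researcher's scheme and it is not what ssdeep does internally (ssdeep uses CTPH); it only illustrates the general idea of a similarity hash. The function names (kmers, simhash, hamming), the k-mer length of 4, the 64-bit width, and the toy sequences are all my own assumptions for illustration.

```python
import hashlib

def kmers(seq, k=4):
    """Split a sequence into overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def simhash(seq, k=4, bits=64):
    """Simhash-style fingerprint: every k-mer votes on every output bit."""
    votes = [0] * bits
    for km in kmers(seq, k):
        # Hash each k-mer with an ordinary hash and keep the first `bits` bits.
        digest = hashlib.sha256(km.encode()).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    # A bit is set in the fingerprint when the majority of k-mers set it.
    return sum(1 << b for b in range(bits) if votes[b] > 0)

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    # Toy sequences (made up): `variant` differs from `original` by one base.
    original = "ACGTACGTTAGCCGATAGCTAGGCTAACGTTAGC" * 4
    variant = original[:20] + "T" + original[21:]

    # A cryptographic hash changes completely after a one-base edit.
    print(hashlib.sha256(original.encode()).hexdigest())
    print(hashlib.sha256(variant.encode()).hexdigest())

    # The similarity fingerprints can be compared bit by bit instead.
    print(hamming(simhash(original), simhash(variant)))
```

With SHA512 (or SHA-256 as above) a single changed base gives an unrelated name, whereas with a similarity hash the number of differing bits tends to stay small for near-identical inputs, which is the property you would want if names are supposed to reflect relatedness.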
This kind of thinking has a tremendous problem with it. Presently, an organism takes the name of a previously described species if and only if it is a member of the same species as the particular type specimen from which that species was described. This holotype serves as the reference specimen for each species. This system has worked extraordinarily well for more than 200 years and has promoted nomenclatural stability.
The biggest problem with attempting to identify species on the basis of their genetic "fingerprint" or bar code is that unless you have some other means to establish that the specimen from which the genetic material was taken is in fact from the same species as the holotype, the genetic fingerprint will simply misidentify the specimen. This is a major problem for much of the genetic data in GenBank, for which, more often than not, there is no longer a means of associating the source of the genetic material with a specimen whose identity can be established independently, because the original specimens are seldom vouchered or saved. Consequently, the actual identity of the species that has been sequenced remains uncertain, even if alignments of the code are "perfect". As for the patent, the rules of Zoological Nomenclature forbid the commercialization of names used in science. These guys can make up their own naming scheme, but scientists, who must rely on having their work at least in principle repeatable and refutable, will be unable to use it for the purposes of science.