Naming All Lifeforms On Earth With Hash Functions
First time accepted submitter ssasa writes "A Virginia Tech researcher is proposing a new naming system for all life on earth [based on each organism's] genetic fingerprint — basically something like a hash function of an organism. Hash functions are in common use in software development. Hopefully it will pass some time before we see a hash collision between a cat and some dinosaur."
For those that want to read the actual journal article
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089142
The word hash is never mentioned either :)
I think I'll go hunt some af7caaf1e73a2d24924371a370b4ef9b so I can feed my 362842c5bb3847ec3fbdecb7a84a8692 and have a nice quiet evening with my 34b46c8cf192431e84ea81109660367b, chatting about the difficulty of talking about a474fb23f886eeaa16223eba872e53b1 that some socially inept scientist decided to name with a hash function.
Not so sure this will take off, since they have applied for a patent and want users to pay a license fee to use it.
- "Every demand is a prison, and wisdom is only free when it asks nothing." Sir Betrand Russell
Will they use it to find identical lifeforms by comparing their hashes? Can't they just look at them and tell?
A higher form of life.
Last month, at ShmooCon a talk was given about spatial analysis of malware samples. The technique is borrowed directly from bioinformatics. This is a great example of techniques from Biology being used effectively in the IT security realm.
I hope that the researcher involved in naming organisms based on hash algorithms chooses context triggered piecewise hashes (CTPH), AKA fuzzy hashing, or a similarity hash algorithm rather than an algorithm like SHA512. Google's simhash, or at least the ideas behind this type of algorithm, would lend themselves much better to the naming of organisms.
FYI: a FOSS implementation of fuzzy hashing is called ssdeep. The project site is here. This is an implementation that is widely used in open source malware analysis tools like Cuckoo Sandbox.
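For anyone who hasn't seen a similarity hash, here is a minimal sketch of the simhash idea in Python. It is not ssdeep's algorithm and not anything from the paper; the function names, the choice of 4-letter k-mers, and the use of MD5 as the per-feature hash are all my own assumptions, picked only to keep the example short.

```python
import hashlib

def kmers(seq, k=4):
    """Return the overlapping substrings of length k (the 'features')."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def simhash(seq, k=4):
    """64-bit simhash sketch: every k-mer votes on every output bit."""
    bits = 64
    counts = [0] * bits
    for km in kmers(seq, k):
        # Hypothetical choice: first 8 bytes of MD5 as the feature hash.
        h = int.from_bytes(hashlib.md5(km.encode()).digest()[:8], "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << b for b in range(bits) if counts[b] > 0)

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

# Made-up sequences, for illustration only.
original = "ACGTACGTTAGCCGATAGCTAGCTAGGCTA"
mutated = "ACGTACGTTAGCCGATAGCTAGCTAGGCTT"  # last base changed
print(hamming(simhash(original), simhash(mutated)))
```

Because most k-mers are shared between the two inputs, most votes are shared too, so similar sequences tend to end up a small Hamming distance apart. That is the property a plain cryptographic hash deliberately does not have.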
I think we're a little late to the game on this one.
Maybe we should enter the Twitter age and name them all with hash tags instead! Just my idea as a member of #homosapiens
A few years back there was an eruption of expensive "premium short message services" that offered to provide gullible people with their "Manga character name", their "hero name" and other ridiculous stuff like that, based on a hash function that picked a name from a name library according to the hash sum over some arbitrary data (like real name, phone number or whatever) the gullible customers provided in their short message.
Now this "pick name based on hash over genes" proposal does not sound that different, and it is similarly useless. Why would one pick some completely different random name just because of a single insignificant minor mutation?
From now on, please refer to me as "ed35073e47a38fbbcc66c1c69058b9c3"
This system has more uses beyond categorizing life on earth. My favorite movie line: "77b0ba27c2fcfa0e02793671c27afb38"
differing genetic code.
Good! Finally people might understand that the Platypus (Platypus australis) is actually a species of beetle and not the mammalian species Ornithorhynchus (Ornithorhynchus anatinus).
This kind of thinking has a tremendous problem with it. Presently, an organism takes the name of a previously described species if and only if it is a member of the same species as a particular type specimen from which the species is described. This holotype serves as the reference specimen for each species. This system has worked extraordinarily well for more than 200 years and has promoted nomenclatural stability.
The biggest problem with attempting to identify species on the basis of their genetic "fingerprint" or bar code is that unless you have some other means to establish that the specimen from which the genetic material was taken is in fact from the same species as the holotype, the genetic fingerprint will simply misidentify the specimen. This is a major problem for much of the genetic data in GenBank, for which, more often than not, there is no longer a means of associating the source of the genetic material with a specimen whose identity can be established independently, because the original specimens are seldom vouchered or saved. Consequently, the actual identity of the species that has been sequenced remains uncertain, even if alignments of the code are "perfect". As for the patent, the rules of Zoological Nomenclature forbid the commercialization of names used in science. These guys can make up their own naming scheme, but scientists, who must rely on having their work at least in principle repeatable and refutable, will be unable to use it for the purposes of science.
... my genius!!
How the fuck am I now supposed to have an "angel'o'sphere" in the middle of the name of a beetle?
They just gave a beetle the name of Darwin and a stupid musician ... and I'm lost.
(* cry *)
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
are, if he (women can't be this stupid) thinks that a hierarchical naming scheme is anything like a hash function.
"I don't know, therefore Aliens" Wafflebox1
At the rate we are driving species extinct a much simpler system of naming the few surviving will be sufficient.
The race isn't always to the swift... but that's the way to bet!
Why can't women be this stupid?
You don't want a hash function for this, where the hash is effectively random. You need a function that derives a unique value for each input, but retains the relative distance of the original value. i.e. two values that are very similar yield an index that is similarly close. That way the 'hash' can be used to determine how closely related two species are. Randomising in the way a true hash does is of no real value.
If you think that it's even possible that a womon (it's actually a word...) be stupid, then you must be a racist, sexist homophobic religionist who voted (TWICE!!) for The Stupidest Man In The Word, Evar!
Women are the saviors of the World. Long live Gaia, long live the Matriarchy!!
"I don't know, therefore Aliens" Wafflebox1
Just be glad that we aren't going to name organisms with hashtags! #homosapiens!
I first thought the genetic sequence of an organism would be the input to a hash function, but reading further that doesn't seem to be the case.
"Using Vinatzer's genome sequence, the Ames strain used in the bioterrorist attack would, for example, be known as lvlw0x and the ancestor of this strain stored at the U.S. Army Medical Research Institute for Infectious Diseases would be known as lvlwlx."
The output name would still show ancestry using identical values, when one of the key properties of a hash function is that small changes in the input result in a completely changed output.
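To make the contrast concrete, here is a small Python sketch. The two labels are the ones quoted from the article above; the DNA-like strings and the helper names are made up for illustration, and SHA-256 stands in for "a real hash function" only as an example.

```python
import hashlib

def sha256_int(text):
    """SHA-256 digest of a string, as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big")

def bit_difference(a, b):
    """How many of the 256 digest bits differ between two inputs."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

def shared_prefix(a, b):
    """Length of the common leading part of two labels."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# The two names from the article share everything but the fifth character.
print(shared_prefix("lvlw0x", "lvlwlx"))  # 4

# A one-letter change in a made-up input scrambles the SHA-256 digest;
# typically about half of the 256 bits flip.
print(bit_difference("ACGTACGTACGT", "ACGTACGTACGA"))
```

So the proposed names behave like positional codes where close relatives share a prefix, which is the opposite of the avalanche behaviour you get from a hash.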
It'd be easier to just name them using a simple system based on structured syllable words, similar to the system in use for naming unnamed super heavy elements.
But instead of numerals, you could do it based on regions of discovery, general attributes of the creature, what it is, the family, blah blah etc, you get the idea.
Make it as complex, but as defined as possible. What use does a hash have for anything?
A structured systemic naming system for every possible creature would be considerably better, and is absolutely 100% unique, and better yet, is free of bias, ego or anything else in deciding the name since it would be a simple "follow the definition table".
That stuff can come after when the normal people care enough to want to name a creature that has 50 syllables to its scientific name.
Might as well just use shorthands for now. If the nomenclature may have to be changed someday because of collisions, you might as well use something more friendly today. Until we understand DNA well enough to reject nonviables, the potential for namespace collision is too high to expect to be able to use today's scheme forever.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
It is impossible to make a hash function that gives one hash to a set of similar genes, because there will always be too many near ambiguities.
There are however other ways of doing things that behave like hashes. I made the best method for this for fingerprints, so I should be contacted for stuff like this.
then you must be a racist, sexist homophobic religionist who voted (TWICE!!) for The Stupidest Man In The Word, Evar!
In the rest of the world, we call such a person an "American".
I'm waiting for captcha to be implemented as a mandatory check before sex.... That will surely limit STDs!
Ah. When you said "stuff them" I thought you were referring to taxidermy, and not the herbs and breadcrumbs kind.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
A hash doesn't provide any taxonomy. It would be better to use something like an OID so you see how the organism relates to other organisms.
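A rough sketch of what that buys you, in Python. The dotted identifiers below are entirely hypothetical (no real OID arc or taxonomy numbering is implied), and the helper names are mine.

```python
def arcs(oid):
    """Split a dotted identifier like '1.7.4' into integer components."""
    return tuple(int(part) for part in oid.split("."))

def common_ancestor(a, b):
    """Longest shared leading path of two identifiers, as a dotted string."""
    shared = []
    for x, y in zip(arcs(a), arcs(b)):
        if x != y:
            break
        shared.append(str(x))
    return ".".join(shared)

def is_descendant(oid, ancestor):
    """True when 'oid' sits strictly below 'ancestor' in the tree."""
    o, a = arcs(oid), arcs(ancestor)
    return len(o) > len(a) and o[:len(a)] == a

# Hypothetical identifiers, for illustration only.
CAT = "1.7.4.2"
LION = "1.7.4.5"
BEETLE = "1.3.9"

print(common_ancestor(CAT, LION))    # 1.7.4
print(common_ancestor(CAT, BEETLE))  # 1
print(is_descendant(CAT, "1.7"))     # True
```

Comparing component by component rather than as raw strings matters: "1.23" is not below "1.2", even though the text starts the same way.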
In a way, isn't a genetic fingerprint a hash by itself?
A hash is a form of ensuring the (genetic) code is exactly the same.
You could basically see every living being as a walking collection of genetic hashes. Some of these hashes we share; others are unique to a species or sub-species, or unique to a single person.
The only difference is that we do not understand the genetic code well enough to use them in the same way as a hash code.
I will support naming organisms by hash functions when hash functions produce funny output on the far side of sanity-insanity dividing line.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
This idea was proposed by someone unused to sex, a scientist studying haplotype lineages. The world is however full of recombinants and don't forget that humans are a bunch of haplotype lineages too.
initially thought it was the process of hashtagging everything on earth with #twitter #hashtag
So how many genetic factors are to be the input variables for this hash?
How many have they collected and verified (across the whole species variation range)?
And they mention a dinosaur: kind of a dearth of genetic material to classify there, don't you think (ditto for much more recent extinct species)?
So this is just a 'system' someone cobbled together?
Has it passed any tests indicating it will actually work 99.99% of the time (before any other effort is put into collecting ALL the data)?
(...)
At the rate we're killing organisms off, we'll only need two of these soon ... one for humans, and one for soylent green ... whatever that stuff is made from!!!!
Sure enough, the cow costume was hanging up next to the superhero outfit and sailors uniform. (S,Spud)