Dumb Things With Bioinformatics
PrvtBurrito writes: "About 3% of the human genome is "coded" as genes. The proteins those genes encode can be represented as long sequences of amino acids, a twenty-letter alphabet. In an attempt to perhaps prove that nothing is sacred, someone has cataloged all of the English words found in known annotated protein sequences from many organisms. It looks like, after cataloging over 37,000,000 characters, the longest word is chapstick and the most common word is kilter."
The 20 letters are:
a Alanine
r Arginine
n Asparagine
d Aspartic acid
c Cysteine
q Glutamine
e Glutamic acid
g Glycine
h Histidine
i Isoleucine
l Leucine
k Lysine
m Methionine
f Phenylalanine
p Proline
s Serine
t Threonine
w Tryptophan
y Tyrosine
v Valine
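For anyone who wants to play along at home, here is a rough Python sketch of the idea: check whether a word can be spelled with the 20 letters above, then count how often each spellable word turns up in a sequence. This is only a guess at the approach, not the method the catalog's author actually used. The function names, the toy sequence, and the three-word list are all made up for illustration.

```python
# One-letter amino acid codes from the list above (20 letters).
# The letters b, j, o, u, x and z are not in this alphabet.
AMINO_LETTERS = set("arndcqeghilkmfpstwyv")


def spellable(word):
    """Return True if every letter of word is one of the 20 codes."""
    return bool(word) and set(word.lower()) <= AMINO_LETTERS


def find_words(sequence, words, min_len=3):
    """Return {word: count} for words found as substrings of sequence.

    Hypothetical helper: matching is case-insensitive and overlapping
    occurrences are counted.
    """
    seq = sequence.lower()
    counts = {}
    for word in words:
        w = word.lower()
        if len(w) < min_len or not spellable(w):
            continue  # too short, or uses a letter no amino acid has
        n = 0
        start = seq.find(w)
        while start != -1:
            n += 1
            start = seq.find(w, start + 1)
        if n:
            counts[w] = n
    return counts


if __name__ == "__main__":
    toy_sequence = "MKILTERAAKILTER"  # invented, not a real protein
    toy_words = ["kilter", "chapstick", "boat"]
    print(find_words(toy_sequence, toy_words))
```

A plain substring scan like this is fine for a toy word list. For a full dictionary against tens of millions of characters, a multi-pattern matcher would be the more sensible choice.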
---
Oregon
Like the Aussies who copyrighted ringing tones, someone should copyright those sequences.
Patents on DNA material already exist, so let's go one step further with proteins.
Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
Near the beginning of chromosome 1, in plain view for anyone to read: Frst Post
In Murphy We Turst
Here's a link to check whatever protein sequence you want against the human genome. Make sure to select "blastp" (for protein sequences) in the pulldown menu. Use the alphabet provided above. It will find near matches too. Enjoy.
The thing in the box has no place in the language-game at all; not even as a something; for the box might even be empty.