Slashdot Mirror


Dumb Things With Bioinformatics

PrvtBurrito writes: "About 3% of the human genome is "coded" as genes. The proteins those genes encode can be represented as long sequences of amino acids, a twenty-letter alphabet. In an attempt to perhaps prove that nothing is sacred, someone has cataloged all of the English words found in known annotated protein sequences from many organisms. It looks like, after cataloging over 37,000,000 characters, the longest word is chapstick and the most common word is kilter."
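
The cataloging described above can be sketched roughly as a substring search: slide over each protein sequence and count every window that matches a dictionary word. This is only a minimal sketch, not the method the original cataloger used; the function name `find_words`, the `min_len` cutoff, and the toy sequences and word list are all assumptions made up for illustration.

```python
from collections import Counter

def find_words(sequences, words, min_len=3):
    """Count how often each dictionary word appears as a substring."""
    wanted = {w.lower() for w in words if len(w) >= min_len}
    lengths = sorted({len(w) for w in wanted})
    counts = Counter()
    for seq in sequences:
        seq = seq.lower()
        # Check every window of every word length against the word set.
        for n in lengths:
            for i in range(len(seq) - n + 1):
                chunk = seq[i:i + n]
                if chunk in wanted:
                    counts[chunk] += 1
    return counts

# Hypothetical toy data, not real annotated protein sequences.
sequences = ["MKILTERAAG", "GGCHAPSTICKLL", "AAKILTERKILTER"]
words = ["kilter", "chapstick", "tick", "post"]

counts = find_words(sequences, words)
longest = max(counts, key=len)              # longest word that was found
most_common = counts.most_common(1)[0][0]   # most frequent word
print(longest, most_common)
```

Overlapping hits are counted separately, so "tick" inside "chapstick" counts as its own word here; a real catalog would have to decide how to treat that.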

4 of 30 comments

  1. Amino Acids by oregon · · Score: 4, Informative

    The 20 letters are

    a Alanine
    r Arginine
    n Asparagine
    d Aspartic acid
    c Cysteine
    q Glutamine
    e Glutamic acid
    g Glycine
    h Histidine
    i Isoleucine
    l Leucine
    k Lysine
    m Methionine
    f Phenylalanine
    p Proline
    s Serine
    t Threonine
    w Tryptophan
    y Tyrosine
    v Valine
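
Since the list above leaves out six letters of the English alphabet (b, j, o, u, x and z), not every word can turn up in a protein. A small sketch of that check follows; the names `AMINO_ACID_LETTERS`, `spellable` and `missing_letters` are made up for illustration.

```python
# One-letter codes for the twenty amino acids listed above.
AMINO_ACID_LETTERS = set("arndcqeghilkmfpstwyv")

def missing_letters(word):
    """Return the letters of `word` that have no amino acid code."""
    letters = {c for c in word.lower() if c.isalpha()}
    return sorted(letters - AMINO_ACID_LETTERS)

def spellable(word):
    """True if every letter of `word` is an amino acid code."""
    return not missing_letters(word)

print(spellable("chapstick"))             # True
print(spellable("kilter"))                # True
print(missing_letters("bioinformatics"))  # ['b', 'o']
```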

    --
    Oregon
  2. Let's copyright them. by clarkie.mg · · Score: 4, Funny

    Like the Aussies who copyrighted ringtones, someone should copyright those sequences.

    Patents on DNA material already exist, so let's go one step further with proteins.

    --
    Men are born ignorant, not stupid; they are made stupid by education. Bertrand Russell
  3. Right there by heikkile · · Score: 5, Funny

    near the beginning of chromosome 1, in plain view for anyone to read: Frst Post

    --

    In Murphy We Turst

  4. Do it yourself by meiocyte · · Score: 4, Informative

    Here's a link to check whatever protein sequence you want against the human genome. Make sure to select "blastp" (for protein sequences) in the pulldown menu. Use the alphabet provided above; it will find near matches too. Enjoy.
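
To give a feel for what a "near match" means, here is a much simpler stand-in: scan a sequence for windows that differ from the query in at most a few positions. This is not how blastp works (it scores local alignments rather than counting mismatches); the function `near_matches`, its `max_mismatches` parameter, and the toy sequence are assumptions for illustration only.

```python
def near_matches(query, sequence, max_mismatches=1):
    """Find windows of `sequence` within `max_mismatches` substitutions of `query`."""
    query, sequence = query.lower(), sequence.lower()
    hits = []
    for i in range(len(sequence) - len(query) + 1):
        window = sequence[i:i + len(query)]
        # Count the positions where the window and the query disagree.
        mismatches = sum(a != b for a, b in zip(query, window))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

# Hypothetical toy sequence, not a real protein.
hits = near_matches("kilter", "AAKILTERMKILTAR")
for position, window, mismatches in hits:
    print(position, window, mismatches)
```

With the toy sequence this reports the exact hit at position 2 and the one-letter-off "kiltar" at position 9.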

    --
    The thing in the box has no place in the language-game at all; not even as a something; for the box might even be empty.