Longest Chemical Name: 64,060 letters

Posted by timothy on Tuesday March 29, 2005 @02:59PM from the 64k-should-be-enough-for-anyone dept.

mycro writes "A new article on Wikipedia shows the longest chemical name, reaching 64,060 letters. Methionylalanylthreonyl...leucine is a chemical name for enaptin, a nuclear envelope protein found in human myocytes and synapses, which is made up of 8,797 amino acids. It is involved in the maintenance of nuclear organization and structural integrity, tethering the cell nucleus to the cytoskeleton by interacting with the nuclear envelope and with F-actin in the cytoplasm."


I blame IUPAC nomenclature by Dancin_Santa · 2005-03-29 15:10 · Score: 4, Informative

The problem with this kind of naming scheme is that no valuable information can be quickly gleaned from the name itself. Neither the function nor form of the amino acid can be determined or inferred easily without resorting to computer-aided decryption of the name itself.

Something easier to remember (not an acronym of this long-ass acronym) that clearly explained the form and function of the amino acid would be much more useful.

In programmer terms, this IUPAC nomenclature is like Hungarian notation, putting too much information about the data into the name without sufficiently ascribing useful information to it.
1. Re:I blame IUPAC nomenclature by Dr. GeneMachine · 2005-03-29 15:19 · Score: 5, Informative
  
  This is no IUPAC problem - this long name is simply the sequence. If you have a functional protein, you have other nomenclatures at hand, for example the EC classification for enzymes. Biochemists have developed several systems of nomenclature which are actually useful (overview here). IUPAC has its place for the small molecules organic chemists are concerned with.
  By the way, if you want a longer and equally useless chemical name, you can always spell out the nucleotide sequence of a whole chromosome in full nomenclature.
  
  --
  This comment does not exist.
2. Re:I blame IUPAC nomenclature by BillyBlaze · 2005-03-29 16:51 · Score: 2, Informative
  
  I don't think anybody suggested that would be done for all elements, or permanently for any. It's just so there's a way to talk about the newly-discovered ones until people stop fighting over whom it should be named after.
Hmm... by schmink182 · 2005-03-29 15:21 · Score: 3, Informative

Occurrences:
a - 5940
b - 0
c - 1946
d - 238
e - 3210
f - 0
g - 2738
h - 1192
i - 2666
j - 0
k - 0
l - 14645
m - 1938
n - 3195
o - 1457
p - 1398
q - 0
r - 2771
s - 3069
t - 3575
u - 3273
v - 430
w - 0
x - 0
y - 10379
z - 0

Nope...it's probably not random.
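
A tally like the one above can be reproduced with a few lines of Python. This is only a sketch: `letter_counts` and `TALLY` are names made up for illustration, the short fragment stands in for the full name (which is not reproduced here), and `TALLY` simply copies the non-zero rows of the table so the total can be checked against the 64,060 letters in the headline.

```python
from collections import Counter
from string import ascii_lowercase


def letter_counts(name: str) -> dict[str, int]:
    """Return how often each letter a-z occurs in `name` (case-insensitive)."""
    counts = Counter(ch for ch in name.lower() if ch in ascii_lowercase)
    # Report every letter, including the ones that never occur.
    return {letter: counts.get(letter, 0) for letter in ascii_lowercase}


# Non-zero rows copied from the table in the comment above.
TALLY = {
    "a": 5940, "c": 1946, "d": 238, "e": 3210, "g": 2738, "h": 1192,
    "i": 2666, "l": 14645, "m": 1938, "n": 3195, "o": 1457, "p": 1398,
    "r": 2771, "s": 3069, "t": 3575, "u": 3273, "v": 430, "y": 10379,
}

# Illustrative fragment only: the first three residues of the name.
fragment = "methionylalanylthreonyl"
for letter, n in letter_counts(fragment).items():
    if n:
        print(f"{letter} - {n}")

print("letters in table:", sum(TALLY.values()))
print("distinct letters used:", len(TALLY))
```

The table's rows add up to exactly 64,060, and 18 distinct letters appear, which is where the figure of 18 in the reply below comes from.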
1. Re:Hmm... by dillon_rinker · 2005-03-30 10:04 · Score: 2, Informative
  
  You are correct, but "a lot" is not "most." Most of the compression comes from the redundancy. To be precise:
  
  There are 26 characters, as you point out. On the other hand, as the grandparent poster points out, only 18 are used.
  
  log2(18) = 4.17 (to the nearest hundredth)
  
  So you only need about 4.17 bits per character to represent the 18 characters that actually occur.
  
  8 bits/4.17 bits = 1.9
  
  So using a more compact bit-level representation of characters, you could achieve a compression ratio of 1:1.9. This would reduce the file from 68K to 36K.
  
  The comment indicated the final result was 12K. Reducing from 36K to 12K is a compression ratio of 1:3. The total compression ratio (68K to 12K) is about 1:5.7.
  
  Guess I don't have enough to do today...
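
For anyone who wants to check the arithmetic in the comment above, here is a rough sketch. The 68K and 12K sizes are the ones quoted in the thread, not measured here, and `packing_ratio` is a hypothetical helper name. It assumes each letter is stored in 8 bits originally and could ideally be packed into log2(18) bits, since only 18 distinct letters occur.

```python
import math


def packing_ratio(alphabet_size: int, stored_bits: float = 8.0) -> float:
    """Ratio gained by storing each symbol in log2(alphabet_size) bits
    instead of `stored_bits` bits (an idealised fixed-width packing)."""
    return stored_bits / math.log2(alphabet_size)


ORIGINAL_KB = 68.0  # file size quoted in the thread (assumption, not measured)
FINAL_KB = 12.0     # compressed size quoted in the thread (assumption)

bits_per_letter = math.log2(18)            # 18 distinct letters are used
ratio = packing_ratio(18)                  # gain from the smaller alphabet alone
packed_kb = ORIGINAL_KB / ratio            # size after bit-packing only
redundancy_ratio = packed_kb / FINAL_KB    # what is left for redundancy
total_ratio = ORIGINAL_KB / FINAL_KB       # overall compression

print(f"bits per letter:      {bits_per_letter:.2f}")
print(f"packing ratio:        1:{ratio:.1f}")
print(f"size after packing:   {packed_kb:.1f}K")
print(f"redundancy ratio:     1:{redundancy_ratio:.1f}")
print(f"total ratio:          1:{total_ratio:.1f}")
```

Without intermediate rounding the packed size comes out a little under 36K, so the figures in the comment are in the right range; the point that redundancy accounts for more of the saving than the small alphabet still holds.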