Human Genome Sequencing Completed
Arthur Dent '99 writes "According to this article at Reuters, the last chromosome in the human genome has finally been sequenced, taking 150 British and American scientists 10 years to complete. The sequenced chromosome, Chromosome 1, is the largest chromosome, with nearly twice as many genes as the average chromosome, making up eight percent of the human genetic code. The Human Genome Project has published the sequence online in the journal Nature, according to the article. It contains 3,141 genes (over 1,000 of them newly discovered), and 4,500 new SNPs -- single nucleotide polymorphisms -- which are the variations in human DNA that make people unique."
They are all different sizes. Chromosomes are numbered from largest to smallest, 1 - 22, plus X and/or Y. The one exception is 21 and 22: 21 is actually the shortest and 22 is slightly bigger. Early cytogenetics couldn't distinguish the sizes well enough, so those two were named in the wrong order. So chr 1, being very large, has a very large number of genes just because it's huge. It isn't the most gene dense, however; that is chromosome 19, which has more genes / Mb than anywhere else in the genome.
> Why do one chromosone have more genes than others?
Same reason some source code files contain more lines of code than others. They do different things.
From the fine article:
"The scientists also identified 4,500 new SNPs -- single nucleotide polymorphisms -- which are the variations in human DNA that make people unique."
There are other variations which make us unique.
Alternate alleles*
Indels (insertions/deletions)
Variable numbers of repeats.*
The genetic code uses 4 letters, but I'll use English for explanation.
A SNP is a single letter which has different values in different individuals: "the cat and the dog" vs "the hat and the dog".
An indel is where letters have been inserted into one sequence or deleted from another (without additional data, we can't distinguish these possibilities).
"the cat and the dog" vs "the cat and the big dog".
In alternate alleles there are a bunch of changes which always stick together, e.g. we observe "the cat and the big dog" and "the cat and the small mouse", but never (or exceedingly rarely) "the cat and the big mouse" or "the cat and the small dog."
Variable repeats are a special case of indels, but common enough to warrant a category of their own. "the cat and and and the dog" vs "the cat and and and and and the dog".
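For anyone who wants to play with the English analogy, here is a rough sketch in Python. It diffs two sentences letter by letter with the standard library's difflib and labels each difference as a SNP (one letter swapped for one letter) or an indel (letters present in only one sentence). The function name classify_differences and the "complex" bucket are made up for this illustration; real variant callers work on aligned sequencing reads and are far more involved.

```python
from difflib import SequenceMatcher


def classify_differences(a, b):
    """Return (kind, text_in_a, text_in_b) for each difference between a and b.

    Hypothetical helper for the English-sentence analogy only.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    found = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == 1 and (j2 - j1) == 1:
            kind = "SNP"      # one letter swapped for another
        elif tag in ("insert", "delete"):
            kind = "indel"    # letters present in only one sentence
        else:
            kind = "complex"  # anything else, e.g. a multi-letter substitution
        found.append((kind, a[i1:i2], b[j1:j2]))
    return found


print(classify_differences("the cat and the dog", "the hat and the dog"))
print(classify_differences("the cat and the dog", "the cat and the big dog"))
print(classify_differences("the cat and and and the dog",
                           "the cat and and and and and the dog"))
```

As in the explanation above, the variable repeat comes out as just another indel: the code has no way of knowing that the inserted letters happen to be extra copies of "and".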
Just to add on to this
20/20 vision means that when you stand away from something at 20ft, what you see is what the normal person would see at 20ft.
20/40 is, well, if you stand 20ft away, you see what a normal person would see at 40ft
Same goes for 20/10.
You seem to be under the impression that the number 1000 has some special meaning. Let's try your comment again, in octal:
pi * 1750 genes. Got to love those fun coincidences
Not so exciting now, is it? Nature is not decimal-based. The only reason we tend to be is because of the number of fingers we have.
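If you want to check the base conversion yourself, Python's built-in oct() will do it. The 3,141 figure is the gene count from the story summary.

```python
# 1000 (decimal) written in octal; Python prefixes octal literals with "0o"
print(oct(1000))   # 0o1750

# The gene count from the summary, which only looks like pi in base 10
print(oct(3141))   # 0o6105
```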
The basic idea is this. Our cells need a program that tells them what to do. That's the genome. There are a total of 46 chromosomes, consisting of two sets of 23 independent chromosomes (1 - 22 and X or Y). DNA makes up the chromosomes. It's just a chemical structure that stores information; the four bases that make up DNA are Adenine (A), Thymine (T), Cytosine (C) and Guanine (G). Every DNA molecule is actually two strands of DNA that pair together, with A binding to T and C binding to G. Sequencing is a chemical reaction that tells you the order of these four nitrogenous bases. For example, you may end up getting a read of AGTATTACGTATGCATAGGTCCGATG from a sequencing reaction (usually you'll get about 500 - 700 bases in one reaction). This tells you the sequence of ONE of the TWO strands of the DNA molecule. BUT since they pair in a predictable way, you also know the sequence of the opposite strand (A-T and C-G). Our genomes are composed of approximately 3.2 billion total As, Cs, Ts and Gs. The goal of the genome project was just to tell us what the sequence of those bases is. That's it. Finding genes and things of that nature come about from having the primary sequence to reference. If you want to find a mutation, you have to know what the sequence is SUPPOSED to be and WHERE IT IS before you can say it is different. That's your quick answer: the genome project sought to determine (1) what the sequences of bases in human chromosomes were and (2) the physical position of these sequences within the chromosomes. They did some other interesting things to prepare for it along the way, but that is a separate matter.
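To make the "you know the opposite strand" point concrete, here is a minimal Python sketch using the example read from above. The function names are my own. One extra detail not spelled out above: the two strands run in opposite directions, so the partner strand is conventionally written as the reverse complement.

```python
# Pairing rules: A pairs with T, C pairs with G
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(read):
    """Base-by-base partner of a read, in the same left-to-right order."""
    return read.translate(PAIRING)


def reverse_complement(read):
    """The opposite strand as it is conventionally written (read backwards)."""
    return complement(read)[::-1]


read = "AGTATTACGTATGCATAGGTCCGATG"
print(read)
print(complement(read))          # pairs position by position with the read
print(reverse_complement(read))  # the other strand, in its own reading direction
```

Taking the reverse complement twice gives back the original read, which is a handy sanity check.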
Completing the sequence and actually putting it together are two entirely different affairs. Small sequences called ESTs (Expressed Sequence Tags) were obtained during this effort. The big task after that was to put everything together AND in order. Think of it as a massive puzzle. Even the genome has different "builds" depending on the level of completeness of this work.
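A toy version of the puzzle, sketched in Python: take overlapping fragments and repeatedly glue together the pair with the longest suffix/prefix overlap. This is only an illustration of the idea under friendly assumptions (no sequencing errors, no repeats, all fragments from the same strand); it is not how the genome project's assembly pipeline actually worked, and the function names and the min_len cutoff are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until nothing overlaps any more."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining pieces do not overlap; leave them as separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Cut a short example sequence into three overlapping pieces, out of order
original = "AGTATTACGTATGCATAGGTCCGATG"
pieces = [original[13:26], original[0:13], original[7:19]]
print(greedy_assemble(pieces))
```

With repeats in the sequence, a greedy merge like this can join the wrong pieces, which is part of why putting the real thing in order was such a big job.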
This is, by my count, the fourth time that the human genome has been announced "finished". Any more times and they will all be invited to become Slashdot editors.