Slashdot Mirror


Download The Human Genome

CMU_Nort writes: "The San Francisco Gate has a story about the completion of the human genome project. Apparently the University of California at Santa Cruz has put the Genome online for downloading here. I don't know about you, but I think this sort of sharing is very cool. We finally have the source for human beings. Now if only they'd GPL it."

39 of 159 comments

  1. Re:Oh-oh... GPL restrictions... by / · · Score: 2

    Well, if nothing else, it should drastically increase the amount of paperwork filled out at sperm banks....

    --
    "If one is really a superior person, the fact is likely to leak out without too much assistance" -- John Andrew Holmes
  2. Re:Whose copy is it? by bugg · · Score: 2
    Take any two people, regardless of race. Their DNA is 99% similar. For every functional human (no errors in chromosomal replication, etc.), our DNA is 99% identical. And of that 1% that differs, only 10% (.1% of our entire DNA) is exons; the rest is introns, which are of no known value because their sequences will never be translated to proteins: they are removed in the creation of mRNA.

    Frankly, the point is that the race of the people doesn't matter. Only .1% of our DNA codes for _all_ of the attributes that make us different: eye color, skin color, hair color, etc. And we haven't even mapped all of these loci yet.

    --
    -bugg
  3. They sequenced the Genome, not specified it by donutello · · Score: 2

    IANAMB (Molecular Biologist) but I believe what they did here was figure out what the individual pieces of the human genome were.

    (For lack of the ability to come up with a better analogy) It's like if all cities had the same map, then they discovered the road map of cities. It's what is at each address in the city that makes the cities unique and gives them character. They are just telling you where the houses go, what makes people different is what the houses actually are.

    --
    Mmmm.. Donuts
  4. Compression by the_demiurge · · Score: 2

    Of course, people actually downloading the whole human genome probably wouldn't worry about this, but couldn't they use a better compression format than .zip? I bet using bzip2 or rar would shave a couple of hundred MBs off of that 753MB file. Also, the differences in compression techniques would be interesting to see on a large group of files mainly consisting of G, A, C, and T.

    -- demiurge
    You find a file that appears important and obliterate it from memory!!!
    Score one for the downtrodden hacker!

    1. Re:Compression by Jonathan · · Score: 5

      Actually, DNA compression is a topic of interest, not only from the standpoint of saving disk space, but also for analyzing the sequence -- areas that compress differently may have different functional roles. You can read a paper on the subject by some people I know here
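
      The comparison asked about above can be tried on a small scale. This is a minimal sketch, assuming a synthetic random sequence rather than the real download; `pack_2bit` and `compare_sizes` are hypothetical helper names, and zlib stands in for .zip because both use DEFLATE. Real DNA contains repeats that random data lacks, so real-world ratios will differ.

```python
import bz2
import random
import zlib


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | code[base]
        # Left-align a short final chunk so every byte holds 4 slots.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def compare_sizes(seq: str) -> dict:
    """Return the byte size of the sequence under each encoding."""
    raw = seq.encode("ascii")
    return {
        "raw": len(raw),
        "zlib": len(zlib.compress(raw, 9)),  # DEFLATE, the method .zip uses
        "bz2": len(bz2.compress(raw, 9)),
        "2bit": len(pack_2bit(seq)),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic data, not real DNA
    seq = "".join(rng.choice("ACGT") for _ in range(100_000))
    for name, size in compare_sizes(seq).items():
        print(f"{name:>5}: {size} bytes")
```

      The fixed 2-bit packing is a useful baseline: a general-purpose compressor only wins on DNA when it beats a quarter of the raw size, which depends on how repetitive the region is.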

  5. licencing by norculf · · Score: 2

    If anyone tried to licence this, I'd hit 'em with a prior-art based lawsuit, based around the fact that I was using the code long before they licenced it. Of course, people born after it was licenced wouldn't have this option. Of course, I could always just make them pay me royalties...

  6. Gotta watch those errors... by Uruk · · Score: 3

    Keep in mind that there is only a 5% genetic variance between monkeys and humans.

    Which means, that unless they checked and double checked this data, if you actually try to compile it into a human, you may end up with a 5-nosed purple haired, blind and deaf armadillo-platypus mix with ESP and a penchant for buggery. :)

    They really do need to GPL this, if for no other reason than for the NO WARRANTY clause.

    --
    -- Truth goes out the door when rumor comes innuendo. -- Groucho Marx
  7. Re:We don't have the "source code" for humans. by Coleco · · Score: 2

    Actually we can 'compile' the code, because we know all the enzymes (polymerases, etc.) that turn the DNA into RNA and then into proteins, and we know how they work.

    So the higher level building block would be the resultant amino acid sequence... Then the functional protein..

    'Random' is not a good word to use when talking about genetic material. The influences that decide which information carries on through the generations aren't a result of randomness, although base-pair mutations do happen in a generally 'random' way. If a protein is rendered non-functional by a mutation, the other copy of the gene is still functional. Hence non-functional or seemingly detrimental genes are carried through time until either they turn out to be an advantage at some point or a whole new trait emerges. If you study genetics for some time, you begin to realize that 'mistakes' or aberrations are the pool from which functional innovations arise. Anyone who judges genetic fitness on a basically arbitrary basis, i.e. by saying one trait is the preferred or 'correct' trait, is demonstrating a misunderstanding of how evolution works. There are numerous examples of this. There's the classic sickle-cell anemia example, and there are people who carry a defect in a specific protein so that HIV is unable to attach to and infect their cells; hence, they are immune.

    Because there are billions of individuals (and aside from other factors), randomness occurs within the context of the individual but not within the context of the population. Populations and species evolve or remain at a genetic equilibrium for very systematic reasons.
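
    The 'compile' step described in this comment (DNA to RNA to amino acid sequence) can be sketched in a few lines. This is a toy illustration, assuming the input is the coding strand of an already-spliced gene; `transcribe` and `translate` are hypothetical names, and the table is the standard genetic code. Splicing, regulation, and folding into a functional protein are not modeled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read mRNA three bases at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCTAA"  # made-up example: Met, Ala, stop
    print(translate(transcribe(gene)))
```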

  8. Source for Human Beings by zeck · · Score: 2

    We finally have the source for human beings. Now if only they'd GPL it.

    Yeah, too bad they didn't comment it.

  9. No, it's not the source code. It's the binary. by yerricde · · Score: 2

    The DNA of a human being is the binary. Source code is normally commented, and that's what they're working out. They've sequenced the genome (== dumped the binary); now they're mapping it (== running a debugger, disassembling, and commenting the source).

    --
    Will I retire or break 10K?
  10. Oh-oh... GPL restrictions... by emac · · Score: 2

    Would that mean if I decide to have a kid (thereby utilizing and modifying my genes, and agreeing to the GPL) but then later give the kid up for adoption (redistributing the binary) I'd have to include a full copy of the source code? (Fully sequenced genome for the kid)

    That could get expensive! :)

    --
    Best new white rapper since Pimp Daddy Welfare... Pimp-T!
    1. Re:Oh-oh... GPL restrictions... by Squeeze+Truck · · Score: 2

      but I prefer to modify my genome...er...differently.

      Sunbathing?

      --

      "Reactionaries must be deprived of the right to voice their opinions; only the people have that right." - Mao

    2. Re:Oh-oh... GPL restrictions... by Squeeze+Truck · · Score: 2

      And that modifies your genome?

      Holy Shit! No wonder my grades have been dropping! I need to start dating smarter girls!

      ...and girls with smaller breasts.

      --

      "Reactionaries must be deprived of the right to voice their opinions; only the people have that right." - Mao

  11. Contig Assembly -- a mere hack isn't enough by Jonathan · · Score: 5

    Although the article doesn't really explain it, what this programmer did was write a contig assembly program -- a program that tries to find the most likely ordering of the fragments in the raw sequence data.

    While it is very impressive that a programmer was able to write a contig assembly program in four weeks, and that it only took three days to assemble the entire genome, I really doubt that this particular assembly of the genome is going to be definitive. People like Gene Myers and Phil Green have devoted years to developing such programs, and I think the results of their programs, although probably taking more than three days to run, are likely to yield more accurate results.
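
    To make "find the most likely ordering of the fragments" concrete, here is a toy greedy overlap merger. It is not the program discussed in the article, and `overlap` and `greedy_assemble` are hypothetical names. It assumes error-free reads from one strand with unambiguous overlaps; real assemblers also have to cope with sequencing errors, repeats, and reads from either strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]  # made-up fragments
    print(greedy_assemble(reads))
```

    Greedy merging is quadratic per round and easily fooled by repeated sequence, which is one reason the years of work mentioned above matter.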

  12. Re:It's not done yet.... 21.1% as of July 7, 2000 by Coleco · · Score: 2

    DNA that is transcribed into RNA and hence into proteins is read off only one strand. There are recognition sequences that determine the beginning and end points of transcription. Also, DNA has a 'direction', 5'->3', so you can tell which direction is which. I'm sure whatever kind of software the transcription guys use recognizes this.

    As for the second point, our DNA is for the most part almost identical. We need to have 99.9% of the same parts in order to run properly, i.e. everyone's gene sequence for hemoglobin is the same. So there is almost no variation in most of the genes in which mutations would be lethal or at least very bad. The genes that do allow variation, e.g. eye color, make up a very small percentage.
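
    The strand and direction point can be illustrated with a short sketch. Since a gene may sit on either strand, software typically searches the given sequence and its reverse complement, both read 5'->3'. The function names here are hypothetical, and the sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """All 0-based start positions of pattern in text (overlaps allowed)."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def find_on_either_strand(genome: str, motif: str) -> list[tuple[str, int]]:
    """Report motif hits as (strand, position on the forward strand)."""
    forward = [("+", pos) for pos in find_all(genome, motif)]
    # A hit on the reverse strand appears on the forward strand
    # as the reverse complement of the motif.
    reverse = [("-", pos) for pos in find_all(genome, reverse_complement(motif))]
    return forward + reverse


if __name__ == "__main__":
    print(find_on_either_strand("AAATGCAAGCATAA", "ATGC"))
```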

  13. Genetic stuff as Intellectual Property by Stephen+VanDahm · · Score: 2

    Bruce is joking (he's joking, right?), but the question remains: can people really patent and copyright this kind of stuff (genes, and the like)? I can see how it is valuable intellectual property, so someone is bound to try it, but on the other hand, you don't invent anything, you just discover it.


    ========
    Stephen C. VanDahm

    1. Re:Genetic stuff as Intellectual Property by Tet · · Score: 2
      can people really patent and copyright this kind of stuff (genes, and the like)?

      Yep, they can (and do). The logic being that it costs a lot of money to do the research, and that without patent protection, they wouldn't be able to recoup that investment. This is, in fact, pretty much what patent law was designed for in the first place -- to stimulate progress by providing financial incentives to do so. The only problem with this theory is that they're patenting things that they didn't invent. And in the case of the human genome, the stimulating progress argument doesn't hold. The HGP was doing the work already. IMHO, the human genome is too important to allow any company to control.

      --
      "The invisible and the non-existent look very much alike." -- Delos B. McKown
  14. Re:What Possible Use Would Anybody Have For This? by fatdave · · Score: 2
    OK, I have downloaded the genome, indexed it and have it available for my users.

    The latest full release of EMBL (63) weighed in at about 4.7 Gb compressed. This took me about 30 hours to download.

    GPL'd tools are available. Check out EMBOSS for a start, BioPerl, BioJava, bioPython, and BioXML, all linking in with common biocorba interfaces, and many more besides.

    I run my bioinformatics service with a minimum of commercial software (only one commercial package, which I am soon replacing with EMBOSS, and several non-open packages). The majority are open to some degree.

    Needless to say it is based on Unix systems (IRIX/Linux in my case).

    ..d

    --
    --- Four bases should be enough for any genetic code
  15. Re:Argh, everything is NOT open source by Imperator · · Score: 2

    My greatest fear is that the genome will be modified to create more zealots for /.. Imagine a race of superhumans capable of posting "Foo should be GPLd" messages as fast as today's trained apes are posting "First Post" messages. There'll be license jokes, insensitive "treat AOL users like dirt" jokes, Microsoft jokes, bad trolls moderated up as funny, good trolls moderated down as flamebait, and flamebait moderated up as insightful. It'll be July 2000 all over again! We must take action to prevent this!

    --

    Gates' Law: Every 18 months, the speed of software halves.
  16. Re:About 1500 MBs by Animats · · Score: 3
    how much data is in a genome...
    Human DNA is roughly a gigabyte. It's interesting that the download, compressed, is also about a gigabyte.

    Now we have the object code. Much of the rest of this century will be spent trying to disassemble and comment it.
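
    The "roughly a gigabyte" figure can be checked with back-of-the-envelope arithmetic. This sketch assumes about 3 billion bases for one copy of the human genome (an approximation) and a hypothetical helper, `genome_size_bytes`; four possible bases means 2 bits per base.

```python
def genome_size_bytes(bases: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store the sequence, rounded up to a whole byte."""
    return (bases * bits_per_base + 7) // 8


if __name__ == "__main__":
    approx_bases = 3_000_000_000  # assumed round figure, not an exact count
    packed = genome_size_bytes(approx_bases)       # 2 bits per base
    as_text = genome_size_bytes(approx_bases, 8)   # one ASCII letter per base
    print(f"packed:  {packed / 1e6:.0f} MB")
    print(f"as text: {as_text / 1e6:.0f} MB")
```

    Under these assumptions the packed form is about 750 MB and the plain-text form about 3,000 MB, so a compressed text download near a gigabyte is in the expected range.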

  17. Re:What Possible Use Would Anybody Have For This? by ShamballaJones · · Score: 2

    For most of us it's not a lot of use, I'd agree. However, if you're a specialist researcher in any of numerous biological or medical fields (and possibly anthropology, archeology, genealogy and others as well) this stuff is a potential goldmine.

    Provided you have the software to mine it of course.

    Putting the genome in the public domain is a great start, but making it truly accessible requires freely available (i.e. open source / GNU / FSF) tools with which to explore it. Otherwise, as you observed, it's pretty difficult to follow.

    My guess is that if those tools appear then one day not too far away kids in high school will do lab exercises in biology class that involve cloning genes and so forth(*). That may seem far-fetched but I suspect that we're witnessing a nascent technological revolution at about the stage that the current "computer revolution" was in when a bunch of geeks were doing apparently pointless things with the original Altair.

    * If OS/GNU tools don't turn up most schools aren't going to be able to afford the tools - so no labwork.

    --
    [ Blairism is the continuation of Thatcherism by other means. ]
  18. Re:No need to GPL, it's public domain by SEE · · Score: 2

    Under the Berne Convention, a copyright notice is not required. As most nations are members of either or both the Berne Convention and the WTO (which requires adherence to Berne), there are only a handful of nations on Earth where a copyright notice can possibly be required, and in few of those is there any actual recognition of copyrights at all.

    Steven E. Ehrbar

  19. Re:It's not done yet.... 21.1% as of July 7, 2000 by IAmSancho · · Score: 2
    Of course, once done, they will have a map for one person, not everyone.

    This is not correct. The privately funded Celera used DNA from several individual sources. Every normal human has the same set of genes (genes are fragments of DNA that are translated into functional proteins). Variations in this are due to mutation and can cause inherited or spontaneous disorders such as cystic fibrosis or Marfan syndrome. Where we are different is in what's between the genes (composed of random junk and tandem repeats). When forensic scientists use DNA evidence to connect a suspect to a crime, they are not actually comparing the DNA base pair-for-base pair (that is to say, they're not looking at the A's, the T's, the C's, and the G's). Rather, they compare the lengths of DNA fragments from two sources cut by the same enzyme(s). I digress. Basically, since the genes, though they make up a relatively small portion of the whole length of a strand of DNA, are what give us our fundamental human characteristics, and they are basically the same for everyone, the efforts of the Genome projects will produce a one-size-fits-all product.
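
    The fragment-length comparison described above can be sketched as follows. This is a simplified single-strand model with made-up sequences and hypothetical function names; it assumes the enzyme EcoRI, which recognizes GAATTC and cuts after the G. A tandem repeat of different length between two cut sites changes the fragment lengths even though the genes are the same.

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Cut dna at every occurrence of site; return the fragment lengths."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)  # EcoRI cuts between G and AATTC
        start = dna.find(site, start + 1)
    edges = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(edges, edges[1:])]


def same_pattern(sample_a: str, sample_b: str, site: str = "GAATTC") -> bool:
    """Compare two samples by fragment lengths only, not base by base."""
    return sorted(digest(sample_a, site)) == sorted(digest(sample_b, site))


if __name__ == "__main__":
    evidence = "AAGAATTCCCCCGAATTCTT"
    suspect = "AAGAATTCCCCCCCCGAATTCTT"  # three extra bases between the sites
    print(digest(evidence), digest(suspect), same_pattern(evidence, suspect))
```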

    --
    -------------------------

    Stupid people suck.

  20. No robots.txt! by Pseudonymus+Bosch · · Score: 2

    There is no http://genome.ucsc.edu/robots.txt. And we are talking of an enormous database.

    You'd better keep your robots off the site.
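
    For anyone writing such a robot, a polite crawler checks robots.txt rules before fetching. The sketch below uses Python's standard `urllib.robotparser` on rule text passed in as a string, so it makes no network request. `may_fetch`, the `examplebot` agent, and the `/cgi-bin/` rule are hypothetical; per the comment above, the site publishes no rules at all, which the parser treats as "everything allowed".

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules; not the contents of any real file.
    rules = "User-agent: *\nDisallow: /cgi-bin/\n"
    for path in ("/index.html", "/cgi-bin/query"):
        url = "http://genome.ucsc.edu" + path
        print(path, may_fetch(rules, "examplebot", url))
    # With no rules, nothing is disallowed, so restraint is up to the robot.
    print(may_fetch("", "examplebot", "http://genome.ucsc.edu/cgi-bin/query"))
```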

    --
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  21. About 1500 MBs by HerrNewton · · Score: 2

    Truly impressive when one realises that the compressed files alone weigh in at just a bit over 1500 MBs. Has anyone actually downloaded and unzipped the files? I'm not looking for laughs or a troll here, but I'm honestly curious just how much data is in a genome... a viable storage medium, perhaps? I'm serious.

    --
    Am I the only one who thinks Microsoft is a misnomer? Perhaps Macrosoft would be a better fit?
  22. If they would only GPL it... by dcs · · Score: 2

    Yeah, that would give a whole new meaning to the expression "GPL virus". I can already picture it:
    I create a virus, GPL its DNA, and then release it. You get the virus, get contaminated, and now you are required to release your DNA specs for everyone to see! Yeah! Go, GPL, go!

    --
    (8-DCS)
  23. Patent your genome before it's too late! by Bruce+Perens · · Score: 3
    An oft-overlooked characteristic of the U.S. patent office is that they gladly accept models to support a patent claim. So, I suggest that Slashdot readers fill out the form and send in a test-tube to support their claim. What to put in the tube is left as an exercise for the reader :-)

    Bruce

  24. Whose copy is it? by Frac · · Score: 2

    Since everyone's DNA is different, how do we determine which part is what makes us unique, which part is changeable (to a certain extent to make us humans "compatible") and which part is disposable DNA? Was the genome a replica of a Caucasian, Asian, black, or a bit of each?

    Go get your free Palm V (25 referrals needed only!)

  25. Apache by CMU_Nort · · Score: 2


    So how long until someone writes mod_human for apache?

    Although the benefits of embedding a human in your web server are dubious.

    --
    --------- Beware the dragon, for you are crunchy and good with ketchup.
  26. race and genetic diversity by TheDullBlade · · Score: 2

    race accounts for less than half of the genetic difference

    Isn't that what you'd expect? In fact, I'd be rather surprised if it was anywhere close to half.

    Not only can the races interbreed with complete success, there are morons and geniuses, weaklings and strong men, over roughly the same large spread in each race. To me these facts alone suggest that there should be far greater diversity within races than between them.

    However, I don't take this to mean that racial differences are necessarily insignificant or uninteresting, though one should naturally expect all but the most blatantly obvious to be lost in the variety of individuals.

    But isn't the genome the complete set of genes for the species? Not the genes of one man, but the total genetic catalog of all mankind? If so, the question "Which man?" (to which you replied) is nonsensical.

    --
    /.
  27. What Possible Use Would Anybody Have For This? =) by citizenc · · Score: 2

    I'm not looking to troll here, but, honestly, what possible reason (besides being able to bring a chick over to your house and say 'hey baby, wanna see my source code?') would anybody have to download 1500 megs? I dug around the site, and I found a sample of what is contained in those mammoth files; you can check out what's contained within the zips here.

    Again, I'm not looking to troll here -- I'm just curious, that's all. =)


  28. GPL'd Genes license question by kevin805 · · Score: 2

    If I GPL my genes, then later, my children want to become cyborgs, would they be violating the license if they were to use implants that weren't GPL'd?

    Would I be better off releasing my genes under the LGPL?

  29. Re:What Possible Use Would Anybody Have For This? by FigWig · · Score: 2

    There are a TON of freely available bioinformatics type programs available, a good start is to browse Biocat. Of course to get any use out of these programs you should have some knowledge of biology & computational biology. Traditionally academic software is very open, though not GPL (I'm beginning to hate all you GPL-wanting whining fuckers (this is not necessarily directed to the poster I am replying to)). An unfortunate trend as of late is servers which provide an application but no binaries to run locally, and no code. Not very scientific if you ask me. Also not helped by the GPL.

    one day not too far away kids in highschool will do lab exercises in biology class that involve cloning genes and so forth

    In high school I took Advanced Placement Biology (supposed to be equivalent to an intro college course) and one of the labs was to introduce a plasmid into E. coli so that it became resistant to ampicillin, an antibiotic. Genetic experiments are definitely possible at the high school level; it's just a matter of getting the expensive machines and specialized knowledge. Maybe a school district could put its money into a couple of PCR machines and a knowledgeable lab tech?

    --
    Scuttlemonkey is a troll
  30. Argh, everything is NOT open source by Hrunting · · Score: 3

    Now if only they'd GPL it.

    Geezus, why does everything have to be related to open-source software? We're not dealing with software here, folks, no matter how many analogies you want to make.

    Guess what, the human genome is better than GPL'd. It's completely free. If you alter it, you have a copy of the new code right in the genes. We did the majority of the work on decoding the genome in the last 2 years. Decoding is practically trivial now, and the finished product carries with it the code that made it.

    Everything is not software, and not everything should live by the rules of software. I personally would love to stop hearing talk about licenses with respect to the human genome and start hearing talk about the responsible use of the code. My greatest fear isn't that someone will modify the genome to create a superhuman and then not tell anyone what they did. My greatest fear is simply that the genome will be modified at all.

    There's a fine line between advocacy and zealotry

  31. Re:Genome online? by slashdoter · · Score: 2

    Can you imagine if M$ tried to make a DNA replicator? Can you say "blue chromosome of death"?

    --
    Does anyone actually have a Java program designed to control air traffic, or for the operation of a nuclear facility?
  32. Source released? by Steve+G+Swine · · Score: 2

    Are they accepting diffs?

    --
    "Consider yourself a member of a virtual corporation with Mr. Torvalds as your Chief Executive Officer." - Linux Advocac
  33. What's the big deal? by stm2 · · Score: 2

    I've been downloading sequences (human included) since 1994 from the NCBI web site (http://www.ncbi.nlm.nih.gov/).
    I'm working on sequence analysis to make phylogenetic trees at Quilmes University (Argentina).

    --
    DNA in your Linux: DNALinux
  34. Re:No need to GPL, it's public domain by Squeeze+Truck · · Score: 2

    Actually, I hear some biotech companies have acquired the rights to some of the "more interesting" genetic material in places like S. America in exchange for a new family cow or somesuch. I can't recall where I read this, but I ask that you believe me anyway.

    --

    "Reactionaries must be deprived of the right to voice their opinions; only the people have that right." - Mao

  35. Quick by the_other_one · · Score: 2

    Quick, download the human genome and spread it around the internet before the RIAA and MPAA try to stop links to it. Humans duplicate copyrighted material. These agencies do not want the information required to build a human to be available.

    We must stop them from eliminating humans in the name of greed.

    There is also a rumour that they are attempting to patent sex.

    --
    134340: I am not a number. I am a free planet!