New Method To Revolutionize DNA Sequencing
An anonymous reader writes "A new method of DNA sequencing published this week in Science identifies incorporation of single bases by fluorescence. This has been shown to increase read lengths from 20 bases (454 sequencing) to >4000 bases, with a 99.3% accuracy. Single molecule reading can reduce costs and increase the rate at which reads can be performed. 'So far, the team has built a chip housing 3000 ZMWs [waveguides], which the company hopes will hit the market in 2010. By 2013, it aims to squeeze a million ZMWs [waveguides] onto a single chip and observe DNA being assembled in each simultaneously. Company founder Stephen Turner estimates that such a chip would be able to sequence an entire human genome in under half an hour to 99.999 per cent accuracy for under $1000.'"
Gattica, here we come!
But at least there's Uma Thurman.
That's, what, 28 incorrect base pairs out of 4000? I'm not a biologist, but is this considered an acceptable error rate? Even the hoped-for 99.999% accuracy seems really awful when there are about 3 billion base pairs in a human genome.
I realize that we aren't going to be trying to make a cloned copy from this data, but what uses is this "good enough" for?
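For anyone who wants to check the arithmetic, here is a quick sketch. It assumes errors are spread uniformly, so the expected count is just bases times (1 - accuracy); the function name is made up for illustration.

```python
def expected_errors(bases: int, accuracy: float) -> float:
    """Expected number of wrong bases, assuming a uniform per-base error rate."""
    return bases * (1.0 - accuracy)

# 4000-base read at 99.3% accuracy
print(round(expected_errors(4_000, 0.993)))            # 28
# ~3 billion base pairs at the hoped-for 99.999% accuracy
print(round(expected_errors(3_000_000_000, 0.99999)))  # 30000
```

So the hoped-for figure would still leave on the order of 30,000 wrong bases per genome under this simple model.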
If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
You don't want to jump from 150 to 3 billion bases. Read up on shotgun sequencing. The mere fact that a given chunk is 150 bases is immaterial, though of course if you could lengthen that by a factor of five or ten it would improve accuracy and reduce computation on assembling the whole thing.
Headline you'll never see: Have your genome sequenced while you wait. No more than 30,000 errors or your money back!!!
Shop smart, Shop S-Mart.
Sub-$1000 genome sequencing will put the creation of 'designer' kids into the realm of the affordable for much of the middle class. Scary stuff. Now we just need to combine that with cheap and reliable cloning techniques and my plans for world domination will be complete!
My blog
Abstract:
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
> Company founder Stephen Turner estimates that such a chip would be able to sequence an entire human genome in under half an hour to 99.999 per cent accuracy for under $1000.
:)
I think this qualifies as a true 'technological singularity'
Is there not some form of error-correction in the sequence itself that could be exploited ?
Something like the error correction on an audio compact disk ?
Nullius in verba
can't slashdot automatically mirror links created in articles? that way they are always readable... slashdotted links are annoying...
Several "next generation" sequencing methods currently produce short sequences, or "tags". 454 sequencing isn't one of them - 454's typical read length is a few hundred bases.
see title
Using 454 sequencing you get average read lengths of ~400-500 bp. Read lengths around 20 bp would be pretty much useless. At least for de novo sequencing.
(from Google Cache) Reading DNA sequences from single molecules of polymerase using nanotechnology
Fact: .01% is enough to cause mutation.
I assume that the hardware at Science can withstand a slashdotting better than the crappy blog linked in the summary:
http://www.sciencemag.org/cgi/content/abstract/323/5910/133
Does anyone remember the movie, "GATTACA"?
1/2 hour for $1000, eh? And in another 5-10 years we'll cut that in half or more, both time and cost. It looks like the instant gene sequencing tech from GATTACA will be with us in most of our lifetimes. But even with this announced breakthrough it'll be functionally the same.
Reminds me of that "Lost in Space" or whatever it was called remake. Terrible movie but remember the scene where they are fighting the spider-things and they slap down a chunk of one onto a machine which pretty much instantly reconstructs the full organism and then goes on to suggest ways to fight it based on how it's built? Yeah, this could lead to one of those machines being reality.
Shh.
Since this technique should be a shoo-in for the Archon X Prize.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
[citation needed]
I work in this field (our lab has 2 Illumina sequencers) and unfortunately the article is slashdotted. The single molecule stuff the summary seems to be talking about is from Pacific Biosciences (with a few hundred/thousand pools). These guys are well funded - their backers include Kleiner Perkins Caufield & Byers (a historic VC firm) and I'm hearing stories about how they're turning down offers for more funding. They're probably going to be the first of the 3rd generation sequencing technologies. Next-gen sequencing has been in the news a lot lately - if you have access, check out some of the recent sequencing papers in Nature and Science. In any case, there are quite a few competing technologies. There seems to be a lot of talk about different error rates, but in reality the error introduced in the sequencing really depends on the technology. 454 aims for longer reads at the cost of fewer reads - the data I've seen gives a few hundred thousand 250-350 base reads per run of the machine. The Illumina sequencer we use gives us short 36 (now longer) base reads, but we get 60 million of them. There are also other technologies which use similar fluorescence, like the SOLiD system and the open-source Polonator developed in George Church's lab. All of these (454, Illumina, AB SOLiD, etc.) have been out for at least 18 months now.
I guess they didn't have the foresight to use a real host.
I'll be here all night folks try the steak.
I believe the true benefit of this technology will not be for cloning, but for general medicine. For example, you would go to the doctor with a lump, and instead of him doing a biopsy, finding cancer, then chemo, invasive surgery etc etc, they first take your DNA, sequence it and then take the biopsy and identify the origin of the cancer (is the lump actually metastasized from your pancreas?). Then they work on the cure based on your genetic makeup, rather than a shotgun approach. Additionally, medicines for genetic problems and a number of other diseases would be custom-tailored for your genetic makeup. If you are prone to hypertension, your DNA sequence could show whether you carry the genes for that malady. Really what this is about is a revolution in medicine. It's a private company now that is snatching up all the biggest heads in Silicon Valley - if and when this goes public, it could be an amazing investment.
Namaste
It looks to be inaccessible. Here are the abstract and fulltext links.
If you want a vision of the future, imagine a youtube comments section scrolling - forever.
DOI: 10.1126/science.1162986
unfortunately for some, that will prove that our spirit is our outstanding feature.
Forensic genetic identification currently uses about 60 important genetic markers. That's good enough to convict in a court of law since the chance of a duplicate may be less than a billion to one depending on marker combination.
Although humans differ from one another in about 0.1% of base pairs, for a total of 3 million, the number of differences that describe human variability may be vastly smaller than this. First you discard non-coding DNA, which gets you down to 30,000.
The only Science article on this topic was published in 2006: http://www.sciencemag.org/cgi/content/summary/311/5767/1544. It won't be new news again until the product ships - supposedly in 2010
Shotgun sequencing depends heavily on supercomputers. That's a thousand-fold every 15 years right there. Multiply that by more intelligent software, understanding of genetics, and sequencing hardware, and you may be squaring that rate.
Typically the minimum goal is 1 error / 1000 bases for sequencing a new genome. Current sequencing has a much higher error rate though, so each region must be sequenced ~5-7 times to reach that goal. The human genome is at about 1 error per million, last I heard (goal is to decrease that to 1 error / billion by resequencing repeatedly). There aren't many good reasons at this point to sequence entire genomes for individuals, since there are better ways to test for genetic diseases. But for researchers this is very promising.
This is just an historical accident; 99.9% was what could be done with what people judged "reasonable" effort and cost a few years ago; unless you know what you are going to use the sequence for, you don't know what error rate is acceptable.
There are medical tests that rely on DNA sequence, e.g. Myriad makes a fortune from sequencing the gene that gives women hereditary breast cancer. I don't know what their claimed error rate would be, but that would show you what is acceptable in today's clinical marketplace.
As for the "de novo" rate, e.g. the difference between a child and its parents, or between two identical twins - I don't think this has been accurately measured, but I do believe that single base changes (e.g. AATTC to AAATTC) are not as common as insertion or deletion of several bases, which goes to show that biology is complicated beyond belief.
On the overview side, for those of us who follow this, PacBio has been hyped beyond belief, and the production of actual data eagerly awaited. Time will tell if single molecule sequencing is to be the wave of the future. Helicos BioSciences (HLCS) has had a very hard time selling its system, which has been on the market for about a year now. For instance, one problem is "blinking": single molecules that are fluorescent can go into a "dark" state where they don't give a signal, which is kind of an obvious show stopper.
One base-pair does not a gene make.
If fate makes you a motorcycle, you become a motorcycle.
While they may have only recently published the article, people in bioinformatics have been going crazy about Pacific Biosciences for at least a year.
I recently went to a series of talks on Next Generation Sequencing, and there was an interesting chart that showed that when you factor in sequencing cost, read length, and accuracy, high throughput sequencing is actually *outperforming* Moore's law by a factor of 5 or so!
Regarding the error rate, just a few years ago, 454 had error rates of almost 5% but with redundancy it became negligible. Since then, error rates have gone down dramatically.
Also, an Anonymous Coward up there is wrong -- (0.993)^3 is *less* than 0.993. It should be (1 - 0.993) vs. (1 - 0.993)^3. I don't know why it's been modded to informative. Check your math!!
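To make the distinction concrete, here is a small sketch of the two quantities. It assumes the three reads err independently, which real sequencing reads may not.

```python
accuracy = 0.993
error = 1 - accuracy

# Chance that all three independent reads of a base are correct.
# This goes DOWN with more reads, so it is not the quantity you want.
all_three_right = accuracy ** 3

# Chance that all three independent reads of a base are wrong.
# This is what shrinks as you add reads.
all_three_wrong = error ** 3

print(f"{all_three_right:.6f}")  # 0.979147
print(f"{all_three_wrong:.2e}")  # 3.43e-07
```

The single-read error of 0.007 is the number to compare against 3.43e-07, not 0.993 against 0.979.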
Company founder Stephen Turner estimates that such a chip would be able to sequence an entire human genome in under half an hour to 99.999 per cent accuracy for under $1000
Does that mean that the chip costs $1000 or that each human genome processed costs $1000?
99.3% of your bases are belonging to us!
[Suspects] that come up positive can ask for the more accurate test.
Umm... kind of like getting a lawyer for free if you need legal representation and lack funds, if you come up positive, it should be the default that they run the more accurate test.
It's common practice on Slashdot to read the article before posting.
You're new here, aren't you?
http://www.sciencemag.org/cgi/content/abstract/323/5910/133
Abstract:
We present single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). We detected the temporal order of their enzymatic incorporation into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
Here is an article in New Scientist about the new process. It explains it fairly well and even defines what a ZMW is.
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
Many people seem concerned about the reading error rate. However, as it's been pointed out, it should be easy enough to read a DNA sequence multiple times (or read the whole genome multiple times) to decrease the error rate significantly. If you have one chip that can read the entire human genome in 30 mins, you can have the same chip read it twice in an hour, or four chips reading four copies in 30 mins.
Furthermore, if you're using a technique like this to map a person's genome, you can be clever about it. Base pairs code genes, which is something you can take into account. For example, if you're reading the eye color gene, and your machine somehow consistently makes mistakes in that area, you can compare your reads to the few possible known eye color genes, and pick the most likely based on the genetic sequences of the entire gene.
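A toy sketch of the read-it-several-times idea: take a per-position majority vote over repeated reads. This assumes the reads are already aligned and the same length, which is the hard part in practice; the function name is made up, and this is not how any particular vendor's software works.

```python
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length, already-aligned reads.

    Ties go to whichever base appears first among the reads at that position.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one column of bases per position
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

# Three reads of the same region; two of them contain one misread each.
reads = ["GATTACA", "GATTGCA", "GCTTACA"]
print(consensus(reads))  # GATTACA
```

Each misread is outvoted by the other two reads at that position, so the consensus recovers the original sequence.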
G A T T A C A?
The 454 pyro-sequencer currently produces 400bp reads, not 20bp. Granted, that's still a fair bit shorter than this experimental tech claims, but it's also a commercialized product you can actually buy right now. I think it would only be fair to quote current performance figures.
Larry bagina, is that you?
One of the real advances here is the ability to do this on a single molecule. Existing DNA sequencing techniques all depend on an amplification step, known as polymerase chain reaction (PCR), in which the DNA is iteratively duplicated (this is done by basically hijacking DNA replication machinery from bacteria). However, PCR introduces numerous biases in the final population of DNA molecules: shorter segments and certain sequences are easier to duplicate than others. As a result, what you end up sequencing is always skewed. This may not be too important when it comes to (re)sequencing a genome, but there are a whole cadre of experimental techniques that use sequencing to investigate regulation and modification of DNA, and here that bias can really skew findings and generate many false positives (things that amplify too easily) and negatives (things that don't amplify well at all).
With a CD, the error you are correcting for is an error on the CD, not in the CD player. With sequencing, it's not the DNA code that fails - it's the machine you're using to read the code that fails. The solution is similar to Reed-Solomon in that you sample every DNA region multiple times, so that you can ignore the "lost data points" of misreads. There is no built-in mechanism like that for DNA as a strand alone. General DNA proofreading happens during replication and is done by polymerases that are doing the copying in the first place; there are a slew of other specialized mechanisms for DNA damage repair but they're again separate enzymes and not inherent characteristics of the DNA.
In medicine, the cost of a study, as well as its reliability, availability, and predictive value, enters into the decisions made in clinical management.
Real applications of this, however, include looking for gene sequences in adults which predispose them to diseases (e.g. breast cancer) and then providing counseling and monitoring commensurate with that risk, a far less expensive effort than monitoring everyone for the same disease, even if they aren't at risk. Also, one could use this on embryonic cells obtained through amniocentesis to screen for hereditary diseases in families where there are risk factors.
That's not how significant figures work with multiplication. .007 has only one significant figure (the 7) .007 * .007 = .00005 which also has one sig fig (5). 1-.00005 = .99995, since the 1 is a known integer value (and therefore has an infinite number of significant figures).
That being said, the parent has an incomplete grasp of statistics in this case, and using this formula is inaccurate. However, I am at work, so I will not do an in-depth statistical analysis here.
However, don't you just HATE it when your date asks you, "Before this relationship progresses any further, I'll need a sample of your DNA!"
I've abandoned my search for truth; now I'm just looking for some useful delusions.
I'm sorry, but I only pay $2000 dollars an hour for one thing. And I don't even really pay that much for that.
I do not believe in karma.
Yet here you are, karma whoring to pay for your kids' schooling.
Considering current sequencing technology generates terabytes of data per day (see the Sanger center), wouldn't it be efficient to maximize the amount of information per pixel (i.e. per byte)? This method is actually much worse (orders of magnitude) than the current method. There are many other problems with what they do, but hopefully the cash infusion can last them another 2 years until they write a paper like this. BTW, they say that appropriate camera tech will be available in 2-5 years, but they're ready now! They might be buying time...
... spells dyslexia.
Just as seeing the moon doesn't require the same amount of effort as landing on it, reading a DNA sequence doesn't mean that selective modification is "just around the corner."
Who said anything about 'selective modification'? Read what you wrote:
Also, one could use this on embryonic cells obtained through amniocentesis to screen for hereditary diseases in families where there are risk factors.
If you can screen for hereditary diseases, you can also screen for desired traits such as hair color, eye color, propensity for obesity (yes, there's a gene for that), intelligence, etc.
In no time flat, we'll have become a race of athletic, attractive, social and congenial people who all get along, but are all a little short in the intelligence department.
My blog
Compasses are usually not a big problem at sea--if you wait a bit you can usually see the stars or sun and know N/S.
The problem at sea is time. It is vital to know exactly what time it is so you can know your longitude.
Darwin's vessel, the "Beagle" was a mapping exploration vessel and carried 22 chronometers.
I'm not sure how that works, since I've noticed that these days, when we have the ability to generally synchronize time signals, and everyone has at least one and usually 2 or more clocks available, the probability that any 2 of them agree seems pretty low.
Of course, for most everyday life we could do fairly well by knowing time to the 1/4 hour. Excess precision available does not mean that it is desirable.
wizodd
Before slashdot posts a science story it should check to see if the submitted story is accurate. The Ti pyrosequencer by 454 has a Q20 read length of 400 bases, not 20 bases. The 99.3% read accuracy described in the story is at 15-fold coverage. This means that the raw base accuracy (sequencing a template once) is much lower than 99.3%. This is in comparison to the Applied Biosystems SOLiD instrument, which has a raw base accuracy in excess of 99.94%.
454 has a read length of 250 and the GS FLX Titanium, which is the upgrade, has a 400 length read.
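To see why a consensus figure at 15-fold coverage implies a lower raw accuracy, here is a deliberately simplified binomial sketch. It assumes each read of a base is independently correct with the same probability and that the consensus is right whenever a strict majority of reads is right. That is an illustrative model only, not the method used in the paper, and the function name is made up.

```python
from math import comb

def majority_accuracy(raw_accuracy: float, coverage: int) -> float:
    """Probability that a strict majority of `coverage` independent reads
    is correct, given a per-read accuracy of `raw_accuracy`."""
    needed = coverage // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(coverage, k)
        * raw_accuracy ** k
        * (1 - raw_accuracy) ** (coverage - k)
        for k in range(needed, coverage + 1)
    )

# Compare a few hypothetical raw accuracies at 15-fold coverage.
for raw in (0.70, 0.80, 0.90):
    print(f"raw {raw:.2f} -> consensus {majority_accuracy(raw, 15):.4f}")
```

Under this toy model the consensus accuracy comes out well above the raw per-read accuracy, which is the point: a 99.3% consensus figure says little about single-pass accuracy.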
Accuracy is not an issue, just do multiple coverage, even on 454 30-40x coverage is the standard.
20 long reads is on the Illumina. Shorter but more reads and cheaper/base.
Article is rubbish just as the comments below the article. Has anyone of you smart biologists ever seen a DNA sequence?
Upon request, I can send you the dataset from 1 454 run. (kidding, of course not, intellectual property, and it is too big for your little brains to analyse).
And secondly the Pacific Biosciences article was published more than a month ago, so you guys are late. 20 Nov to be correct: http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed&cmd=Retrieve&list_uids=19023044
btw the golden standard is the Helicos now. But toooo expensive 3 mil whereas a 454 or Solexa is around 500k; Enjoy
but who wrote this and how accurate is it - 454 has been giving reliable 220 to 240 nt reads for over a year (done 2 genomes this way at 50x coverage) and the new version does 400+, the new cells do many more simultaneous reads. Even Solexa's new modules do 40+ nt. So, while this sounds like serious progress (not sure yet - need to read the article) - it's not quite the orders of magnitude suggested by the person posting this.