Slashdot Mirror


Chinese Lab Speeds Through Genome Processing With GPUs

Eric Smalley writes "The world's largest genome sequencing center once needed four days to analyze data describing a human genome. Now it needs just six hours. The trick is servers built with graphics chips — the sort of processors that were originally designed to draw images on your personal computer. They're called graphics processing units, or GPUs — a term coined by chip giant Nvidia. This fall, BGI — a mega lab headquartered in Shenzhen, China — switched to servers that use GPUs built by Nvidia, and this slashed its genome analysis time by more than an order of magnitude."

4 of 408 comments

  1. A better article by arielCo · · Score: 4, Informative
    http://hpcwire.com/hpcwire/2011-12-15/bgi_speeds_genome_analysis_with_gpus.html

    Excerpt:

    At BGI, he says, they are currently able to sequence 6 trillion base pairs per day and have a stored database totaling 20 PB.

    The data deluge problem stems from an imbalance between the DNA sequencing technology and computer technology. According to Dr. Wang, using second-generation sequencing machines, genomes can now be mapped 50,000 times faster than just a decade ago. The technology is on track to increase approximately 10-fold every 18 months. That is 5 times the rate of Moore's Law, and therein lies the problem.

    Obviously it would be impractical to upgrade one's computational infrastructure at that rate, so BGI has turned to NVIDIA GPUs to accelerate the analytics end of the workflow. The architecture of the GPU is particularly suitable for DNA data crunching, thanks to its many simple cores and its high memory bandwidth.
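The "5 times the rate of Moore's Law" figure in the excerpt can be checked with a little arithmetic. The sketch below is only an illustration: it assumes Moore's Law is read as a 2-fold increase every 18 months (the excerpt does not state which formulation it uses), and the function names are made up for this example.

```python
import math

MONTHS_PER_PERIOD = 18
SEQ_FACTOR = 10.0    # sequencing throughput: ~10-fold per 18 months (from the excerpt)
MOORE_FACTOR = 2.0   # assumption: Moore's Law taken as 2-fold per 18 months


def growth(factor_per_period, months):
    """Compound growth after `months`, given a growth factor per 18-month period."""
    return factor_per_period ** (months / MONTHS_PER_PERIOD)


def gap(months):
    """How far sequencing output pulls ahead of compute after `months`."""
    return growth(SEQ_FACTOR, months) / growth(MOORE_FACTOR, months)


def doubling_time_months(factor_per_period):
    """Months needed to double at the given per-period growth factor."""
    return MONTHS_PER_PERIOD * math.log(2) / math.log(factor_per_period)


if __name__ == "__main__":
    for months in (18, 36, 72):
        print(f"after {months} months: gap = {gap(months):.0f}x")
    print(f"sequencing doubles every {doubling_time_months(SEQ_FACTOR):.1f} months")
```

Under these assumptions the ratio is 5x after one 18-month period, but the gap compounds: after two periods it is 25x, which is why upgrading hardware at the same pace is impractical.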

    --
    This post contains no rudeness or derision of any kind. All arguments are friendly. Terms and exclusions may apply.
    1. Re:A better article by Samantha+Wright · · Score: 4, Informative

      ...countering this stunning and exciting revelation is BGI's stunning and exciting reputation for producing stunningly and excitingly low-quality raw data from said stunning and exciting second-generation sequencing machines. This is a little like the biology equivalent of being told that your least-favourite Slashdot editor (please pick just one) has just gotten a brain implant so he can spam the front page with dupes, typo-ridden summaries, and fallacy-laden opinion pieces ten times an hour.

      --
      Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
  2. Re:This article is almost painfully dumbed down... by Zakabog · · Score: 5, Informative

    The summary is pulled directly from the top of the article.

    Here's the article from HPC Wire, some details from Nvidia, and the Nvidia press release.

  3. Part of the problem is Low Standards by MaizeMan · · Score: 3, Informative

    At least in my field, the problem is that no one ever thought to set lower limits on the quality of what you can call a genome. So now we get "genomes" made up of 100,000 contigs (many only a couple of hundred base pairs long) and even counting all of those, the total sequence might account for only 70% of the total size of the genome. But it's still a "genome" paper, which is still an instant ticket to Nature Genetics (or Nature Biotechnology if the assembly is REALLY bad).

    BGI is certainly one of the biggest offenders (Cucumber and Pigeonpea are both examples of the sort of terrible genomes-in-name-only BGI puts out), but I think the real problem is that Illumina sequence data is so cheap that people keep trying to use it to sequence genomes, thinking that if they throw enough raw data and enough mate-pair libraries at the problem, it'll eventually make up for the fact that Illumina reads are so short. Illumina data is great for a lot of things: calling SNPs, measuring gene expression, studying methylation patterns.

    But, at least for any genome with significant transposon content, it simply does not work.
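
The quality complaints above (a huge contig count, many very short contigs, and only about 70% of the genome represented) can be put into numbers with a few lines of code. This is a rough sketch, not anything from the comment itself: the function name, the 500 bp "short contig" cutoff, and the toy contig lengths are all assumptions, and N50 is added as a commonly used assembly metric that the comment does not mention.

```python
def assembly_stats(contig_lengths, genome_size, short_cutoff=500):
    """Summarize a draft assembly from its contig lengths (in base pairs).

    genome_size is the estimated true genome size; short_cutoff is an
    arbitrary threshold (an assumption here) for calling a contig "short".
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")

    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # N50: the contig length at which the running total, taken from the
    # longest contig downward, first reaches half of the assembled bases.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    short = sum(1 for length in lengths if length < short_cutoff)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "fraction_of_genome": total / genome_size,
        "short_fraction": short / len(lengths),
        "n50": n50,
    }


if __name__ == "__main__":
    # Toy numbers only, chosen so the assembly covers 70% of the "genome".
    toy = [3000, 2000, 1000, 400, 200, 200, 100, 100]
    for key, value in assembly_stats(toy, genome_size=10_000).items():
        print(f"{key}: {value}")
```

A minimum-quality bar of the kind the comment asks for could then be phrased as thresholds on these numbers, though where to set them is exactly the open question.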