Chinese Lab Speeds Through Genome Processing With GPUs
Eric Smalley writes "The world's largest genome sequencing center once needed four days to analyze data describing a human genome. Now it needs just six hours. The trick is servers built with graphics chips — the sort of processors that were originally designed to draw images on your personal computer. They're called graphics processing units, or GPUs — a term coined by chip giant Nvidia. This fall, BGI — a mega lab headquartered in Shenzhen, China — switched to servers that use GPUs built by Nvidia, and this slashed its genome analysis time by more than an order of magnitude."
Sounds like these newfangled "GPUs" are gonna change the world.
I always wondered what GPUs are. Thanks Slashdot!
Explaining what a GPU is in a slashdot summary? Come on.
This is similar to someone telling you a story about something funny happening to them while shopping at the store, pausing mid-story to inform you that a 'store' is a business where goods are displayed and exchanged for a papery substance called 'money'.
Submitter couldn't find a more technically-oriented one?
It's hardly news that GPUs can be used to speed up parallel tasks/computations, but even so this article is a useful reminder of two things: 1) there are still many important processes that can be sped up by using GPUs, and 2) this can be achieved pretty much anywhere in the world.
I wonder if AMD's approach of using more cores, where Nvidia uses faster cores, would change the time. I have no idea how genetic algorithms work. I do know simple hashes like bitcoins are best on AMD.
Excerpt:
At BGI, he says, they are currently able to sequence 6 trillion base pairs per day and have a stored database totaling 20 PB.
The data deluge problem stems from an imbalance between the DNA sequencing technology and computer technology. According to Dr. Wang, using second-generation sequencing machines, genomes can now be mapped 50,000 times faster than just a decade ago. The technology is on track to increase approximately 10-fold every 18 months. That is 5 times the rate of Moore's Law, and therein lies the problem.
Obviously it would be impractical to upgrade one's computational infrastructure at that rate, so BGI has turned to NVIDIA GPUs to accelerate the analytics end of the workflow. The architecture of the GPU is particularly suitable for DNA data crunching, thanks to its many simple cores and its high memory bandwidth.
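To put the excerpt's growth figures side by side, here is a rough Python sketch. It assumes Moore's Law means a 2-fold increase every 18 months (one common reading, not something the excerpt states), and the function names are made up for illustration.

```python
import math

def fold_increase(fold_per_period, period_months, horizon_months):
    """Total growth factor after compounding for horizon_months."""
    return fold_per_period ** (horizon_months / period_months)

def doubling_time_months(fold_per_period, period_months):
    """Months needed to double at the given compounding rate."""
    return period_months * math.log(2) / math.log(fold_per_period)

SEQ_FOLD = 10.0    # sequencing: ~10-fold per period (from the excerpt)
MOORE_FOLD = 2.0   # assumption: Moore's Law as 2-fold per period
PERIOD = 18.0      # months

# The "5 times" figure is the ratio of the two per-period multipliers.
ratio_per_period = SEQ_FOLD / MOORE_FOLD

# Sequencing output would double in well under half a year at this rate.
seq_doubling = doubling_time_months(SEQ_FOLD, PERIOD)

# The gap compounds: after two periods (36 months) it is 100x vs 4x.
gap_after_3_years = (fold_increase(SEQ_FOLD, PERIOD, 36)
                     / fold_increase(MOORE_FOLD, PERIOD, 36))

print(ratio_per_period, round(seq_doubling, 2), gap_after_3_years)
```

Under these assumptions the gap between data volume and compute does not stay at 5x; it multiplies by 5 again every 18 months, which is why buying more of the same servers does not keep up.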
This post contains no rudeness or derision of any kind. All arguments are friendly. Terms and exclusions may apply.
So, a site dedicated to nerds needs to explain what a GPU is? Are we not nerds anymore?
Write boring code, not shiny code!
Although at least in my field the problem is that no one ever thought to set lower limits on the quality of what you can call a genome. So now we get "genomes" made up of 100,000 contigs (many only a couple of hundred base pairs long) and even counting all of those, the total sequence might account for only 70% of the total size of the genome. But it's still a "genome" paper, which is still an instant ticket to Nature Genetics (or Nature Biotechnology if the assembly is REALLY bad).
BGI is certainly one of the biggest offenders (Cucumber and Pigeonpea are both examples of the sort of terrible genomes-in-name-only BGI puts out) but I think the real problem is that Illumina sequence data is so cheap that people keep trying to use it to sequence genomes, thinking if they throw enough raw data and enough mate-pair libraries at the problem it'll eventually make up for the fact that Illumina reads are so short. Illumina data is great for a lot of things: calling SNPs, measuring gene expression, studying methylation patterns.
But, at least for any genome with significant transposon content, it simply does not work.
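For anyone who wants to see what a "lower limit" check on an assembly might look like, here is a toy Python sketch. The function name, the 1000 bp cutoff, and the contig lengths are all hypothetical; no journal or sequencing center is known to use these exact thresholds.

```python
def assembly_summary(contig_lengths, genome_size, min_contig=1000):
    """Summarize how fragmented and how complete an assembly is.

    contig_lengths: list of contig lengths in base pairs
    genome_size:    estimated total genome size in base pairs
    min_contig:     hypothetical cutoff below which a contig counts as "short"
    """
    total = sum(contig_lengths)
    short = sum(1 for n in contig_lengths if n < min_contig)
    return {
        "contigs": len(contig_lengths),
        "fraction_covered": total / genome_size,
        "fraction_short": short / len(contig_lengths),
    }

# Toy data (made up): five contigs against a 10,000 bp "genome".
toy_contigs = [200, 300, 500, 2000, 4000]
summary = assembly_summary(toy_contigs, genome_size=10_000)
print(summary)
```

In this toy case the contigs add up to 70% of the genome and most of them fall below the cutoff, which is the kind of assembly the comment above is complaining about.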
... this is what a Chinese lab looks like.
Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
the Chinese are picking up on the technology and on genomic data mining far faster and with more intensity than is the broader US tech community.
You're forgetting that the vast majority of companies actually developing this technology, and making it available to consumers, are based in the US (and Britain, to some degree). An article about BGI that I read last year noted the irony of seeing several crates of sequencing machines stamped "MADE IN THE USA" waiting to be unloaded in Shenzhen. The Chinese government is certainly willing to spend large amounts of money advancing their capabilities, but I haven't seen any evidence that they're significantly surpassing the US in anything other than sequencing capacity. (And the machines they're using are very good for generating large quantities of data, but the quality of said data is somewhat suspect.)
Given the size of their brainpower base and the rate at which they are adapting the technology the Chinese are well on their way to dominating the drug development and physiological/functional genomic sciences in the next 10 years.
Except that genomics has as of yet proven minimally useful for drug development. Until they actually develop significant amounts of homegrown technology (which, to be fair, they are actually doing in the bioinformatics arena, as opposed to sequencing), I'm not convinced that they're that much of a threat. What they will certainly accomplish, I think, is a record of high-profile scientific output and the ability to compete on even terms with the rest of the industrial superpowers. No mean feat considering where they were 40 years ago, and certainly some cause for concern given their large and inexpensive labor force, but it's not the same thing as suddenly eclipsing the USA in technology that they're still mostly importing or stealing.
A search of Usenet reveals the Atari Jaguar had a unit called a "GPU" in 1993, considerably before NVIDIA's "first GPU" in 1999. The Amiga unit was also called a GPU.
The term's generic, and NVIDIA knows it... they don't have it registered as a trademark.