Pine Tree Has Largest Genome Ever Sequenced
sciencehabit (1205606) writes "Using a single pollinated pine seed, researchers have sequenced the entire genome of the loblolly pine tree--and it's a doozy. The tree's genome is the largest yet sequenced: 22.18 billion base pairs, more than seven times longer than the human genome. The team found that 82% of the genome was made up of duplicated segments, compared with just 25% in humans. The researchers also identified genes responsible for important traits such as disease resistance, wood formation, and stress response."
It just has to be said - they need to figure out how to integrate that wood formation sequence into human genes, before I get much older.
Yes, I know. But it *did* have to be said.
Witty signature omitted for brevity.
Aye, go with the phloem, I always say.
Do not mock my vision of impractical footwear
All those genes make humans look like flunkies. And knowing a tiny bit about Darwin, maybe we could take into account that a pine tree can easily outlive any human ever born. And pine trees tend to have a very long history of reproduction compared to humans. So maybe all the thinking, feeling and running about that humans do is simply proof of our inferiority. Think about it. The pine tree needs water, sunshine, a few minerals and an atmosphere, and that is about it. Humans need all kinds of things. I've never seen a tree shoot anyone, go mental, or rape other trees. Trees might enjoy making humans feel like idiots.
I'm not surprised. Trees and plants were here before we were.
I'm here for the experience, not the Hyperbole.
The team found that 82% of the genome was made up of duplicated segments, compared with just 25% in humans.
See! The pine trees are smart and make multiple copies of their genome segments, for backup purposes. Humans always forget the importance of backups, until it's too late.
Soon I shall imbue the soles of my feet and grow pine shoes.
Spent All My Mod Points
63x total coverage from Illumina hardware, using a mixture of paired-end libraries ranging from 200 bp to a whopping 40 Kbp. I'm pretty sure that's sufficient information to estimate the number of large-scale repetitions. Sequencing projects of species for which there is no good relative to scaffold against are typically much more rigorous than what you'd see in cancer research.
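For a rough sense of scale, here is a back-of-the-envelope sketch in Python of what 63x coverage of a 22.18 billion bp genome implies. The genome size comes from the summary and the coverage from the figure above; the 100 bp read length is only an assumption for a typical short Illumina read, and the function names are purely illustrative.

```python
# Back-of-the-envelope coverage arithmetic (a sketch, not the project's pipeline).

GENOME_SIZE_BP = 22_180_000_000  # loblolly pine genome size, from the summary
COVERAGE = 63                    # total coverage quoted above
READ_LENGTH_BP = 100             # ASSUMPTION: typical short Illumina read length


def total_bases_sequenced(genome_size_bp: int, coverage: int) -> int:
    """Coverage is total sequenced bases divided by genome size, so invert it."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Number of reads of a given length needed to reach the target coverage."""
    total = total_bases_sequenced(genome_size_bp, coverage)
    return -(-total // read_length_bp)  # integer ceiling division


if __name__ == "__main__":
    total = total_bases_sequenced(GENOME_SIZE_BP, COVERAGE)
    reads = reads_needed(GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP)
    print(f"{total:,} bases sequenced")  # 1,397,340,000,000 bases sequenced
    print(f"{reads:,} reads")            # 13,973,400,000 reads
```

That is roughly 1.4 trillion bases, or about 14 billion reads if every read were 100 bp.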
Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
That's the largest genome that's been fully sequenced, not the largest genome known. See Comparison of different genome sizes. Genome sizes for plants vary over a huge range, and aren't closely related to organism complexity. The largest genome known is for an amoeboid.
Well, Boss, we have good news and bad news:
- the good news: we've got it up!
- the bad news: it's tossing off splinters!
Break to chorus of "Hurts So Good!" by John Mellencamp.
You have to pay attention to the fact that they were here before we were.
We would not be able to breathe if they were not here!
Wake up!
The team found that 82% of the genome was made up of duplicated segments,
-funroll-loops
Well, they do have a draft genome, not a "complete" one. A complete genome is really hard to generate, and for more complex organisms it doesn't really gain you a whole lot for all your effort. Also, it's not fair to compare cancer research, as they already have one of the best genomes sequenced to refer to, the human genome. Creating a new genome, de novo, is hard, and 63x is a good start, but not nearly enough.
Also, why did they just use Illumina? Yes it's nice they had multiple paired end ranges, but Illumina is typically only short reads of around 100bp. Generally, throwing in some PacBio sequences helps with the scaffolding process with their long reads. You don't need much either, less than 1x is fine.
Also, it looks like they did do some transcription work, but I didn't see anything in the paper detailing what areas of the tissue they took samples from. Hopefully this is well documented so that appropriate expression analysis can be done, instead of simply relying on existing database gene information to determine traits.
The hardware platform of choice is a matter of availability. Here is a map of where most/all of the NGS platforms are in the world; Illumina sequencers are the most common amongst the newer systems.
First, that map is incomplete. Second, there are plenty of facilities, even if not as numerous, that can do other kinds of sequencing. As long as the assembly techniques support combining multiple sequencing technologies, you should combine them in order to draw on each one's strengths.
For example, look at the ALLPATHS-LG assembler, which recommends adding in a touch of PacBio to connect scaffolds together.
(Sure, but PacBio in particular is quite new on the market still. Three years ago they were borderline vaporware!)
And, yeah, most serious sequencing projects I've seen do use a mixture of methods, particularly 454 stuff. But I'm sure they'll switch to IonTorrent and PacBio as opportunities allow.
The codebase is huge, many many billion SLOCs.
But, most of the functions never get called, and the rest is code comments ...
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.
Genome annotation (finding all the interesting features in the sequence) is really computationally intensive, due in large part to the number of separate (often sub-optimally written) algorithms that have to be chained together and interpreted. My team at the iPlant Collaborative worked with the authors of a popular open-source annotation tool called "MAKER" to get it running at scale on the 302 TFLOP Lonestar 4 supercomputer, which in turn was used by the pine team to do in a few hours what used to take 6 months of painstaking bioinformatics. In another month or so, this algorithm will be available via a REST API, allowing, literally, "Annotation As A Service".
Your mom has the largest genome ever sequenced!