Read lengths of 4000 with only 28 miscalled bases (an error rate of about 0.7%) would be awesome, totally awesome. Current "next-gen" sequencers typically generate 100x coverage over the entire genome, so, to use a puzzle analogy, it's like having enough puzzle pieces to build 100 complete puzzles. That redundancy lets you find and correct sequencing errors to generate a "consensus sequence." If you can successfully identify 100 fragments covering the same piece of the genome, and 8 have an "A", "G", or "C" in position 11 but 92 have a "T" there, you can be reasonably confident that the consensus call of "T" is the correct nucleotide base and the other calls are sequencing errors. A read length of 4000, as opposed to, say, 25, lets you span regions of tandem repeats, found in certain organisms, that confound assembly algorithms and give you bad assemblies. Hope this helps.
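If it helps to see the consensus idea as code, here is a minimal sketch of a majority vote at one position. It assumes the fragments are already aligned to each other and all the same length; the function names and the 3/3/2 split of the 8 wrong calls are made up for illustration. Real consensus callers also weigh base quality scores, which this ignores.

```python
from collections import Counter


def consensus_base(calls):
    """Return the most common base at one position and the fraction of calls supporting it."""
    counts = Counter(calls)
    base, n = counts.most_common(1)[0]
    return base, n / len(calls)


def consensus_sequence(aligned_reads):
    """Majority vote column by column over equal-length, pre-aligned reads (an assumption)."""
    return "".join(consensus_base(column)[0] for column in zip(*aligned_reads))


# The example from above: 100 calls at position 11, 92 of them "T".
# How the 8 miscalls split across A/G/C is hypothetical.
calls = ["T"] * 92 + ["A"] * 3 + ["G"] * 3 + ["C"] * 2
base, support = consensus_base(calls)
print(base, support)  # T 0.92
```

With only 3 or 4 fragments covering a position, a single miscall has much more sway over the vote, which is why deep coverage matters.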