We ran into the same problem when downloading the 1000 Genomes Project BAM files from the Short Read Archive. We know the BAM files are indexed, so you can easily retrieve all reads overlapping some portion of, say, chromosome 10. But do the files really need to be that big? It turns out that with simple run-length encoding and other measures we can cut BAM file sizes in half, and we can probably keep the same indexing scheme. A writeup on that is on our blog.
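To give a rough idea of what run-length encoding means here, below is a minimal Python sketch. It is not the scheme from the writeup: the function names `rle_encode` and `rle_decode` and the sample quality string are made up for illustration, and applying it to per-base quality strings is only an assumption about where long runs of repeated symbols might show up.

```python
from itertools import groupby


def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(text)]


def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)


# Hypothetical quality string with long runs of the same score.
quals = "IIIIIIIIHHHHGGF#####"
encoded = rle_encode(quals)

# The encoding is lossless: decoding gives back the input exactly.
assert rle_decode(encoded) == quals
print(len(quals), "symbols ->", len(encoded), "runs")
```

How much this saves depends entirely on how long the runs are in real data, so it is best read as an illustration of the idea rather than an estimate of the achievable compression.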