3

I have few bam files and would like to get read counts using

samtools idxstats

[Data is aligned to hg19 transcriptome]. To use that command I need a sorted bam file. So to sort them I gave the following command.

samtools sort -T /tmp/input.sorted -o input.sorted.bam input.bam

This ended up showing:

[W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
[E::bgzf_read] bgzf_read_block error -1 after 0 of 4 bytes
[bam_sort_core] truncated file. Aborting.

And the tmp dir has 6 input.sorted.nnnn.bam files.

stack_learner
  • 1,262
  • 14
  • 26

1 Answers1

4

The cause of the error is that your input file is truncated or /tmp is running out of space. If you can do samtools view -H input.bam without error (reading the header also checks for the magic number at the end of the file) then it's like the that /tmp is filling up.

Devon Ryan
  • 19,602
  • 2
  • 29
  • 60
  • I did "samtools view -H input.bam" there is no error. It ended up showing all the header things which I asked here [https://bioinformatics.stackexchange.com/questions/3080/read-counts-from-bam-file/3081?noredirect=1#comment4889_3081] – stack_learner Dec 14 '17 at 14:17
  • Then either /tmp is full or otherwise one of the files there is somehow corrupt (you can test them with samtools view -H too). – Devon Ryan Dec 14 '17 at 14:29
  • Ok. looks like that file is corrupt. And I tried with other bam file and showed - [bam_sort_core] merging from 24 files... and I see a sorted.bam file. I used "samtools idxstats sorted.bam" - This is what I see - [bam_idxstats] fail to load the index. – stack_learner Dec 14 '17 at 15:21
  • Don't forget to samtools index sorted.bam – Devon Ryan Dec 14 '17 at 15:21
  • Ok. So, after the indexing I used idxstats and it showed 4 columns without any headers. NR_039978 984 38 0 NM_130786 1766 1 0 NM_138932 9293 37 0 NM_033110 2678 2 0 NM_000014 4678 4 0 NM_144670 5192 18 0 NR_040112 1201 2 0 NM_017436 2106 76 0 NM_016161 1771 0 0 – stack_learner Dec 14 '17 at 15:33
  • They are (in order): contig name, contig length, number of mapped reads, number of unmapped reads (this will typically be 0). – Devon Ryan Dec 14 '17 at 15:34
  • Ok. Thankyou. So, for DE analysis what I need is mapped reads. how do I get the gene names? Can I use edgeR for DE analysis using the mapped reads? – stack_learner Dec 14 '17 at 15:38
  • You mapped to the transcriptome, so they're the first column. – Devon Ryan Dec 14 '17 at 15:38
  • Yes, I see that. Is there any way to get the gene id/ transcript id from this results. And may I know the answer for DE analysis edgeR? – stack_learner Dec 14 '17 at 15:42
  • If you want to get annotation (like gene symbols) coupled to your accession numbers, please read the biomaRt user guide. If you have questions after reading it, please start a new question in stead of discussing it in the comments. – benn Dec 14 '17 at 16:00