1

Let's assume I have a BAM file and several positions that I would like to examine more closely in this alignment. My goal is to find out whether these positions are on the same reads and which nucleotides each read has in these two positions. So my output should be a table of all reads that occur in the positions and list the associated nucleotides.

The output should look like this (with 980, 1350 and 1400 the first three positions; I made up the read names):

read name 980 1350 1400 ...
ReAdN4m1L03 A T ...
ReAdN4m2L02 A T G ...
ReAdN4m3L01 T G ...
... ... ... ... ...

At the end I would like to have a table that links the nucleotides via the reads.

So far, I have tried to read in the Bam file at the desired positions:

#Positions of interest
pos <- c(980, 1350, 1400)

#path to BAM file path <- pathToBamFile

#name of reference used for mapping refname <- "myreference"

#loading the bam file bamFile <- BamFile(path) gr <- GRanges(seqnames = refname, ranges = IRanges(start = pos, end = pos)) params <- ScanBamParam(which = gr, what = scanBamWhat()) aln <- scanBam(bamFile, param = params)

With the following code I can access the reads, but how do I get the corresponding nucleotides in the individual positions?

pos.no <- c(1:length(pos))

i <- 1 aln[[pos.no[i]]]$qname

I have already tried to work with the CIGAR strings aln[[i]]$cigar with the sequences aln[[i]]$seq in order to extract to the nucleotides manually, but that led to more and more problems.

Is there any easy way to extract the nucleotides?

I have searched for a long time now on the internet for an easy solution but without success. Hopefully someone will help me here! Many thanks for your time in advance.

wejt
  • 11
  • 1
  • Do you have to use R? This is more convenient if you use the pileup interface from pysam (perhaps Rsamtools provides a similar functionality). – Devon Ryan Aug 21 '21 at 07:41
  • @DevonRyan but does pileup give you an output where the nucleotides are linked to the reads? – wejt Aug 22 '21 at 06:35
  • Yes, it does, at least in non-R interfaces (i.e., python and C). – Devon Ryan Aug 22 '21 at 08:39
  • 1
    This answer does it in Python, perhaps there are some ideas there: https://bioinformatics.stackexchange.com/questions/7401/access-base-aligned-to-particular-reference-position – Maximilian Press Aug 29 '21 at 22:18

0 Answers0