
I need some help getting started on this project.

To simplify, we want to quantify the occurrence of three variant combinations on each sequencing read in an alignment file, as a proxy measurement for recombination.

The possible combinations we have are A, B or AB.

For example, variant A is at position 100 and variant B is at position 300 of the reference and AB would have both.

We have alignment files against a reference sequence, and this is where I get stuck. I could technically look by eye in Tablet or IGV, for example, but is there a way to extract each read, query the base it carries at each expected reference position, bin the reads into the categories A, B, AB and unclassified, and then calculate the total occurrences of each?

Thanks for your time and suggestions!

Maximilian Press
Bioreeb

1 Answer


Related question: Access base aligned to particular reference position

After an initial effort posted here, I ended up rewriting this and testing it a little, and posted the script here. It still has some drawbacks (it does not handle indels intelligently), but it now more or less works.
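For anyone who wants to see the general idea without opening the script, here is a minimal sketch of the binning approach described in the question. It is not the script linked above. It assumes plain-text SAM input (e.g. the output of `samtools view`), and the function names (`base_at`, `classify`, `count_categories`) and the variant table are hypothetical: the alternate bases `T` and `G` are placeholders, since the question only gives the positions 100 and 300.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def base_at(ref_pos, read_start, cigar, seq):
    """Return the read base aligned to 1-based ref_pos, or None if the
    read does not cover that position or has a deletion/skip there."""
    ref = read_start  # 1-based reference coordinate of the next aligned base
    query = 0         # 0-based index into the read sequence
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes both reference and read
            if ref <= ref_pos < ref + length:
                return seq[query + (ref_pos - ref)]
            ref += length
            query += length
        elif op in "IS":           # consumes read only
            query += length
        elif op in "DN":           # consumes reference only
            if ref <= ref_pos < ref + length:
                return None
            ref += length
        # H and P consume neither reference nor read
    return None

def classify(sam_line, variants):
    """Bin one SAM record as 'A', 'B', 'AB' or 'unclassified'."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
    if flag & 0x4 or cigar == "*" or seq == "*":  # unmapped or no sequence stored
        return "unclassified"
    hits = [name for name, (site, alt) in sorted(variants.items())
            if base_at(site, pos, cigar, seq) == alt]
    return "".join(hits) or "unclassified"

def count_categories(sam_lines, variants):
    """Count reads per category, skipping SAM header and blank lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[classify(line, variants)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Hypothetical variant table: name -> (1-based reference position, expected base).
    VARIANTS = {"A": (100, "T"), "B": (300, "G")}
    for category, n in sorted(count_categories(sys.stdin, VARIANTS).items()):
        print(category, n, sep="\t")
```

Saved under a name of your choosing, it could be run as `samtools view aln.bam | python your_script.py`. Two caveats: a read that covers only one of the two sites is binned by that site alone, so you may want to require that a read spans both positions before counting it, and secondary or supplementary alignments are counted like any other record unless you filter them out first.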

Maximilian Press