Exon-exon junctions: compare experimental transcripts to reference annotation

Question

My aim is to parse an experimental transcript set (obtained by RNAseq) to check which splice junctions are already reported in a reference annotation and which ones are new.

I tried Regtools:

feed the experimental alignment file (BAM: aligned transcripts to reference genome) to Regtools junctions extract
use Regtools junctions annotate to compare with the reference annotation (GTF)

Exact commands I used for this are:

[..]/regtools/build/regtools junctions extract aln_sorted.bam -o regtools.bed - where the sorted BAM file has index in the same folder
[..]/regtools/build/regtools junctions annotate regtools.bed [..]/ref/genome.fa [..]/ref/annotation.gtf -o regtools_annotate_out

The alignments have been generated with Minimap2.

It did not behave as expected: this gives me as output a file with only junctions annotated as 'new' ($11==N), while other annotations should be possible according to the manual:

DA - This exon-exon junction is present in the transcriptome provided by the user(GTF) The ends of this junction are hence known donor and known acceptor sites according to the GTF file.

NDA - This exon-exon junction is not present(novel) in the transcriptome provided by the user(GTF) The ends of this junction are known donor and known acceptor sites according to the GTF file.

D - This exon-exon junction is not present(novel) in the transcriptome provided by the user(GTF) The donor of this junction is a known donor but the acceptor is novel.

A - This exon-exon junction is not present(novel) in the transcriptome provided by the user(GTF) The acceptor of this junction is a known acceptor but the donor is novel.

N - This exon-exon junction is not present(novel) in the transcriptome provided by the user(GTF) The ends of this junction are hence not known donor/acceptor sites according to the GTF file.

My questions are:

am I doing something wrong?
do you have any experience with Regtools and can help debugging this behavior?
do you have other suggestion to achieve the output I am seeking?

user172818 · Accepted Answer · 2018-02-12T14:52:21.027

You can use the script that comes with minimap2:

# install the k8 javascript shell
curl -L https://github.com/attractivechaos/k8/releases/download/v0.2.4/k8-0.2.4.tar.bz2 | tar -jxf -
cp k8-0.2.4/k8-`uname -s` $HOME/bin/k8    # assuming $HOME/bin in your $PATH

# acquire the latest minimap2
git clone https://github.com/lh3/minimap2.git

# print novel/erroneous junctions
minimap2/misc/paftools.js junceval -e anno.gtf rna-seq.sam > output.txt

Some example output:

P       read1    4       chr2    227332588       227340301       [(227332588,227340291)]
P       read2    1       chr9    19376379        19376489        [(19376388,19376493)]
P       read2    3       chr9    19378506        19378707        [(19378514,19378707)]
N       read3    1       chr1    160647335       160647472
N       read3    2       chr1    160647581       160659672

Here column 4–6 gives junctions not in the annotation. Column 7, if not empty, gives the list of annotated junctions/introns that overlap the junction in read alignment. Note that a big fraction of outputted records are errors in read alignment.

Hi @user172818, I have two questions on this: (1) could you please explain what the value in $3 represent exactly - and (2) I had cases where the value in $7 is a list of more than one tuple: do you have an explanation ? — aechchiki, Feb 27 '18 at 10:54
@aechchiki 1) the exon number of the read. 2) an intron on a read may overlap multiple distinct introns in annotations due to alt splicing. The list gives all. — user172818, Feb 27 '18 at 12:49

aechchiki · Answer 2 · 2018-02-12T14:28:27.553

1

For future googlers, Ding et al., 2017 published a review: "Comparison of Alternative Splicing Junction Detection Tools Using RNA-Seq Data".

Table 1 has the list of software in this review.

edited Feb 12 '18 at 14:28

answered Feb 12 '18 at 13:39

aechchiki

2,676
11
34

Exon-exon junctions: compare experimental transcripts to reference annotation

2 Answers2