
I want one box per sequence, positioned as it is in the alignment, with gaps shown as breaks in each box.

I've been having trouble with this for a while and can't work it out on my own. I'm trying to display the gaps using Bio::Graphics after reading my sequences from an alignment file.

I removed the exit portion. I was trying to format the code for this website and accidentally put the exit there. The input file is a ClustalW file, which is an alignment file containing the sequences. The sequences use "-" to represent gaps.

The command is ./aln.pl test-bioaln.aln > ll.png
My expected output would be a .png file with sequences that are aligned and show gaps as a different color.
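
To draw gaps as breaks, each aligned row first has to be turned into the coordinates of its ungapped runs. The following is a minimal sketch of that step in plain Perl, assuming gaps are written only as "-" and that coordinates are 1-based, inclusive alignment columns. The helper name `ungapped_segments` and the sample row are made up for illustration.

```perl
use strict;
use warnings;

# Return the ungapped runs of an aligned sequence as [start, end] pairs,
# in 1-based, inclusive alignment-column coordinates.
# (ungapped_segments is a hypothetical helper name, not part of BioPerl.)
sub ungapped_segments {
    my ($gapped) = @_;
    my @segments;
    # Each match is one run of non-gap characters; @- and @+ hold the
    # 0-based start offset and the end offset (exclusive) of capture group 1.
    while ($gapped =~ /([^-]+)/g) {
        push @segments, [ $-[1] + 1, $+[1] ];
    }
    return @segments;
}

# Made-up example row: three gap columns, four bases, two gaps, one base.
my $row = '---atgt--c';
print join(', ', map { join('-', @$_) } ungapped_segments($row)), "\n";
# prints "4-7, 10-10"
```

Each pair could then serve as one box segment, with the space between consecutive pairs being the gap to show.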

The input file is:

CLUSTAL W (1.81) multiple sequence alignment

JD1:1:102:1601:ORFJ00027        ------------------------------atgtataaacaacaatattttatttct--c
94a:1:107:117:orf00001          ------------------------------atgtataaacaacaatattttatttct-ac
118a:1:106:158122218:orf00020   ------------------------------atgtataaacaacaatattttatttct-gc
B31:1:100:4091:ORFB0018         ------------------------------atgtataaacaacaatattttatttctggc
72a:1:105:32:orf00022           ------------------------------atgtataaacaacaatattttatttctggc
64b:1:110:473:orf00001          ------------------------------atgtataaacaacaatattttatttctggc
29805:1:108:171:orf00001        ------------------------------atgtataaacaacaatattttatttctggc
BOL26:1:111:60:orf00001         ------------------------------atgtataaacaacaatattttatttctggc
CA-11.2A:1:109:33:orf00001      ------------------------------atgtataaacaacaatattttatttctggc
WI91-23:1:112:493:orf00001      ------------------------------atgtataaacaacaatattttatttctggc
297:1:103:411:ORFB00012         ttggatagattttatacaaagaaggtaataatgtataaacaacaatattttatttctggc
N40:1:101:1716:ORFK00021        ------------------------------atgtataaacaacaatattttatttctggc
ZS7:1:113:22:orf00001           ------------------------------atgtataaacaacaatattttatttctggc
                                                              ******************************

JD1:1:102:1601:ORFJ00027        aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
94a:1:107:117:orf00001          aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
118a:1:106:158122218:orf00020   aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
B31:1:100:4091:ORFB0018         aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
72a:1:105:32:orf00022           aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
64b:1:110:473:orf00001          aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
29805:1:108:171:orf00001        aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
BOL26:1:111:60:orf00001         aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
CA-11.2A:1:109:33:orf00001      aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
WI91-23:1:112:493:orf00001      aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
297:1:103:411:ORFB00012         aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
N40:1:101:1716:ORFK00021        aaggtgcaaggtgttggttttagatttttcacagagcaaatagcaaataatatgaaacta
ZS7:1:113:22:orf00001           aaggtacaaggtgttggttttagattttttacagagcaaatagcaaataatatgaaacta
                                ***** *********************** ******************************

JD1:1:102:1601:ORFJ00027        aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
94a:1:107:117:orf00001          aaaggatttgtaaaaaatctaaacgatggaagggtagaaattgtagctttctttaatact
118a:1:106:158122218:orf00020   aaaggatttgtaaaaaatctaaacgatggaagggtagaaattgtagctttctttaatact
B31:1:100:4091:ORFB0018         aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
72a:1:105:32:orf00022           aaaggatttgtaaaaaatctaaacgatggaagggtagaaattgtagctttctttaatact
64b:1:110:473:orf00001          aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
29805:1:108:171:orf00001        aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
BOL26:1:111:60:orf00001         aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
CA-11.2A:1:109:33:orf00001      aaaggatttgtaaaaaatctaaacgatggaagggtagaaattgtagctttctttaatact
WI91-23:1:112:493:orf00001      aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
297:1:103:411:ORFB00012         aaaggatttgtaaaaaatctaaacgatggaagggtagaaattgtagctttctttaatact
N40:1:101:1716:ORFK00021        aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
ZS7:1:113:22:orf00001           aaaggatttgtaaaaaatctcaacgatggaagggtagaaattgtagctttctttaatact
                                ******************** ***************************************

JD1:1:102:1601:ORFJ00027        aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattaaa
94a:1:107:117:orf00001          aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
118a:1:106:158122218:orf00020   aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
B31:1:100:4091:ORFB0018         aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattaaa
72a:1:105:32:orf00022           aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
64b:1:110:473:orf00001          aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattaaa
29805:1:108:171:orf00001        aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
BOL26:1:111:60:orf00001         aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattaaa
CA-11.2A:1:109:33:orf00001      aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
WI91-23:1:112:493:orf00001      aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
297:1:103:411:ORFB00012         aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
N40:1:101:1716:ORFK00021        aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattgaa
ZS7:1:113:22:orf00001           aaagaacaaatgaaaaaatttgaaaaattattaaatgggaataagtattcaaacattaaa
                                ********************************************************* **

JD1:1:102:1601:ORFJ00027        aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
94a:1:107:117:orf00001          aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
118a:1:106:158122218:orf00020   aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
B31:1:100:4091:ORFB0018         aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
72a:1:105:32:orf00022           aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
64b:1:110:473:orf00001          aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
29805:1:108:171:orf00001        aacattgaaaaaatagctttagatgaaaattatccttttcaatttaatgattttaaaatt
BOL26:1:111:60:orf00001         aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
CA-11.2A:1:109:33:orf00001      aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
WI91-23:1:112:493:orf00001      aacattgaaaaaatagctttagatgaaaattatccttttcaatttaatgattttaaaatt
297:1:103:411:ORFB00012         aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
N40:1:101:1716:ORFK00021        aacattgaaaaaatagctttagatgaaaattatccttttcaatttaatgattttaaaatt
ZS7:1:113:22:orf00001           aacattgaaaaaatagttttagatgaaaattatccttttcaatttaatgattttaaaatt
                                **************** *******************************************

JD1:1:102:1601:ORFJ00027        tattat
94a:1:107:117:orf00001          tattat
118a:1:106:158122218:orf00020   tattat
B31:1:100:4091:ORFB0018         tattat
72a:1:105:32:orf00022           tattat
64b:1:110:473:orf00001          tattat
29805:1:108:171:orf00001        tattat
BOL26:1:111:60:orf00001         tattat
CA-11.2A:1:109:33:orf00001      tattat
WI91-23:1:112:493:orf00001      tattat
297:1:103:411:ORFB00012         tattat
N40:1:101:1716:ORFK00021        tattat
ZS7:1:113:22:orf00001           tattat
                                ******

Here's the code:

    #!/usr/bin/perl
    use Bio::AlignIO;
    use Bio::Graphics::Panel;

    my $line = shift @ARGV;
    my $in = Bio::AlignIO->new(-file => $line, -format => "clustalw");
    while ($aln = $in->next_aln()) {
        foreach $seqobj ($aln->each_seq()) {
            my $seq    = $seqobj->seq;
            my $id     = $seqobj->id;
            my $length = $seqobj->length;
            my $seqobj = Bio::SeqFeature::Generic->new(-start => 1, -end => $length, -display_name => $id);
            push(@seq, $seqobj);
        }
        foreach $seq (@seq) {
            my @features = $seq->get_SeqFeatures;
            my %sorted_features;
            for my $f (@features) {
                my $tag = "-";
                push @{$sorted_features{$tag}}, $f;
            }

            my $panel = Bio::Graphics::Panel->new(
                -length    => $seq->length,
                -key_style => 'between',
                -width     => 800,
                -pad_left  => 10,
                -pad_right => 10,
            );
            $panel->add_track(generic => Bio::SeqFeature::Generic->new(-start => 1,
                    -end => $seq->length),
                -glyph   => 'generic',
                -bgcolor => 'blue',
                -label   => 1,
            );

            my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
            my $idx    = 0;
            for my $tag (sort keys my %sorted_features) {
                my $features = $sorted_features{$tag};
                $panel->add_track($features,
                    -glyph       => 'generic',
                    -bgcolor     => $colors[$idx++ % @colors],
                    -fgcolor     => 'black',
                    -font2color  => 'red',
                    -key         => "${tag}s",
                    -bump        => +1,
                    -height      => 8,
                    -label       => 1,
                    -description => 1,
                );
            }
            print $panel->png;
        }
    }

Unfortunately, $seq->get_SeqFeatures never returns anything. I'm not sure why it doesn't since there should be features on it coming from the Bio::SimpleAlign object. Maybe I should set some new parameters?
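
A possible direction, sketched here under assumptions and not tested against this file: each Bio::SeqFeature::Generic object in the script is created with only a start, an end and a name, so it has no sub-features for get_SeqFeatures to return, and the rows of a Bio::SimpleAlign are sequences, not features. The sketch below instead derives the ungapped runs from each aligned row and passes them as `-segments` to Bio::Graphics::Feature, drawn with the `segments` glyph in a single shared panel. It assumes gaps are written as "-", that the file holds one alignment, and that BioPerl, Bio::Graphics and GD are installed. The colour and connector choices are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::AlignIO;
use Bio::Graphics::Panel;
use Bio::Graphics::Feature;

my $file = shift @ARGV or die "usage: $0 alignment.aln > out.png\n";
my $in   = Bio::AlignIO->new(-file => $file, -format => 'clustalw');
my $aln  = $in->next_aln or die "no alignment found in $file\n";

# One panel for the whole alignment, so every row shares the same
# column coordinates.
my $panel = Bio::Graphics::Panel->new(
    -length    => $aln->length,
    -width     => 800,
    -pad_left  => 10,
    -pad_right => 10,
);

# The 'segments' glyph draws each segment as a box and joins the
# segments of one feature with a connector line.
my $track = $panel->add_track(
    -glyph     => 'segments',
    -connector => 'dashed',
    -bgcolor   => 'blue',
    -label     => 1,
);

for my $seqobj ($aln->each_seq) {
    my $gapped = $seqobj->seq;    # aligned row, gaps included

    # Collect ungapped runs as [start, end] in 1-based alignment columns.
    my @segments;
    while ($gapped =~ /([^-]+)/g) {
        push @segments, [ $-[1] + 1, $+[1] ];
    }
    next unless @segments;        # skip rows that are all gaps

    $track->add_feature(
        Bio::Graphics::Feature->new(
            -name     => $seqobj->id,
            -segments => \@segments,
        )
    );
}

binmode STDOUT;
print $panel->png;
```

It would be run the same way as the original script, for example `./aln.pl test-bioaln.aln > ll.png`. Building one panel per alignment, not one per sequence, also avoids printing several PNG images into the same output file.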

gringer
