I have run a blastx of metagenomic databases (raw illumina reads) using the nr database. Unfortunately, I forgot to add the --outfmt 6 argument to the code and got the traditional output.
Could I parse this output into --outfmt 6?
I have run a blastx of metagenomic databases (raw illumina reads) using the nr database. Unfortunately, I forgot to add the --outfmt 6 argument to the code and got the traditional output.
Could I parse this output into --outfmt 6?
The blast_formatter command allows you to produce output in any of the formats supported by BLAST. However, it requires that you run blastx with --outfmt 11 (ASN.1 format). I'm not sure there are any tools that will do the conversion from the full default BLAST report. This could potentially be scripted, but would be complex enough that you're probably better off re-running BLAST.
Biopython has a parser for Blast's default text format (which I assume is the one you have). Just use the blast-text string to specify this format in SearchIO's parse method.
from Bio import SearchIO
results = SearchIO.parse('myfile.txt', 'blast-text')
for entry in results:
# extract the information you want
query = entry.id
for hit in entry.hits:
subject = hit.id
for hsp in hit.hsps:
identity = hsp.ident_num *100 / entry.seq_len
aln_len = hsp.aln.get_alignment_length()
mismatches = aln_len - hsp.ident_num # not sure about this
gaps = hsp.gap_num
q_start = hsp.query_start + 1
q_end = hsp.query_end
s_start = hsp.hit_start + 1
s_end = hsp.hit_end
e_value = hsp.evalue
bit_score = int(hsp.bitscore)
# print in outfmt 6 format
print('\t'.join([str(x) for x in [query,
subject,
identity,
aln_len,
mismatches,
gaps,
q_start,
q_end,
s_start,
s_end,
e_value,
bit_score]]))
Have a look at Biopython's docs for more information on how to parse Blast results.
--outfmt 6flag unless you don't have the computational resources. If it's too slow you can use DIAMOND as a replacement for BLAST with very similar commands – Chris_Rands Sep 07 '17 at 09:53