I have COI metabarcoding sequence data from bulk animal samples (including Arthropoda, Nematoda, Annelida and Mollusca) and I want to BLAST all of these sequences. I used the following command: blastn -remote -db nt -query COI_all.fasta -num_alignments 2 -out COI_blasted.txt. However, this results in errors similar to those described in this post: https://www.biostars.org/p/359971/ .
These errors probably occur because my file contains many sequences (around 700), so the remote connection is interrupted before the search completes.
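If the number of query sequences really is the cause, one workaround is to split the query file into smaller batches and submit each batch separately. The following is a minimal sketch of that idea, not a tested fix for the remote errors; the batch size of 50 and the COI_batch prefix are arbitrary assumptions.

```python
from itertools import islice
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, batch_size=50, prefix="COI_batch"):
    """Write consecutive batches of records to numbered FASTA files.

    The files are created next to the input file; their paths are returned.
    batch_size and prefix are illustrative defaults, not recommended values.
    """
    records = read_fasta(path)
    out_dir = Path(path).resolve().parent
    written = []
    index = 0
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            break
        index += 1
        out_path = out_dir / f"{prefix}_{index:03d}.fasta"
        with open(out_path, "w") as out:
            for header, seq in batch:
                out.write(f">{header}\n{seq}\n")
        written.append(out_path)
    return written
```

Calling split_fasta("COI_all.fasta") would produce COI_batch_001.fasta, COI_batch_002.fasta and so on, and each of these could then be passed to the same blastn command through -query.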
I found that one solution is to run blastn against a local database. Since the samples are so diverse, I would like to download ALL animal COI sequences from BOLD (or GenBank). It would not be a problem if non-animal (e.g. plant) sequences were also included.
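For reference, the local-database route could look roughly like the sketch below, which wraps the two BLAST+ command-line tools (makeblastdb and blastn) from Python. It assumes BLAST+ is installed and on the PATH; the database name coi_reference and the file names in the usage note are placeholders of mine, and the blastn options simply mirror the remote command with -remote dropped.

```python
import subprocess


def makeblastdb_cmd(fasta, db_name):
    """Command that turns a nucleotide FASTA file into a local BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, out_path, num_alignments=2):
    """Command mirroring the remote search, but against a local database."""
    return [
        "blastn",
        "-db", db_name,
        "-query", query,
        "-num_alignments", str(num_alignments),
        "-out", out_path,
    ]


def run_local_blast(reference_fasta, query, out_path, db_name="coi_reference"):
    """Build the database, then search it; raises if either tool fails.

    db_name is a hypothetical placeholder for the database prefix.
    """
    subprocess.run(makeblastdb_cmd(reference_fasta, db_name), check=True)
    subprocess.run(blastn_cmd(query, db_name, out_path), check=True)
```

With a downloaded reference file (here hypothetically named bold_coi.fasta), run_local_blast("bold_coi.fasta", "COI_all.fasta", "COI_blasted.txt") would build the database once and then search all query sequences without any remote connection.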
I think the BOLD database would be a great reference to BLAST my sequences against. However, I'm currently struggling to find a good way to download all animal COI sequences from BOLD.
When entering COI-5P as a search term on http://v4.boldsystems.org/index.php/Public_SearchTerms I receive the error: "Your search terms resulted in too many matching terms. Please try again with more specific search criteria." I could probably download the sequences for each phylum separately and merge them, but I'd rather download a single file.
I also tried the API by running: wget http://v4.boldsystems.org/index.php/API_Public/sequence?marker=COI-5P. A download starts, but it stalls at around 3.7 MB and the file I receive contains only ~5000 sequences.
UPDATE: I've contacted BOLD about the stalling behavior and this is their reply: "This issue is because of the large API request that retrieves millions of records, which our system does not handle. Please break up the search by smaller groups, such as classes."
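Following BOLD's advice, the request can be broken up per taxon and the results appended to one file. Below is a rough sketch of that approach. It assumes the same sequence endpoint also accepts a taxon parameter alongside marker (worth checking against the BOLD API documentation), and the 600-second timeout is an arbitrary choice. The per-taxon record counts are returned so that suspiciously small downloads can be spotted, since a stalled request may still yield a partial file.

```python
import urllib.parse
import urllib.request

BOLD_SEQUENCE_API = "http://v4.boldsystems.org/index.php/API_Public/sequence"


def bold_url(taxon, marker="COI-5P"):
    """Build a sequence-API URL restricted to one taxon and one marker.

    The taxon parameter is an assumption about the BOLD v4 public API.
    """
    query = urllib.parse.urlencode({"taxon": taxon, "marker": marker})
    return f"{BOLD_SEQUENCE_API}?{query}"


def fetch_text(url, timeout=600):
    """Download one URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_by_taxon(taxa, out_path, fetch=fetch_text):
    """Fetch each taxon separately and append the results to one FASTA file.

    Returns a dict mapping taxon -> number of FASTA records received.
    """
    counts = {}
    with open(out_path, "w") as out:
        for taxon in taxa:
            text = fetch(bold_url(taxon))
            counts[taxon] = sum(
                1 for line in text.splitlines() if line.startswith(">")
            )
            # Make sure consecutive downloads do not run into each other.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return counts
```

A call such as download_by_taxon(["Arachnida", "Clitellata", "Gastropoda"], "bold_coi.fasta") would write one merged file; the taxon names and the output name are only examples. Very large classes (Insecta, for instance) may still be too big for a single request and might need to be split further, for example by order.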
Does anyone have a solution to download all COI sequences from BOLD in one file?
I could also download COI sequences from GenBank via ftp://ftp.ncbi.nlm.nih.gov/blast/db/, but I'm not sure exactly which files I need. For 16S, 18S, etc. it is obvious, but not for COI. Any suggestions?
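An alternative to the preformatted FTP files would be to query GenBank through NCBI's E-utilities (the esearch service). The sketch below only builds the search term and the request URL; it does not download anything. The gene synonyms (COI, CO1, COX1) and the Metazoa organism filter are my own guesses at a reasonable query and would need checking against what GenBank actually returns.

```python
import urllib.parse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def coi_term(organism="Metazoa", gene_names=("COI", "CO1", "COX1")):
    """Entrez search term for COI records of one organism group.

    The default synonyms and organism filter are assumptions, not a
    validated query.
    """
    genes = " OR ".join(f"{name}[Gene]" for name in gene_names)
    return f"({genes}) AND {organism}[Organism]"


def esearch_url(term, db="nuccore", retmax=0):
    """URL for an esearch request; usehistory=y keeps the hits on the
    NCBI history server so they can be fetched in batches afterwards."""
    params = {"db": db, "term": term, "retmax": retmax, "usehistory": "y"}
    return f"{ESEARCH}?{urllib.parse.urlencode(params)}"
```

Opening esearch_url(coi_term()) in a browser shows the hit count for the query, which is a quick way to judge whether the term is too broad or too narrow before fetching any sequences.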
Thanks for the help.
Warning messages:
1: In .f(.x[[i]], ...) : the request timed out, see 'If a request times out' returning partial output
2: In .f(.x[[i]], ...) : the request timed out, see 'If a request times out' returning partial output
3: In .f(.x[[i]], ...) : the request timed out, see 'If a request times out' returning partial output
It downloaded 1012676 sequences, so quite a lot, but I don't think all of them were downloaded. Again, these warnings seem to be related to some kind of download limitation.
– Robvh May 06 '20 at 06:45
esearch as final tool, as it is more to the point than the BOLD downloads. Thanks for your extensive help regarding the issue, for which I awarded the bounty.
– Robvh May 06 '20 at 08:37
bind_rows() command returns the following error.
Error in bind_rows(): ! Argument 1 must be a data frame or a named atomic vector. Run rlang::last_trace() to see where the error occurred.
Can you lend a helping hand after all these years?
– Nickmofoe Oct 28 '23 at 09:01