How can I resolve a `bind_rows()` error?

Question

I want to download large numbers of COX1 (COI) sequences from any available database. I found a script in this answer, https://bioinformatics.stackexchange.com/a/13187/16657, that does exactly that using the bold R package, and I adapted it to my data:

```r
# load packages
library(tidyverse)
library(rentrez)
library(bold)    # API interface to BOLD
library(taxize)  # for NCBI taxonomy lookup
library(seqinr)  # for FASTA output
set_entrez_key("YOUR_NCBI_API_KEY")  # replace with your own NCBI API key
# get order-level taxa within "Mollusca" from NCBI taxonomy
taxa <- downstream("Mollusca", db = "ncbi", downto = "order")
# check if taxa present in BOLD
checks <- bold_tax_name(taxa$Mollusca$childtaxa_name)
taxa_bold <- checks[!is.na(checks$taxon),]$taxon
# download sequences from BOLD for each order-level taxon
sequences <- map(taxa_bold, bold_seq, marker = 'COI-5P') %>%
  flatten() %>%
  bind_rows()
# convert the list of sequences to a data frame
sequences_df <- do.call(rbind, sequences)
# write sequences to a file
write.fasta(
  sequences = as.list(sequences_df$sequence), 
  names = as.list(sequences_df$id), 
  nbchar = 80, 
  file.out = 'coi5p.fasta'
)
```

The script worked in that thread, but when I run it now I get the following error:

```
Error in `bind_rows()`:
! Argument 1 must be a data frame or a named atomic vector.
Run `rlang::last_trace()` to see where the error occurred.
```

I am not sure how to fix this. I commented on the original thread, but since it is fairly old I decided to ask here.

Note: `set_entrez_key()` requires an NCBI API key, which you can get by registering for an NCBI account.
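`set_entrez_key()` sets the `ENTREZ_KEY` environment variable for the current R session, and taxize reads that same variable for its NCBI queries. A minimal sketch, assuming you would rather keep the key in `~/.Renviron` (a line such as `ENTREZ_KEY=...`) than paste it into the script; `get_ncbi_key()` is a hypothetical helper, not part of rentrez or taxize:

```r
# Hypothetical helper (not from rentrez/taxize): return the NCBI API key stored
# in an environment variable, failing early with a clear message if it is unset.
get_ncbi_key <- function(var = "ENTREZ_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is not defined
  if (!nzchar(key)) {
    stop(var, " is not set; add it to ~/.Renviron and restart R", call. = FALSE)
  }
  key
}

# At the top of the script, stop early if the key is missing:
# invisible(get_ncbi_key())
```

This keeps the key out of code that gets shared or posted, while the rest of the script stays unchanged.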

Edit #1: Before `flatten()`, the `sequences` object is a large list of 53 elements; after `flatten()`, it is a large list of 257 elements.
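The growth from 53 to 257 elements suggests that `flatten()` is splicing the contents of each per-taxon result into one list, so `bind_rows()` no longer receives whole data frames. The base-R sketch below reproduces that effect, assuming each `bold_seq()` result is a data frame (as the accepted answer's use of the `id`, `marker` and `sequence` columns suggests). The two data frames are made-up stand-ins, and `unlist(recursive = FALSE)` is used as a base-R stand-in for `purrr::flatten()`:

```r
# Made-up stand-ins for two per-taxon bold_seq() results (assumed to be data
# frames with the id/marker/sequence columns the accepted answer relies on).
sequences <- list(
  data.frame(id = c("A1", "A2"), marker = "COI-5P",
             sequence = c("ACGTAC", "ACGTTT"), stringsAsFactors = FALSE),
  data.frame(id = c("B1", "B2", "B3"), marker = c("COI-5P", "COI-5P", "COI-3P"),
             sequence = c("TTGACA", "TTGACC", "GGCATA"), stringsAsFactors = FALSE)
)

# A data frame is a list of columns, so removing one level of nesting splices
# the columns of every data frame into a single flat list.
flattened <- unlist(sequences, recursive = FALSE)
length(sequences)              # 2 data frames
length(flattened)              # 6 column vectors (3 columns x 2 data frames)
is.data.frame(flattened[[1]])  # FALSE: a bare column vector, not a data frame

# Row-binding the data frames themselves keeps each record intact.
sequences_df <- do.call(rbind, sequences)
nrow(sequences_df)             # 5

# Base-R equivalent of a filter(marker == "COI-5P") step.
coi5p <- sequences_df[sequences_df$marker == "COI-5P", ]
nrow(coi5p)                    # 4
```

If the real elements are data frames, binding the list directly (without the flattening step) is what keeps one row per sequence.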

Can you please show the object as it appears before and after the 'flatten' command? It's unusual for me to see that command in R code, and this is just before the bind_rows command that you're concerned about, so it will probably be important for working out what's going wrong. — gringer, Oct 29 '23 at 07:23
@gringer I edited the question with the details about the object. — Nickmofoe, Oct 29 '23 at 07:55
Please show the head of the object, or the first few lines of output. Your text description is not detailed enough for me to understand what's going on. — gringer, Oct 29 '23 at 11:30

score 1 · Accepted Answer · answered Oct 29 '23 at 15:49

I managed to find the solution. After @gringer mentioned that they had not seen the `flatten()` command, I decided to dig deeper. It turns out that `flatten()` was superseded in purrr 1.0.0, so I am posting the updated script for anyone looking for a working version of the one in the old thread.

```r
# load packages
library(tidyverse)
library(rentrez)
library(bold)    # API interface to BOLD
library(taxize)  # for NCBI taxonomy lookup
library(seqinr)  # for FASTA output
library(rBLAST)
set_entrez_key("YOUR_NCBI_API_KEY")  # your NCBI API key; essential for pulling the taxonomy from NCBI
# get class-level taxa within "Mollusca" from NCBI taxonomy
# downto sets the taxonomic level of the search: sequences are pulled for all of
# Mollusca, but separately for each class. If the search is large, use a lower level.
taxa <- downstream("Mollusca", db = "ncbi", downto = "class")
# check if taxa present in BOLD
checks <- bold_tax_name(taxa$Mollusca$childtaxa_name)
taxa_bold <- checks[!is.na(checks$taxon),]$taxon
# download sequences from BOLD for each class-level taxon
sequences <- map(taxa_bold, bold_seq, marker = 'COI-5P')
# combine the per-taxon results into a single data frame
sequences_df <- do.call(rbind, sequences)
# keep only COI-5P records: despite the marker argument, some results may
# include other loci, so those sequences are removed from the R object
coi5p_sequences <- sequences_df %>%
  filter(marker == "COI-5P")
# write sequences to a file
write.fasta(
  sequences = as.list(coi5p_sequences$sequence),
  names = as.list(coi5p_sequences$id),
  nbchar = 60,
  as.string = TRUE,  # each sequence is a single string, not a vector of characters
  file.out = 'coi5p_Mollusca.fasta'
)
```
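`write.fasta()` produces the usual FASTA layout: a `>` header line per record, followed by its sequence. A minimal base-R sketch of the same layout, wrapping each sequence at a fixed width the way `nbchar` is intended to; `write_fasta_base()` is a hypothetical helper, not a seqinr function:

```r
# Hypothetical helper (not part of seqinr): write records as FASTA, with one
# ">id" header per record and the sequence wrapped at `width` characters.
write_fasta_base <- function(ids, seqs, file, width = 60) {
  stopifnot(length(ids) == length(seqs))
  lines <- unlist(Map(function(id, s) {
    n <- nchar(s)
    starts <- seq(1, max(n, 1), by = width)  # first character of each line
    c(paste0(">", id), substring(s, starts, pmin(starts + width - 1, n)))
  }, ids, seqs), use.names = FALSE)
  writeLines(lines, file)
  invisible(lines)
}

# Usage with the filtered data frame from the script above:
# write_fasta_base(coi5p_sequences$id, coi5p_sequences$sequence, "coi5p_Mollusca.fasta")
```

Checking a few records this way is a quick means of confirming that the headers and line wrapping look as expected before running the full download.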
