
Pretty new to web scraping. I am running a for loop that scrapes data about ships owned by Maersk.

I have the ship name and its IMO number. With this information, I can generate the URL from which I can get the information.

For example:

Ship Name - SEAGO ISTANBUL

IMO - 9313943

URL - https://www.vesselfinder.com/vessels/SEAGO-ISTANBUL-IMO-9313943
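Building the URL from the ship name and IMO can be sketched like this (a minimal example assuming, as above, that spaces in the ship name become hyphens; `make_url` is a hypothetical helper, not from the original code):

```r
# Hypothetical helper: build a vesselfinder URL from name + IMO.
# Assumes spaces in the ship name map to hyphens, as in the example above.
make_url <- function(name, imo) {
  sprintf("https://www.vesselfinder.com/vessels/%s-IMO-%s",
          gsub(" ", "-", name), imo)
}

make_url("SEAGO ISTANBUL", 9313943)
# "https://www.vesselfinder.com/vessels/SEAGO-ISTANBUL-IMO-9313943"
```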

But some of the URLs I have in the list do not exist, so when I try to retrieve them, I get a 404 error. I can address this in two ways:

  1. I can check if the site exists before scraping, and if it doesn't, skip the iteration and go to the next website.
  2. Try to scrape and if I get a 404 error, skip the iteration and move to next.
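Option 1 could be sketched with the `httr` package (an assumption on my part; `url_exists` is a hypothetical helper). A `HEAD` request is cheaper than downloading the page, and its status code tells you whether the page is there:

```r
# Sketch of option 1, assuming the httr package is installed.
library(httr)

# Hypothetical helper: TRUE if the URL responds with something
# other than 404 (network failures also count as "does not exist" here).
url_exists <- function(url) {
  res <- tryCatch(HEAD(url), error = function(e) NULL)
  !is.null(res) && status_code(res) != 404
}
```

Note that this costs one extra request per URL, which is why option 2 (catching the error during the actual scrape) is usually preferred.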

Now I only want to skip if I get a 404 error and not any other kind of error; if the error is of a different sort, I will have to debug it.

library(rvest)    # provides read_html(), html_table(), and the %>% pipe

tables <- vector("list", nrow(M))   # collect one result per row instead of overwriting df
for (i in 1:nrow(M)) {
  tables[[i]] <- M$site[i] %>%
    read_html() %>%
    html_table()
}

In the above example, M is a data frame, and M$site is the column containing the URLs.
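For option 2, the loop above can be wrapped in `tryCatch` so that only 404s are skipped and any other error is re-raised for debugging. This is a sketch under the assumption that `read_html()` reports a failed request with "404" in its error message (which is what curl/xml2 produce for an HTTP 404):

```r
library(rvest)

tables <- vector("list", nrow(M))
for (i in 1:nrow(M)) {
  tables[[i]] <- tryCatch(
    M$site[i] %>% read_html() %>% html_table(),
    error = function(e) {
      if (grepl("404", conditionMessage(e))) {
        message("Skipping (404): ", M$site[i])
        NULL                      # mark this iteration as skipped
      } else {
        stop(e)                   # re-raise anything that is not a 404
      }
    }
  )
}

# Drop the skipped entries afterwards, if desired:
tables <- Filter(Negate(is.null), tables)
```

Matching on the message text is a pragmatic assumption; a stricter alternative is to fetch the page with `httr::GET()` first and branch on `status_code()` instead of parsing the error message.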

It'd be great if someone can help me out with this.

SivaR