Pretty new to web scraping. I am running a for loop that scrapes data about ships owned by Maersk.
I have the ship name and its IMO number. With this information, I can generate the URL from which I can get the information.
For example:
Ship Name - SEAGO ISTANBUL
IMO - 9313943
URL - https://www.vesselfinder.com/vessels/SEAGO-ISTANBUL-IMO-9313943
But some of the URLs I have in the list do not exist, so when I try to retrieve them I get a 404 error. I can address this in two ways:
- I can check whether the site exists before scraping; if it doesn't, skip the iteration and go to the next website.
- Try to scrape, and if I get a 404 error, skip the iteration and move to the next one.
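For the first option, this is the kind of check I have in mind. It's only a sketch: it uses base R's `curlGetHeaders()` (so no extra packages), and `url_ok` is my own helper name. I'm also assuming here that a connection failure can be treated the same as a missing page.

```r
# Base-R existence check: curlGetHeaders() issues a request and records
# the HTTP status code in the "status" attribute of its result.
url_ok <- function(url) {
  status <- tryCatch(
    attr(curlGetHeaders(url), "status"),
    error = function(e) NA_integer_  # connection failure: treat as missing
  )
  !is.na(status) && status != 404L
}
```

Then the loop body would run only when `url_ok(M$site[i])` is `TRUE`. The downside is that every URL gets requested twice (once for the check, once for the scrape).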
Now, I only want to skip on a 404 error and not on every kind of error; if the error is of a different sort, I will want to debug it.
for (i in 1:nrow(M)) {
  df <- M$site[i] %>%
    read_html() %>%
    html_table()
}
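For the second option, this is the kind of wrapper I'm picturing. It's a sketch, not a tested solution: it assumes the error raised by `read_html()` for a missing page mentions "404" in its message, and `scrape_or_skip` is my own name for the helper.

```r
# Run the scraper for one URL; return NULL on a 404 so the loop can skip it,
# but re-raise any other error so it can still be debugged.
scrape_or_skip <- function(url, scraper) {
  tryCatch(
    scraper(url),
    error = function(e) {
      if (grepl("404", conditionMessage(e), fixed = TRUE)) {
        message("Skipping (404): ", url)
        NULL
      } else {
        stop(e)  # not a 404: propagate the original error
      }
    }
  )
}

# In the loop, I'd collect the tables instead of overwriting df each time:
# results <- list()
# for (i in 1:nrow(M)) {
#   tbl <- scrape_or_skip(M$site[i],
#                         function(u) u %>% read_html() %>% html_table())
#   if (!is.null(tbl)) results[[M$site[i]]] <- tbl
# }
```

A nice property of this shape is that the "skip only on 404" decision lives in one place, and any other error still stops the loop with its original message.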
In the above example, M is a data frame and M$site is the column with the website URLs.
It'd be great if someone can help me out with this.