
My access log is full of requests for non-existent PDFs relating to 'viagra', 'cialis', and similar drugs, all from Googlebot (user-agent: Googlebot/2.1 (+http://www.google.com/bot.html); IP range: 66.249.64.*).

Examples:

/WdjUZ/LXWKZ/cialis-hearing-loss.pdf
/WKYnZ/viagra-questions.pdf
/ZWohZ/LhfkZ/canadian-viagra-and-healthcare.pdf
/YSXaZ/XnoZZ/buy-propecia-no-prescription.pdf
/MRWQZ/MeWXZ/TZlWZ/UlaMZ/drug-manufacturers-buy-softtabs-viagra.pdf
/PnddZ/NKdZZ/generic-viagra-no-prescription-australia.pdf
/QQWVZ/RoRbZ/URObZ/LdNgZ/levitra-10-mg-order.pdf

Why would Google think these are URLs that need to be crawled? And if someone else is hosting a page with links like this, what purpose would that serve?

Is there any harm in creating robots.txt rules to tell Google to stop, like so:

User-agent: *
Disallow: /*viagra*.pdf$
Disallow: /*cialis*.pdf$
Disallow: /*propecia*.pdf$
Disallow: /*levitra*.pdf$
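
For what it's worth, Google's robots.txt parser treats * as a wildcard for any run of characters and a trailing $ as an end-of-URL anchor, with everything else matched as a prefix of the URL path. One way to sanity-check the patterns before deploying them is a small script that mirrors that matching logic. The translation below is my own sketch of Google's documented wildcard behaviour, not an official library, and robots_pattern_to_regex is a hypothetical helper name:

import re

# Hypothetical helper: convert a robots.txt Disallow pattern into a regex
# using Google's documented wildcard semantics ('*' = any characters,
# trailing '$' = end-of-URL anchor). Matching is otherwise prefix-based,
# which re.match (anchored at the start) reproduces.
def robots_pattern_to_regex(pattern):
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))

rules = ["/*viagra*.pdf$", "/*cialis*.pdf$",
         "/*propecia*.pdf$", "/*levitra*.pdf$"]

tests = ["/WKYnZ/viagra-questions.pdf",
         "/QQWVZ/RoRbZ/URObZ/LdNgZ/levitra-10-mg-order.pdf",
         "/reports/annual-report.pdf"]

for path in tests:
    blocked = any(robots_pattern_to_regex(r).match(path) for r in rules)
    print(path, "->", "blocked" if blocked else "allowed")

Running this, the two spam paths come out blocked and the ordinary PDF allowed, which is what the rules intend. This only approximates the real matcher, so it's worth confirming the rules in Google Search Console's robots.txt tester as well.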
  • When you say they are "non-existent," does that mean they return a 404 status? Such requests mean that your site is or has been hacked at some point. It is very important to make sure that Googlebot isn't seeing any content on those URLs. You should verify this by looking at the status in your logs and by using the "inspect live URL" feature in Google Search Console for some of the URLs. Hackers often show the content only to Googlebot, so just checking the URL in your browser is not a good enough way to check. Make sure the hack is cleaned up before worrying about stopping crawling. – Stephen Ostermiller Jun 15 '20 at 10:31
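
To follow up on that comment: to see what those requests actually returned, and whether they really came from Google, something along these lines might help. This is a rough sketch: it assumes a combined-format access log at the hypothetical path /var/log/apache2/access.log, and is_real_googlebot is a made-up helper name, but the reverse-then-forward DNS lookup it performs is Google's documented way to verify Googlebot:

import re
import socket

LOG = "/var/log/apache2/access.log"  # assumption: combined log format

# Extract client IP, request path, and HTTP status from each log line.
line_re = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)[^"]*" (\d{3})')

def is_real_googlebot(ip):
    # Google's documented check: reverse-DNS the IP, confirm the hostname
    # ends in googlebot.com or google.com, then forward-resolve the
    # hostname and confirm it maps back to the same IP.
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

with open(LOG) as fh:
    for line in fh:
        m = line_re.match(line)
        if not m:
            continue
        ip, path, status = m.groups()
        if re.search(r"(viagra|cialis|propecia|levitra).*\.pdf", path):
            verified = "verified" if is_real_googlebot(ip) else "UNVERIFIED"
            print(status, ip, verified, path)

If any of those requests returned 200 rather than 404, that supports the compromise theory and the cleanup should come first; a robots.txt rule would only hide the problem from the crawler, not fix it.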

0 Answers