My access log is full of requests from Googlebot for non-existent PDFs relating to 'viagra', 'cialis', and other similar drugs (user-agent: Googlebot/2.1 (+http://www.google.com/bot.html), IP range: 66.249.64.*).
examples:
/WdjUZ/LXWKZ/cialis-hearing-loss.pdf
/WKYnZ/viagra-questions.pdf
/ZWohZ/LhfkZ/canadian-viagra-and-healthcare.pdf
/YSXaZ/XnoZZ/buy-propecia-no-prescription.pdf
/MRWQZ/MeWXZ/TZlWZ/UlaMZ/drug-manufacturers-buy-softtabs-viagra.pdf
/PnddZ/NKdZZ/generic-viagra-no-prescription-australia.pdf
/QQWVZ/RoRbZ/URObZ/LdNgZ/levitra-10-mg-order.pdf
Why would Google think these are URLs that need to be crawled? If someone else is hosting a page with links like this, what purpose would that serve?
Is there any harm in creating robots.txt rules to tell Google to stop, like so:
User-agent: *
Disallow: /*viagra*.pdf$
Disallow: /*cialis*.pdf$
Disallow: /*propecia*.pdf$
Disallow: /*levitra*.pdf$
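As a sanity check that these wildcard patterns would actually match the logged paths, here is a rough sketch of Google's documented matching semantics ('*' matches any sequence of characters, a trailing '$' anchors the end of the path) translated into regexes. This is my own approximation, not Google's actual matcher:

```python
import re

def rule_to_regex(rule: str) -> str:
    # Translate a robots.txt path pattern (Google's wildcard extension)
    # into a regex: '*' becomes '.*', a trailing '$' anchors the end.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return "^" + pattern + ("$" if anchored else "")

rules = ["/*viagra*.pdf$", "/*cialis*.pdf$", "/*propecia*.pdf$", "/*levitra*.pdf$"]

# Sample paths taken from the access log above.
paths = [
    "/WKYnZ/viagra-questions.pdf",
    "/WdjUZ/LXWKZ/cialis-hearing-loss.pdf",
    "/YSXaZ/XnoZZ/buy-propecia-no-prescription.pdf",
    "/QQWVZ/RoRbZ/URObZ/LdNgZ/levitra-10-mg-order.pdf",
]

for p in paths:
    blocked = any(re.match(rule_to_regex(r), p) for r in rules)
    print(p, "->", "blocked" if blocked else "allowed")
```

Each logged path should come out as "blocked" under these rules, while legitimate URLs without those drug names remain allowed.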