Google CSE has indexed robots.txt and now if someone searches for 'txt' it returns the robots.txt file which is really not ideal (as this is a bog-standard Drupal site, the string robots.txt actually appears in the text). How can I avoid this? Is there a setting somewhere in Google or should I add /robots.txt to erm, robots.txt or...?
1 Answer
You could add this to robots.txt:
Disallow: /robots.txt
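For context, the line would simply sit alongside your existing rules. A minimal sketch (the Drupal paths here are illustrative; your actual robots.txt will differ):

```
User-agent: *
# Existing Drupal rules (illustrative)
Disallow: /admin/
Disallow: /user/
# Keep the robots.txt file itself out of the index
Disallow: /robots.txt
```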
In What if robots.txt disallows itself? Google's John Mueller says:
The only thing this would affect is if a link were pointing to the robots.txt and Google would otherwise index the content of the robots.txt file. That wouldn't be possible when it's disallowed by robots.txt.
So it seems that adding a disallow rule for robots.txt itself can help prevent robots.txt from getting indexed, without preventing Googlebot from fetching the file to see what else is disallowed.
Another way to handle it would be to add an HTTP header to robots.txt that prevents indexing. This would be a similar solution to the problem Prevent XML sitemaps from showing up in Google search results. You would want the following HTTP header served for robots.txt:
X-Robots-Tag: noindex
Under Apache you would implement it with this .htaccess code:
<FilesMatch "robots\.txt$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
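If the site runs behind Nginx instead of Apache, a roughly equivalent sketch (untested, adapt to your config) would be:

```nginx
# Serve X-Robots-Tag: noindex only for robots.txt
location = /robots.txt {
    add_header X-Robots-Tag "noindex";
}
```

Note that in Nginx, `add_header` directives in a `location` block replace any inherited from the `server` level, so repeat any headers you still need inside this block.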
`Header` is not always available; it's from `mod_headers`, which for example Ubuntu doesn't load by default. – chx Sep 06 '18 at 13:02

`sudo a2enmod mod_headers` – Stephen Ostermiller Sep 06 '18 at 13:24