User-agent: *
Disallow: /robots.txt
What happens if you do this? Will search engines crawl robots.txt once and then never crawl it again?
Robots.txt directives don't apply to robots.txt itself. Crawlers may fetch robots.txt even if it disallows itself.
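To see why, consider how a polite crawler is structured. Here is a minimal sketch using Python's standard urllib.robotparser (the hostname is a placeholder, and real crawlers are far more elaborate): the crawler has to fetch robots.txt before it can evaluate any of the rules inside it, so no rule in the file can block that fetch.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches robots.txt unconditionally -- no rules exist yet

# Only subsequent URLs are checked against the parsed rules:
if rp.can_fetch("MyBot", "https://example.com/some-page"):
    pass  # safe to crawl the page
```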
It is actually very common for robots.txt to disallow itself. Many websites disallow everything:
User-Agent: *
Disallow: /
That directive to disallow everything includes robots.txt itself. I run some websites like this myself. Despite the rules disallowing everything, including robots.txt, search engine bots still refresh the robots.txt file periodically.
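As an illustration (again a sketch with Python's urllib.robotparser and a placeholder hostname, not a description of how any particular search engine is implemented), the rules nominally forbid fetching robots.txt itself, but a crawler has to download the file to learn that, so the fetch happens anyway:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse the "disallow everything" file directly:
rp.parse(["User-Agent: *", "Disallow: /"])

# The rules nominally cover robots.txt itself...
print(rp.can_fetch("MyBot", "https://example.com/robots.txt"))  # False

# ...but a crawler must download robots.txt before it can know that,
# so the file keeps getting re-fetched on a schedule regardless.
```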
Google's John Mueller recently confirmed that Googlebot still crawls a disallowed robots.txt: "Disallowing Robots.txt In Robots.txt Doesn't Impact How Google Processes It." So even if you specifically call out Disallow: /robots.txt, Google (and I suspect other search engines) won't change its behavior.
Before we had noindex HTTP headers, we would list court docs in our robots.txt file. The result was that those same cases could wind up in Google by way of robots.txt itself showing up in Google's index. In other words, robots.txt became a searchable index of everything we were trying to (lightly) hide.
– mlissner, Oct 26 '22 at 00:21