22

I see a lot of 404 errors in the access.log of my web server. I'm getting those errors because crawlers try to open a robots.txt file but can't find one. So I want to place a simple robots.txt file that will prevent the 404 errors from appearing in my log file.

What is a minimum valid robots.txt file that will allow everything on the site to be crawled?


3 Answers

25

As indicated here, create a text file named robots.txt in the top-level directory of your web server. You can leave it empty, or add:

User-agent: *
Disallow:

That is all you need if you want robots to crawl everything. If not, see the link above for more examples.
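
If you want to double-check that this minimal file really permits everything, Python's standard urllib.robotparser module can parse those two lines and answer allow/deny queries; the example.com URLs below are just placeholders for your own site:

from urllib.robotparser import RobotFileParser

# Parse the minimal permissive robots.txt directly (no network access needed)
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# Every path should be allowed for every crawler
print(rp.can_fetch("*", "https://example.com/any/page"))   # True
print(rp.can_fetch("Googlebot", "https://example.com/"))   # True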

7

The best minimal robots.txt is a completely empty file.

Any other "null" directives, such as an empty Disallow or Allow: *, are not only useless no-ops but also add unneeded complexity.

If you don't want the file to be completely empty - or you want to make it more human-readable - simply add a comment beginning with the # character, such as # blank file allows all. Crawlers ignore lines starting with #.
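
For instance, a comment-only robots.txt could contain nothing more than:

# blank file allows all

A file like this is still treated as allow-all, since a crawler that parses it finds no Disallow rules.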

-1

I would suggest this:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

It will allow Google to crawl everything but will disallow it from crawling your admin panel, which is an ideal situation for you.