Questions tagged [robots.txt]

Robots.txt is a text file used by website owners to give instructions about their site to web robots: it tells robots which parts of the site are open to crawling and which are off-limits. This convention is called the Robots Exclusion Protocol.

743 questions
31 votes, 6 answers

If I don't want to set any special behavior, is it OK if I don't bother to have a robots.txt file?

If I don't want to set any special behavior, is it OK if I don't bother to have a robots.txt file? Or can the lack of one be harmful?
Dan Dumitru
22 votes, 3 answers

What is a minimum valid robots.txt file?

I don't like seeing a lot of 404 errors in the access.log of my web server. I'm getting those errors because crawlers request a robots.txt file that doesn't exist. So I want to place a simple robots.txt file that will prevent the 404…
bessarabov
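A commonly suggested minimal form is an allow-everything file; served with a 200 status it stops the 404s without restricting any crawler:

```
User-agent: *
Disallow:
```

An empty Disallow value means nothing is disallowed, so this is equivalent to having no restrictions at all.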
12 votes, 3 answers

Robots.txt: do I need to disallow a page which is not linked anywhere?

There are some pages on my website that I want users to be able to visit only if I give them the URL. If I disallow the individual pages in robots.txt, they will be visible to anybody who looks at the file. My question is: if I don't link them from…
martjno
10 votes, 2 answers

Allow a folder and disallow all sub folders in robots.txt

I would like to allow the folder /news/ and disallow all the subfolders under /news/, e.g. /news/abc/ and /news/123/. How can I do that? I think Disallow: /news/ will block everything in it, including /news/ itself. Will Disallow: /news/*/ do the…
Stickers
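One sketch that is often suggested for this case relies on the Allow directive and the $ end-of-URL anchor, both extensions honored by major crawlers such as Googlebot but not guaranteed by every robot:

```
User-agent: *
Allow: /news/$
Disallow: /news/
```

For crawlers that support both extensions, this permits exactly /news/ itself while blocking every path beneath it.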
8 votes, 3 answers

What's the proper way to handle Allow and Disallow in robots.txt?

I run a fairly large-scale Web crawler. We try very hard to operate the crawler within accepted community standards, and that includes respecting robots.txt. We get very few complaints about the crawler, but when we do the majority are about our…
Jim Mischel
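The original 1994 convention says the first matching rule wins, whereas Google and Bing document a longest-match rule (the rule with the most specific path wins, with Allow winning ties), and RFC 9309 standardizes the longest-match behavior. A file like the following therefore behaves differently across crawlers:

```
User-agent: *
Disallow: /folder/
Allow: /folder/page.html
```

Under longest-match, /folder/page.html is crawlable because the Allow rule has the longer path; a strict first-match parser would block it because the Disallow line appears first.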
8 votes, 1 answer

Does the line-ending format of robots.txt matter?

Simple question: Should I make sure to use Unix line endings for my robots.txt, or does it not matter?
James Sulak
8 votes, 2 answers

How do you disallow root in robots.txt, but allow a subdirectory?

Using robots.txt, how do you disallow the root of a site (http://www.example.com/) but allow a subdirectory (http://www.example.com/lessons/)?
David Smith
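A sketch of the usual answer, with the caveat that Allow is not part of the original specification and only works with crawlers that implement it:

```
User-agent: *
Allow: /lessons/
Disallow: /
```

Under the longest-match rule used by Google and Bing the order of the two lines does not matter; putting Allow first also keeps older first-match parsers happy.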
7 votes, 1 answer

What if robots.txt disallows itself?

User-agent: *
Disallow: /robots.txt

What happens if you do this? Will search engines crawl robots.txt once and then never crawl it again?
clickbait
5 votes, 2 answers

Is there any reason for putting up humans.txt other than acknowledgement?

Are there any valid reasons for putting up a humans.txt file? The only reason I see so far is to give credit to the team that created the site and the open-source libraries it uses.
Salvador Dali
5 votes, 1 answer

Do we need to block repeated page content for SEO relevance?

I have multiple purchase pages with the same content, like product1red.php and product2green.php. Should I block them with robots.txt?
user32057
5 votes, 2 answers

What is the correct way to write my "robots.txt" file?

I have written the following code inside my robots.txt file:

User-Agent: Googlebot
Disallow:

User-agent: Mediapartners-Google
Disallow:

Sitemap: http://example.com/sitemap.xml

Is my robots.txt correct? I only want two user agent…
ashutosh
5 votes, 1 answer

wget not respecting my robots.txt. Is there an interceptor?

I have a website where I post csv files as a free service. Recently I have noticed that wget and libwww have been scraping pretty hard and I was wondering how to circumvent that even if only a little. I have implemented a robots.txt policy. I posted…
Jane Wilkie
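Since robots.txt is purely advisory, clients that ignore it have to be refused at the web server itself. A hypothetical Apache .htaccess sketch (this assumes mod_rewrite is enabled, and the user-agent patterns are illustrative):

```
RewriteEngine On
# Return 403 Forbidden to clients identifying as wget or libwww
RewriteCond %{HTTP_USER_AGENT} (wget|libwww) [NC]
RewriteRule .* - [F,L]
```

Because the User-Agent header is trivial to change, rate limiting or authentication are sturdier options for persistent scrapers.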
4 votes, 2 answers

I don't want my site to be analyzed on WooRank or builtwith.com

I don't want my site to be analyzed on WooRank or builtwith.com. Is there any way I can do that by editing the robots.txt file or any other possible way?
Krill
4 votes, 3 answers

How to disallow robots from the first 185 pages?

I have a website where the first 185 pages are sample profiles for demonstration purposes: http://example.com/profile/1 ... http://example.com/profile/185 I want to block these pages from Google as they are somewhat similar in content to…
Question Overflow
4 votes, 2 answers

robots.txt - just a guess about wild-card

If I disallow tempPage, does that mean tempPage_1, temp_Page_2, and tempPage_x are also disallowed? I tried to Google this, but couldn't find an answer…
TPR
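A Disallow value is matched as a path prefix, so no wildcard is needed for this case. A minimal sketch:

```
User-agent: *
Disallow: /tempPage
```

This blocks every URL whose path begins with /tempPage, including /tempPage_1 and /tempPage_x; note that it would not match /temp_Page_2, because that path diverges from the prefix.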