Robots.txt is a text file used by website owners to give instructions about their site to web robots: it tells robots which parts of the site are open to crawling and which are closed. This convention is called the Robots Exclusion Protocol.
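For example, a well-behaved crawler checks each URL against the site's rules before fetching it. A minimal sketch using Python's standard-library parser (the rules and URLs here are illustrative, not from any real site):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: a site that closes its /private/ area to all robots.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(parser.can_fetch("MyBot", "https://example.com/index.html"))      # True
print(parser.can_fetch("MyBot", "https://example.com/private/a.html"))  # False
```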
Questions tagged [robots.txt]
743 questions
31 votes · 6 answers
If I don't want to set any special behavior, is it OK if I don't bother to have a robots.txt file?
Or can the lack of one be harmful?
Dan Dumitru
22 votes · 3 answers
What is a minimum valid robots.txt file?
I don't like seeing a lot of 404 errors in the access.log of my web server. I'm getting those errors because crawlers try to open a robots.txt file but can't find one. So I want to place a simple robots.txt file that will prevent the 404…
bessarabov
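For reference, the minimal robots.txt usually suggested for this situation simply allows everything — an empty Disallow value blocks nothing, so crawlers get a valid file and the 404s stop:

```
User-agent: *
Disallow:
```

An entirely empty robots.txt file would have the same crawling effect, since no rules means nothing is disallowed.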
12 votes · 3 answers
Robots.txt: do I need to disallow a page which is not linked anywhere?
There are some pages on my website that I want users to be able to visit only if I give them the URL.
If I disallow the individual pages in robots.txt, they will be visible to anybody looking into it.
My question is: if I don't link them from…
martjno
10 votes · 2 answers
Allow a folder and disallow all sub folders in robots.txt
I would like to allow folder /news/ and disallow all the sub folders under /news/ e.g. /news/abc/, /news/123/. How can I do that please?
I think Disallow: /news/ will block everything in it, including /news/ itself.
Will Disallow: /news/*/ do the…
Stickers
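A sketch of one commonly suggested rule set for this case. It relies on the `*` wildcard, which is an extension supported by major crawlers such as Googlebot and Bingbot rather than part of the original Robots Exclusion Protocol; `/news/` itself and flat files like `/news/article.html` remain allowed by default because they match no Disallow rule:

```
User-agent: *
Disallow: /news/*/
```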
8 votes · 3 answers
What's the proper way to handle Allow and Disallow in robots.txt?
I run a fairly large-scale Web crawler. We try very hard to operate the crawler within accepted community standards, and that includes respecting robots.txt. We get very few complaints about the crawler, but when we do, the majority are about our…
Jim Mischel
8 votes · 1 answer
Does the line-ending format of robots.txt matter?
Simple question: Should I make sure to use Unix line endings for my robots.txt, or does it not matter?
James Sulak
8 votes · 2 answers
How do you disallow root in robots.txt, but allow a subdirectory?
Using robots.txt, how do you disallow the root of a site (http://www.example.com/) but allow a subdirectory (http://www.example.com/lessons/)?
David Smith
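The usual answer pairs a specific `Allow` with a blanket `Disallow`. `Allow` is an extension to the original protocol, but the major search engines honor it. A sketch using Python's standard-library parser, which applies the first matching rule and so agrees here with Google's most-specific-match behavior:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: close the whole site except /lessons/.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Allow: /lessons/",
    "Disallow: /",
])

print(parser.can_fetch("MyBot", "http://www.example.com/lessons/intro"))  # True
print(parser.can_fetch("MyBot", "http://www.example.com/"))               # False
```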
7 votes · 1 answer
What if robots.txt disallows itself?
User-agent: *
Disallow: /robots.txt
What happens if you do this? Will search engines crawl robots.txt once and then never crawl it again?
clickbait
5 votes · 2 answers
Is there any reason for putting up a humans.txt other than acknowledgement?
Are there any valid reasons for putting up a humans.txt? The only reason I see so far is to give credit to the team who created the site and the open-source libraries it is using.
Salvador Dali
5 votes · 1 answer
Do we need to block repeated page content for SEO relevance?
I have multiple purchase pages with the same content, like:
product1red.php
product2green.php
Should I block them with robots.txt?
user32057
5 votes · 2 answers
What is the correct way to write my "robots.txt" file?
I have written the following code inside my robots.txt file:
User-Agent: Googlebot
Disallow:
User-agent: Mediapartners-Google
Disallow:
Sitemap: http://example.com/sitemap.xml
Is my robots.txt correct? I only want two user agent…
ashutosh
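One way to sanity-check a file like this is to feed it to Python's standard-library parser — a sketch assuming the asker's exact lines; per the protocol, an empty Disallow blocks nothing, and the Sitemap line is recorded separately and plays no part in matching:

```python
from urllib.robotparser import RobotFileParser

# The asker's robots.txt, line by line.
parser = RobotFileParser()
parser.parse([
    "User-Agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: Mediapartners-Google",
    "Disallow:",
    "",
    "Sitemap: http://example.com/sitemap.xml",
])

# An empty Disallow blocks nothing, so both named agents may fetch anything.
print(parser.can_fetch("Googlebot", "http://example.com/page.html"))  # True
```

Note that agents not listed at all are also allowed by default; listing an agent with an empty Disallow only makes the permission explicit.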
5 votes · 1 answer
wget not respecting my robots.txt. Is there an interceptor?
I have a website where I post CSV files as a free service. Recently I have noticed that wget and libwww have been scraping pretty hard, and I was wondering how to curb that, even if only a little.
I have implemented a robots.txt policy. I posted…
Jane Wilkie
4 votes · 2 answers
I don't want my site to be analyzed on WooRank or builtwith.com
I don't want my site to be analyzed on WooRank or builtwith.com.
Is there any way I can do that by editing the robots.txt file or any other possible way?
Krill
4 votes · 3 answers
How to disallow robots from the first 185 pages?
I have a website where the first 185 pages are sample profiles for demonstration purposes:
http://example.com/profile/1
...
http://example.com/profile/185
I want to block these pages from Google as they are somewhat similar in content to…
Question Overflow
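Since robots.txt has no numeric ranges, one approach is to generate 185 explicit rules — a sketch, using the `$` end anchor, a wildcard extension honored by Google and Bing; without it, `Disallow: /profile/1` would also block `/profile/10`, `/profile/100`, and so on, because robots.txt rules match by prefix:

```python
# Emit one explicit Disallow rule per sample profile, end-anchored with "$"
# so that /profile/1 does not also block /profile/186 and beyond.
rules = ["User-agent: *"]
rules += [f"Disallow: /profile/{i}$" for i in range(1, 186)]

print("\n".join(rules))
```

Keep in mind that disallowing crawling does not guarantee de-indexing of URLs a search engine already knows about; a noindex robots meta tag on the sample pages is the more direct tool for that.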
4 votes · 2 answers
robots.txt - just a guess about wild-card
so if I disallow tempPage, does it mean tempPage_1, temp_Page_2, tempPage_x are also disallowed? I tried to google this up, but I don't know...
TPR
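Prefix matching answers this question. A sketch with Python's standard-library parser, which matches `Disallow` values as path prefixes just as the original protocol specifies — note that `temp_Page_2` does not share the `tempPage` prefix, so it would not be blocked:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /tempPage"])

print(parser.can_fetch("MyBot", "http://example.com/tempPage_1"))   # False (prefix matches)
print(parser.can_fetch("MyBot", "http://example.com/tempPage_x"))   # False (prefix matches)
print(parser.can_fetch("MyBot", "http://example.com/temp_Page_2"))  # True  (prefix differs)
```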