
I would like to allow folder /news/ and disallow all the sub folders under /news/ e.g. /news/abc/, /news/123/. How can I do that please?

I think Disallow: /news/ will block everything in it, including /news/ itself.

Will Disallow: /news/*/ do the job? There is no easy way to test it, so I want to make sure.

Stickers

2 Answers

User-agent: *
Allow: /news/$
Disallow: /news/

Explanation:

Google's robots.txt spec (https://developers.google.com/search/reference/robots_txt), which is more up to date than the "official" spec, states that:

/fish/ will match anything in the /fish/ folder but will not match /fish. No wildcard is necessary, since "The trailing slash means this matches anything in this folder." If you reverse engineer that:

User-agent: * (or whatever user agent you want to talk to)
Allow: /news/$ (allows /news/ but the $ character says the allow can't go beyond /news/)
Disallow: /news/ (disallows anything in the /news/ folder)

Test it in Google Search Console, or in Yandex (https://webmaster.yandex.com/tools/robotstxt/) to ensure it works for your site.
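
If you want a quick local sanity check before using one of those testers, here is a minimal Python sketch of Google-style matching as I understand it from that spec: `*` matches any run of characters, a trailing `$` anchors the end of the URL path, the longest matching rule wins, and Allow wins a tie. The names `rule_to_regex`, `is_allowed` and `NEWS_RULES` are made up for this illustration, and this is not a real crawler implementation, so treat the result as a hint only.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path (Google-style matching).
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    "allow" or "disallow". The longest matching pattern wins; on a tie,
    allow wins. A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The rules from this answer (hypothetical variable name).
NEWS_RULES = [("allow", "/news/$"), ("disallow", "/news/")]

for path in ["/news/", "/news/abc/", "/news/123/", "/news"]:
    print(path, "allowed" if is_allowed(NEWS_RULES, path) else "blocked")
```

Under these assumptions only /news/ itself and paths outside the folder come out as allowed; anything deeper, such as /news/abc/, is blocked.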

DocRoot
Henry Visotski
  • I tried this at the root level to allow all webpages to be crawled but to block all directories i.e.: User-agent: * Allow: /$ Disallow: /

    I tested it via Google Search Console, but https://www.mywebsite.com/index.php was blocked. Does this need to be different at root level?

    – rbassett Jul 12 '21 at 15:12
  • I have answered my own question: User-agent: * Disallow: /*/ – rbassett Jul 12 '21 at 15:27
  • Yep. You can even do just: Disallow: /* (because the final slash comes after the wildcard anyway, which covers the slash). – Henry Visotski Jul 13 '21 at 16:45
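
To compare the root-level patterns discussed in these comments, here is a small Python sketch that lists which sample paths each pattern would match. It assumes Google-style matching (`*` matches any run of characters, a trailing `$` anchors the end of the path); `matches`, `matching_paths` and the sample paths are hypothetical names chosen for this illustration. Under that assumption `/*/` only matches paths that contain a second slash, while `/*` seems to match every path, the same as a plain `/`, so the shorter form is worth testing before relying on it.

```python
import re


def matches(pattern, path):
    """Return True if a robots.txt path pattern matches `path`.

    Assumption: Google-style matching, where '*' is a wildcard and a
    trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


# Patterns from the comments above and a few made-up sample paths.
PATTERNS = ["/$", "/", "/*/", "/*"]
PATHS = ["/", "/index.php", "/blog/", "/blog/post.html"]


def matching_paths(pattern):
    """List the sample paths that `pattern` matches."""
    return [path for path in PATHS if matches(pattern, path)]


for pattern in PATTERNS:
    print(pattern, "matches", matching_paths(pattern))
```

This also lines up with the blocked /index.php reported above: with Allow: /$ and Disallow: /, the page matches only the Disallow rule.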

I had sort of the same issue. This:

User-agent: *
Allow: /folder/$
Disallow: /folder/

did not work for me. The url/folder WOULD appear in the Google search results, but it would just say NO INFO or something, and nothing from the HTML page would be indexed. So I tried:

User-agent: *
Allow: /folder/index.html
Disallow: /folder/*

Same thing.

What DID work was to put this meta tag in the index.html file in /folder/:

<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">

With this tag, no links were followed to any other pages or folders, but the contents of index.html DID appear.

Limbomusic