From what I understand after reading Google's Controlling Crawling and Indexing:
- The purpose of a robots.txt file is to disallow crawling of some URLs, but those URLs can still be indexed (and appear in search results) if they are linked from crawlable pages
- To prevent a page from being indexed, I need to leave it crawlable (i.e. not blocked by robots.txt) and add a noindex meta tag in its head
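To make the distinction concrete, here is a minimal sketch of the two mechanisms (the `/private/` path is just an illustrative example):

```
# robots.txt — blocks crawling only; the URL can still be indexed
# if other pages link to it
User-agent: *
Disallow: /private/
```

```html
<!-- In the <head> of a page that should stay out of search results;
     the page must remain crawlable so the bot can see this tag -->
<meta name="robots" content="noindex">
```

Note that combining the two defeats the purpose: if robots.txt blocks the page, the crawler never fetches it and never sees the noindex tag.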
So, why would I set up a robots.txt file if, in the end, it has no impact on whether pages appear in search results?
Disallow directive in robots.txt. To prevent indexation, you'd have to use `<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">` in the `<head>` of the necessary pages. – zigojacko Mar 11 '14 at 08:30
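For completeness: the same noindex signal can also be sent as an HTTP response header (`X-Robots-Tag`), which is useful for non-HTML resources like PDFs that have no `<head>`. A sketch for an Apache server with mod_headers enabled (the `.pdf` match is just an example):

```apache
# Send noindex for all PDF files; requires mod_headers
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

As with the meta tag, this only works if the URL is crawlable, so the bot can fetch the response and read the header.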