4

I removed a property (website) from Google Search Console. My goal was to remove the website from the search engine results. Will it do that?

Stephen Ostermiller

2 Answers

2

Removing your website from Search Console will not remove it from Google's index. Once your site is indexed, it will remain indexed until the content itself no longer exists.

Temporary removal

If you wish to remove your website from Google's index, use the URL removal tool. However, this removal is only temporary.

Very important notes:

  • A successful request lasts only about 90 days. After that, your information can appear on Google search results (see Making removal permanent).
  • Clearing the cache or hiding a URL does not change Googlebot's crawl schedule or page caching behavior. When you request a temporary block of a URL, Google will still continue to crawl your URL, if it exists and isn't blocked by another method (such as a noindex tag). Because of this, it is possible that your page can be crawled and cached again before you remove or password-protect your page, and can appear in search results after your temporary blackout expires.
  • If your URL becomes unreachable by Googlebot, it will assume that the page is gone and your block request will be ended. Any page found at that URL later will be considered a new page that can appear in Google Search results.

Making removal permanent

The URL removal tool provides only a temporary removal. To remove content or a URL from Google search permanently you must take one or more of the following additional actions:

  • Remove or update the actual content from your site (images, pages, directories) and make sure that your web server returns either a 404 (Not Found) or 410 (Gone) HTTP status code. Non-HTML files (like PDFs) should be completely removed from your server.
  • Block access to the content, for example by requiring a password.
  • Indicate that the page should not be indexed using the noindex meta tag (see the quick check sketched below).
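As a quick spot check of whether a URL already meets one of these conditions, a short script can help. This is only a minimal sketch using Python's standard library; the URL is a placeholder and the test for the noindex meta tag is deliberately rough.

import urllib.request
import urllib.error

def removal_status(url):
    """Report the HTTP status and whether a noindex robots meta tag is present."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
            html = resp.read(65536).decode("utf-8", errors="replace").lower()
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            # 404 (Not Found) and 410 (Gone) both support permanent removal.
            return f"{err.code} returned - suitable for permanent removal"
        return f"HTTP error {err.code}"
    # Rough check for a <meta name="robots" content="noindex"> tag.
    if 'name="robots"' in html and "noindex" in html:
        return f"{status} returned with a noindex robots meta tag"
    return f"{status} returned and indexable - not permanently removed"

print(removal_status("https://www.example.com/old-page"))  # placeholder URL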
Chris Rogers
1

Removing a property from Google Search Console only removes the website from Search Console.

I am not sure exactly what your goal is; however, you can use robots.txt to remove your website from Google, for example, using...

User-agent: Googlebot
Disallow: /

...or all search engines using

User-agent: *
Disallow: /

Each search engine has its own bot name; for example, Bing's is bingbot.

User-agent: bingbot
Disallow: /

Robots.txt is a simple text file in the root of your website. It should be available as example.com/robots.txt or www.example.com/robots.txt.

You can read about robots.txt at robotstxt.org.

A list of the larger search engine bot/spider names can be found at top search engine bot names.
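If you want to confirm that the robots.txt you published actually blocks a given bot, Python's standard-library robots.txt parser can serve as a quick check. This is only an illustrative sketch; the domain and page path are placeholders.

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file (placeholder domain).
parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

# With "User-agent: *" and "Disallow: /", every bot should report "blocked".
for bot in ("Googlebot", "bingbot", "DuckDuckBot"):
    allowed = parser.can_fetch(bot, "https://www.example.com/some-page")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")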

Using the robots.txt file and the proper bot name is generally the fastest way to remove a website from a search engine. Once the search engine reads the robots.txt file, the website will be removed within about two days, unless things have changed recently; Google used to drop sites within 1-2 days. Each search engine is different and the responsiveness of each can vary, but the larger search engines are fairly responsive.

To address the comments:

Robots.txt is indeed used by search engines to know what pages to index. This is well known and understood and has been a de facto standard since 1994.

How Google works.

Google indexes links, domains, URLs and page content among other data.

The links table is used to discover new sites and pages and for ranking pages using the PageRank algorithm which is based upon the trust networks model.

The URL table is used as a join table between links and pages.

If you are familiar with SQL database schemas, the illustration would look roughly like this:

The link table would be something like: linkID, linkText, linkSourceUrlID, linkTargetUrlID

The domain table would be something like: domainID, urlID, domainAge, domainIP, domainRegistrar, domainRegistrantName, ...

The URL table would be something like: urlID, urlURL

The pages table would be something like: pageID, urlID, pageTitle, pageDescription, pageHTML

The URL table acts as a join table between domains, links, and pages.

The page index is used to understand the content of individual pages and to index them. Indexing is far more complicated than just an SQL table; however, the illustration still stands.

When Google follows a link, the link is put into the link table. If the URL is not in the URL table, it is added to the URL table and submitted to the fetch queue.

When Google fetches the page, Google looks to see whether the robots.txt file has been read and, if so, whether it was read within the last 24 hours. If the cached robots.txt data is older than 24 hours, Google fetches the robots.txt file again. If a page is restricted by robots.txt, Google will not index the page, or will remove the page from the index if it already exists.

When Google sees a restriction in robots.txt, the restriction is submitted to a queue for processing. The processing begins nightly as a batch-style process. The pattern is matched against all URLs, and all matching pages are dropped from the page table using the URL ID. The URL is retained for housekeeping.

Once the page has been fetched, the page is put into the page table.
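To make that flow easier to follow, here is a toy sketch of the pipeline as described above. It is purely illustrative and not Google's actual code: the tables, queues, helper functions, and the pretend robots.txt rules are all made up for the example; only the overall flow (URL discovery, the 24-hour robots.txt cache, and the nightly batch removal) mirrors the explanation.

import time

ROBOTS_TTL = 24 * 60 * 60   # re-read robots.txt if the cached copy is older than 24 hours

url_table = {}       # url -> URL id (never removed, kept for housekeeping)
page_table = {}      # URL id -> page content
fetch_queue = []     # newly discovered URLs waiting to be fetched
removal_queue = []   # robots-restricted URLs, processed by the nightly batch
robots_cache = {}    # domain -> (fetched_at, disallowed path prefixes)

def fetch_robots_rules(domain):
    # Stand-in for fetching robots.txt; pretend the whole site is disallowed.
    return ["/"]

def discover(url):
    """A newly seen URL gets a URL id and joins the fetch queue."""
    if url not in url_table:
        url_table[url] = len(url_table) + 1
        fetch_queue.append(url)

def is_restricted(domain, path):
    """Re-read robots.txt only if the cached copy is older than 24 hours."""
    fetched_at, rules = robots_cache.get(domain, (0.0, []))
    if time.time() - fetched_at > ROBOTS_TTL:
        rules = fetch_robots_rules(domain)
        robots_cache[domain] = (time.time(), rules)
    return any(path.startswith(prefix) for prefix in rules)

def crawl(domain, path):
    """Fetch a URL: restricted URLs are queued for removal, others are stored."""
    url = domain + path
    discover(url)
    if is_restricted(domain, path):
        removal_queue.append(url)   # dropped from page_table overnight
    else:
        page_table[url_table[url]] = f"<html>content of {url}</html>"

def nightly_batch():
    """Drop every page whose URL matched a restriction; keep the URL id."""
    while removal_queue:
        url = removal_queue.pop()
        page_table.pop(url_table[url], None)

crawl("example.com", "/blocked-page")
nightly_batch()
print(url_table, page_table)   # the URL id remains; the page content is gone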

Any link within the link table whose target has not been fetched, is restricted by robots.txt, or is broken with a 4xx error is known as a dangling link. And while PR can be calculated for the target pages of dangling links using the trust-networks theory, PR cannot be passed through these pages.

About 6 years ago or so, Google felt it was wise to include dangling links in the SERPs. This was done when Google redesigned its index and systems to aggressively capture the entire web. The thought behind this was to present valid search results to users even if the page is restricted from the search engine.

URLs have very little if any semantic value.

Links do have some semantic value; however, this value remains small, as semantic indexing prefers more text and cannot perform well on a standalone element. Ordinarily, the semantic value of a link is measured along with the semantic value of the source page (the page with the link) and the semantic value of the target page.

As a result, any URL that is the target page of a dangling link cannot rank at all well. The exception is for newly discovered links and pages. As policy, Google likes to "taste" newly discovered links and pages by defaulting the PR values high enough for them to be found and tested within the SERPs. Over time, PR and CTR are measured and adjusted to place links and pages where they should exist.

See ROBOTS.TXT DISALLOW: 20 Years of Mistakes To Avoid, where ranking as I have described it is also discussed.

Listing links in the SERPs is misguided, and many have complained about it. It pollutes the SERPs with broken links and links behind logins or paywalls, for example. Google has not changed this practice; however, the ranking mechanisms do filter these links out, effectively removing them from the SERPs entirely.

Remember that the indexing engine and query engine are two different things.

Google does recommend using noindex for pages, which is not always possible or practical. I use noindex myself; however, for very large websites built with automation, this may be impossible or at least cumbersome.

I had a website with millions of pages that I removed from Google's index within days by using the robots.txt file.

And while Google argues against using the robots.txt file and in favor of using noindex instead, this is a much slower process. Why? Because Google uses a TTL-style metric in its index that determines how often Google visits a page. This can be a long period, up to a year or more.

Using noindex does not remove the URL from the SERPs, just as robots.txt does not. The end result remains the same. Noindex, as it turns out, is not actually better than using the robots.txt file. Both produce the same effect, while the robots.txt file renders results faster and in bulk.

And this is, in part, the point of the robots.txt file. It is widely accepted that people will block whole sections of their website using robots.txt or block bots from the site entirely. This is a more common practice than adding noindex to pages.

Removing an entire site using the robots.txt file is still the quickest way even if Google does not like it. Google is not God, nor is its website the New New Testament. As hard as Google tries, it still does not rule the world. Damn near, but not quite yet.

The claim that blocking a search engine using robots.txt in effect blocks the search engine from seeing a noindex meta tag is utter nonsense and defies logic. You see this argument everywhere. Both mechanisms, in effect, are exactly the same, except one is much, much faster as a result of bulk processing.

Keep in mind that the robots.txt standard was adopted in 1994, while the noindex meta tag did not appear until 1996 and had yet to be adopted even by Google in 1997. In the early days, removing a page from a search engine meant using the robots.txt file, and it remained that way for quite a while. Noindex is just an add-on to the already existing process.

Robots.txt remains the number 1 mechanism for restricting what a search engine indexes and likely will for as long as I will be alive. (I better be careful crossing the street. No more skydiving for me!)

closetnoc
  • @MaximillianLaumeister Google will indeed drop the pages from its index. There is no confusion on this. However, you are right in a way. Any link to the site will be reflected for a period in the SERPs but these generally go away within about 2 months. Google has an odd theory of including a page in the SERPs by only discovering a link. Many of us have argued that this pollutes the SERPs and is misguided. Knowing how Google works, when Google finds a link to an unknown site, it creates a link in their index before visiting the site and shows that link in the SERPs. Google used to do it right. – closetnoc Sep 13 '19 at 15:19
  • @MaximillianLaumeister I explained the process to make clear that using robots.txt indeed works as I suggested. – closetnoc Sep 13 '19 at 16:42
  • I'm sorry, but your answer is not correct. The article you linked (plus the article I linked) both contradict your answer. From your answer: "The claim that blocking a search engine using robots.txt in effect blocks the search engine from seeing a noindex meta tag is utter nonsense." But from your article: "Disallowing a URL via robots.txt will not prevent it from being seen by searchers in search results pages. [...] To prevent URLs from appearing in Google search results, URLs must be crawlable and not disallowed with robots.txt.". – Maximillian Laumeister Sep 13 '19 at 16:47
  • @MaximillianLaumeister Like I say, the claim often made is utter nonsense. Using robots.txt works just fine and is indeed faster and has been since the dawn of search. Internally, as I described it, the process for noindex and robots.txt is exactly the same except for the pattern match batch process which removes all pages matching the pattern at once versus removing them one at a time via the TTL style metric which can take more than a year. I have used this method since the very earliest days of search. Google is just plain wrong-headed on showing links in SERPs. It is noise in the data. – closetnoc Sep 13 '19 at 16:55
  • In some cases, robots.txt may get pages de-indexed faster. However, if it doesn't work, you need to use noindex instead anyway. I appreciate what you're saying regarding Google showing uncrawled pages in search results, however please realize that this discussion is not about which pages Google should theoretically index, but what Google actually indexes in practice. As a final note, noindex and robots.txt are not the same mechanism internally, because if they were, then using robots.txt would guarantee that a page does not show up in SERPs, which we know is not the case. – Maximillian Laumeister Sep 13 '19 at 17:01
  • I'd phrase it as "robots.txt works to de-index sites most of the time." Google sometimes does index pages (especially home pages) blocked by robots.txt when there are enough external links pointing to them. See How to resolve Google “Indexed, though blocked by robots.txt” as an example. – Stephen Ostermiller Sep 13 '19 at 18:48
  • @MaximillianLaumeister You may be getting slightly confused conceptually. An indexed link showing up in the SERPs is not the same as a page showing up in the SERPs. This is one reason why I do not like them. BTW, I do know the process between noindex and robots.txt is the same. I am not guessing or reading other sites to know what is going on. I did, a few years ago, have a thorough inside view of Google's schema and business rules including algos. It has been a while and water does run under a bridge. Cheers mate!! – closetnoc Sep 13 '19 at 18:49
  • @StephenOstermiller Cheers!! You are pointing out that Google is showing the indexed link to the target page. It is not indexing the page itself. And that is precisely my point. It is a conceptual/semantics thing. The site's pages would be dropped from the index absolutely, however, the indexed links to the site will not, and that is why I disagree with Google's policy to show indexed links in the SERPs. The confusion over what I am trying to describe is a case in point. The good news is that indexed links eventually do disappear from the SERPs due to falling performance metrics. – closetnoc Sep 13 '19 at 19:05
  • If you have 50 links pointing at your robots.txt disallowed page, Google indexes your page URL just one time and uses a page title taken from one of the links. If Google "indexed links", then your page could appear in the index more than one time. It is still indexing pages, just not with their text content. And yes, they do often get de-indexed eventually, or commonly stay in the index but only show up for site: searches. – Stephen Ostermiller Sep 13 '19 at 19:12
  • @StephenOstermiller Yes. You are right in the sense that Google indexes the URL. Links are indexed, URLs are indexed, and page content is indexed. My explanation is a bit oversimplified. I am out of practice here after all... URLs are always indexed and never removed. This is required for general management. However, a link from a source page to a target page where the target page has not been indexed, does not exist, or is disallowed is a dangling link. Google shows URLs in its SERPs that are dangling. My explanation is clearly not clear enough. I will go back and clarify later. Cheers!! – closetnoc Sep 13 '19 at 19:53
  • @StephenOstermiller Clearer than concrete now... may not be clearer than mud though. – closetnoc Sep 13 '19 at 23:23