If you search for site:news.cnblogs.com in Google, you will see that many of its pages are indexed. But if you click any of the search results, you are redirected to account.cnblogs.com, which asks you to log in.

Why does Google index pages that belong to news.cnblogs.com? I tried switching my browser's user agent to Googlebot, but I was still redirected to the login page. How can Google see the real content? Is it possible for normal visitors to see the blocked content without logging in, as Google does?

William
  • Is cnblogs your website? – Stephen Ostermiller Aug 25 '21 at 16:52
  • There is a similar question here about how Google can access content behind a login. The accepted answer recaps how to do this: https://stackoverflow.com/questions/1382247/how-do-i-allow-google-to-index-login-required-parts-of-my-site Specifically, you aren't able to see the login-protected content when you switch to the Googlebot user agent because the site is likely doing an IP lookup to verify that the request really comes from Googlebot (see https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot) – Matthew Edgar Aug 25 '21 at 17:07
  • @MatthewEdgar, thank you for the information. I think Google's policies contradict each other. Cloaking is not allowed, but if you add the paywall structured data, it is fine. From the user's perspective, they are the same thing. – William Aug 26 '21 at 03:23
  • @William You are right that a paywall and cloaking look very similar to the visitor. The difference, as I see it, is that a paywall is a legitimate way to configure a website, whereas cloaking is a tactic used to manipulate Google. If you are running a site with a paywall, it is imperative that you declare it correctly so that Google doesn't think you are cloaking instead. – Matthew Edgar Aug 26 '21 at 19:03
  • I voted to close this question because it is not about your own website. – Stephen Ostermiller Oct 01 '21 at 06:29
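
The IP check mentioned in the comments can be sketched roughly as follows. This is only an illustration of the reverse-then-forward DNS verification that Google documents for Googlebot, not what cnblogs actually runs. The function name `is_verified_googlebot` and the injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical and exist so the logic can be exercised without network access. The default forward lookup only handles IPv4.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse + forward DNS check.

    `reverse_lookup` maps an IP to a hostname and `forward_lookup` maps a
    hostname to a list of IPs. Both are hypothetical hooks for testing and
    default to real DNS queries via the socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        # Step 1: the reverse DNS name must belong to a Google crawler domain.
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the forward lookup of that name must return the same IP,
        # otherwise anyone controlling their own reverse DNS could fake it.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

A site that runs a check like this on the request's source address will ignore a spoofed User-Agent header, which is consistent with still being redirected to the login page after switching the browser's user agent.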

0 Answers