Questions tagged [googlebot]

Googlebot is the bot software that Google uses to crawl over 20 billion pages each day. The data obtained during a crawl is then analyzed and ranked on Google Search.

Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google Search engine. Googlebot is also known as a robot, and more suitable questions and answers may be found under

933 questions
6
votes
2 answers

GoogleBot doing POST requests

Why did GoogleBot start making POST requests to a page last Friday? I can see this in the log file (just an example; there were about 10,000 entries over the weekend, and the URL in the log has been changed): 66.249.79.55 - - [15/Jul/2019:08:34:53 +0000] "POST /Contact/…
Seb
  • 165
  • 4
3
votes
2 answers

Googlebot is still attempting to crawl old content

I have content that was deleted several years ago, and from time to time Googlebot still attempts to access those pages, filling up my logs with lots of 404s and making the 'real' problems harder to find and read. I have found Google is still crawling…
Neograph734
  • 419
  • 1
  • 3
  • 10
3
votes
0 answers

GoogleBot constantly crawls non-existent PDFs with 'viagra' or 'cialis' in the name?

My access log is full of requests from GoogleBot for non-existent PDFs relating to 'viagra', 'cialis', or other similar drugs (user-agent is: Googlebot/2.1 (+http://www.google.com/bot.html) IP range is…
Omn
  • 131
  • 1
3
votes
2 answers

Googlebot keeps on crawling stale/nonexistent (410) resources and then shows a crawl anomaly followed by de-indexing pages

We have been experiencing constant crawl anomalies caused by Googlebot's continued crawling of 'rnd.js?asdfasdfasfs3423' (hash is random on each pageload). It has been 3 months since we removed rnd.js from all of our pages, yet Googlebot insists on…
3
votes
1 answer

GoogleBot crawls thousands of URLs like 2487763877595434670.htm that don't exist

In our server logs there are thousands of requests (50-100k) from Googlebot every day for URLs like /2487763877595434670.htm (as far as I can see, always 19 random digits with .htm at the end). First the bot requests the http:// URL, which is redirected to https://…
wsm
  • 31
  • 1
2
votes
1 answer

How to simulate googlebot to see which links in a React app would be indexed?

I am developing a React app. I’ve had poor indexing coverage until now (only the home page was indexed). I recently implemented server side rendering (SSR) and indexing coverage appears to be significantly better. That being said, feels like I am…
sunknudsen
  • 121
  • 1
2
votes
1 answer

Why does Googlebot attempt to crawl /admin/install.php?

On one site I own, I recently started seeing Googlebot checking for non-existing URIs: 66.249.76.89 - - [23/Feb/2020:10:18:48 +0100] "GET /robots.txt HTTP/1.1" 404 118 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"…
herrbischoff
  • 175
  • 7
1
vote
2 answers

Why can't Google Bot access my site?

I repeatedly get a warning/error message from Google as follows: Googlebot can't access your site. I have checked, edited, removed, and replaced the robots.txt file on the site, all to no avail. Here is the content of the file: User-agent: * Allow:…
forrest
  • 225
  • 3
  • 10
1
vote
1 answer

Skip campaigns from indexing

I have a webpage with campaigns that are only valid for a specific period of time. My problem is that when a campaign is active, the homepage redirects to the campaign page and therefore the Google search result for the website shows the information…
o15a3d4l11s2
  • 111
  • 3
1
vote
1 answer

Googlebot is scanning random links on my website

Just take a look at this: example.com:80 66.249.79.18 - - [13/Jun/2017:20:09:26 -0700] "GET /d733421/example.com HTTP/1.1" 404 16729 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" example.com:80 66.249.79.20 - -…
user78997
0
votes
0 answers

Google Crawler constantly reports ~2-3% errors

For the last year and a half, I have had a constant 2-3% error rate while Google crawls my web site. The errors are of the type "Server error: No response". I've switched ISP, but the errors persist. Debugging the application stack didn't give me any hint…
Jakov Sosic
  • 101
  • 1
0
votes
1 answer

Why do my htm pages have a number attached at the end of the link?

When I wanted to fetch a certain page with an htm extension, I noticed an attachment after the htm extension: http://www.example.com/IP_814.htm#.Uk1_MtKsh8E Google could not fetch this page; it gave the 502 error code. When I fetched the same page as…
Lexi
  • 1
  • 1
0
votes
1 answer

What time does Googlebot crawl sites?

Every evening I sync the localhost copy of the website to the production one. Yesterday I could not do it, so I did it today at 10:00 am. Common sense says that Googlebot should be crawling the site when it is most idle, which is during the night in that…
AgA
  • 1,438
  • 3
  • 13
  • 29
0
votes
1 answer

Googlebot following invalid URLs. How do I troubleshoot the problem?

I'm getting errors generated by Googlebot while following invalid URLs. Checking with Google Webmaster tools confirmed that the problem was the bot: I've checked the source code and the HTML generated, but couldn't find out why Googlebot is finding…
Eduardo Molteni
  • 207
  • 1
  • 7
0
votes
0 answers

Google Search Console says it crawled the website but the access logs show no Googlebot access at all

Google Search Console tells me that it last crawled the website today (I requested a fresh crawl, as the last one was some months ago). The crawl report now tells me the last crawl was today. However, the server access logs show nothing related…
user118386