
About 30 hours ago I used the google.com/ping service to get Google to download 500 sitemaps for a customer of mine.
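
For reference, submitting 500 sitemaps through that service boils down to one HTTP request per sitemap. A minimal Python sketch, assuming the ping endpoint as Google documented it at the time; the sitemap URLs are placeholders, not the real ones:

    # Sketch only: ping Google once per sitemap via the google.com/ping
    # endpoint. The sitemap URLs below are placeholders.
    from urllib.parse import quote
    from urllib.request import urlopen

    sitemaps = [f"https://example.com/sitemaps/sitemap-{i:03d}.xml" for i in range(1, 501)]

    for sitemap_url in sitemaps:
        ping_url = "https://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")
        with urlopen(ping_url) as response:
            print(sitemap_url, response.status)  # 200 meant the ping was accepted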

Googlebot reacted immediately and downloaded all 500 sitemaps.

Now I have more than 30,000 downloads from Googlebot in less than two days.
The same unchanged sitemaps are downloaded again and again and again. They are valid and clean, and Bing downloads them ONE time and is fine with that.
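
Those numbers can be verified against the Apache access log. A minimal Python sketch of such a tally, assuming the combined log format; the log path and the /sitemaps/ URL prefix are placeholders, not the real layout:

    # Sketch only: tally sitemap requests per user agent from an Apache
    # access log in combined format. Log path and URL prefix are placeholders.
    import re
    from collections import Counter

    LOG_PATH = "/var/log/apache2/access.log"
    REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

    hits = Counter()
    with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = REQUEST_RE.search(line)
            if match and match.group("path").startswith("/sitemaps/"):
                agent = "Googlebot" if "Googlebot" in match.group("agent") else "other"
                hits[agent] += 1

    print(hits)  # sitemap fetch counts per user agent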

I am quite worried about that amount of downloads: it will cause significant traffic over time, and it is useless because those sitemaps are not going to change any time soon.
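
One related thing worth checking is whether the server answers a repeat fetch of an unchanged sitemap with 304 Not Modified instead of the full body, since that is what would keep this kind of re-downloading cheap. A minimal Python sketch, with a placeholder URL:

    # Sketch only: check whether an unchanged sitemap is answered with
    # 304 Not Modified on a conditional re-request. The URL is a placeholder.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    SITEMAP_URL = "https://example.com/sitemaps/sitemap-001.xml"

    with urlopen(Request(SITEMAP_URL, method="HEAD")) as first:
        last_modified = first.headers.get("Last-Modified")

    if last_modified is None:
        print("No Last-Modified header, so conditional requests cannot help")
    else:
        try:
            with urlopen(Request(SITEMAP_URL, headers={"If-Modified-Since": last_modified})) as again:
                print("Full body sent again, status", again.status)
        except HTTPError as err:
            print("Status", err.code)  # 304 means no body was transferred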

The web server is a normal Apache installation.

The download count has since increased to 92000 downloads of unchanged sitemaps, so I blocked Google's access to the sitemaps for half a day. This helped reduce the rate, but it is slowly ramping up again. Still, since I blocked Googlebot the rates are a lot better than before.
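
To see from the outside whether such a block is still in effect, one can request a sitemap with the Googlebot user-agent string and look at the status code. This is only a sketch with a placeholder URL, and it only exercises user-agent based blocking, not an IP-based block:

    # Sketch only: check what a Googlebot-identified request currently gets
    # for a sitemap URL. Real Googlebot is identified by its IP ranges,
    # not just the UA string, so this covers UA-based blocking only.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    SITEMAP_URL = "https://example.com/sitemaps/sitemap-001.xml"  # placeholder
    GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    try:
        with urlopen(Request(SITEMAP_URL, headers={"User-Agent": GOOGLEBOT_UA})) as resp:
            print("Sitemap still served to the Googlebot UA, status", resp.status)
    except HTTPError as err:
        print("Blocked for the Googlebot UA, status", err.code)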

So at this point it seems under control, but the incident is really strange.

John
  • Can't address your main question, but you might want to read this regarding whether sitemaps are necessary: https://webmasters.stackexchange.com/questions/4803/the-sitemap-paradox. – Trebor Jan 03 '21 at 22:28
  • The website has millions and millions of pages; sitemaps should help enormously to cover them quickly and without missing anything. Random crawls would take a long time. – John Jan 03 '21 at 22:34
  • In my experience, Google's crawling isn't totally random, but crawls links to determine content before crawling sitemaps. That's still a lot of pages though. – Trebor Jan 03 '21 at 22:41
  • @Trebor Google crawled about 4000 pages and downloaded the same sitemaps 31000 times. Something is buggy on their end; I just don't understand what is triggering it. – John Jan 03 '21 at 23:19
  • What if you removed your sitemaps, and added them back in a few at a time? I'm not sure how many a few would be, but what if you added 50 every day or some other increment? I've had similar problems with other apps when I overloaded the number of submissions and they got stuck in a loop trying to process my submissions. You'll probably have to wait for Google to exit out of the current loop before resubmitting. – Trebor Jan 03 '21 at 23:59
  • Googlebot typically downloads pages and sitemaps many times. How often it re-downloads them depends on how much PageRank the site has. I'm not sure there is anything wrong here. If you have so many pages that it requires 500 sitemaps, you are going to need to support a ton of Googlebot crawling (and hopefully a ton of human visitors too). – Stephen Ostermiller Jan 04 '21 at 11:39
  • @StephenOstermiller It just makes no sense: Google indexed only a few thousand pages from the first sitemap, yet it downloaded the other 499 sitemaps 50,000 times without indexing a single page contained in them. – John Jan 04 '21 at 14:45
  • I wouldn't expect Google to index very many pages from a huge site unless that site is well established. I would expect Google to crawl pages from all the sitemaps it's downloading. Not sure why it would be downloading just the sitemaps many times but not any of the pages within them. – Stephen Ostermiller Jan 04 '21 at 18:27
  • How old is the site? Is it well established? Does it have a lot of inbound links? How many pages does Google Search Console report were indexed before submitting the sitemaps? – Stephen Ostermiller Jan 04 '21 at 23:14
  • @StephenOstermiller Would that make a difference? In 3-4 days Google has downloaded my sitemaps 92200 times, generating 22 gigabytes of traffic. The sitemaps were identical with each download. – John Jan 06 '21 at 17:59
  • If the site is fairly new, you shouldn't be submitting more than 10,000 of the best pages to get indexed. Having way more pages than Google is willing to crawl or index isn't good for SEO. Better to focus on what is possible. – Stephen Ostermiller Jan 06 '21 at 19:42

0 Answers