
My sitemap contains 50K URLs (7.8 MB) and uses the following URL syntax:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc> https://www.ninjogos.com.br/resultados?pesquisa=vestido, maquiagem, </loc> <lastmod> 2019-10-03T17:12:01-03:00 </lastmod>
<priority> 1.00 </priority>
</url>
</urlset>

The problems are:

• Search Console says "Sitemap could not be read";

• The sitemap takes an hour to load and Chrome stops responding;


• In Firefox, the sitemap downloaded in 1,483 ms and fully loaded after about 5 minutes.

Things I've tried without success:

• Disabled gzip compression;

• Deleted my .htaccess file;

• Created a test sitemap with 1K URLs and the same syntax and submitted it to Search Console; it worked, but the 50K-URL sitemap still shows "unable to fetch Sitemap";


• Tried to inspect the URL directly, but it gave an error and asked to try again later, while the 1K-URL sitemap worked;

• Tried to validate the sitemap on five different sites (Yandex, etc.) and all passed without any errors or warnings.

Can anyone shed some light on this?


1 Answer


You should test your sitemap with a download tool such as curl or wget rather than a browser like Chrome or Firefox (see the test command after the list below). You should be able to download the file within three minutes that way. If it takes longer for you, then Googlebot will probably also have problems with it. You can:

  • Upgrade your hosting so your entire site is faster
  • Pre-compress your sitemap with gzip so that the URL is sitemap.xml.gz. That way it will be much smaller and you won't need to disable gzip on your server.
  • Remove lastmod and priority from your sitemap since Google doesn't use them anyway.
  • Break up your sitemap into smaller pieces and use a sitemap index file (see the example after this list).
  • Remove white space from your sitemap. It looks like all your fields are surrounded by spaces. That isn't correct. It is adding to the size and possibly confusing search engines. <priority> 1.00 </priority> should be <priority>1.00</priority>. Same for <loc> and <lastmod>.
  • Remove unnecessary URLs from your sitemap.
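
On the first point, here is a rough sketch of a command-line test (it assumes the sitemap lives at /sitemap.xml; adjust the URL to your actual path):

curl -o /dev/null -s -w 'HTTP %{http_code}, %{size_download} bytes in %{time_total}s\n' https://www.ninjogos.com.br/sitemap.xml

For the pre-compression point, you can gzip a local copy and upload the result as sitemap.xml.gz (-9 gives maximum compression, -k keeps the original file):

gzip -9 -k sitemap.xml

And a sitemap index file would look something like this (the child sitemap names here are illustrative; each child file may hold up to 50,000 URLs and 50 MB uncompressed):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>https://www.ninjogos.com.br/sitemap-1.xml.gz</loc></sitemap>
<sitemap><loc>https://www.ninjogos.com.br/sitemap-2.xml.gz</loc></sitemap>
</sitemapindex>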

On the last point, your example URL is problematic. I believe "resultados pesquisa" translates to "search results". Google doesn't want your search results pages indexed. You should block Googlebot from crawling them and remove them from your sitemap (a robots.txt sketch follows below). See Search results in search results. Having your site's search results indexed is a bad experience for users coming from Google, and it can cause Google to penalize your entire site.
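
If all of your search results pages live under /resultados, as in the example above, a robots.txt rule along these lines would block crawling (a sketch, assuming that path covers every search results page):

User-agent: *
Disallow: /resultados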

Despite the tag on your question, your XML sitemap probably won't help your SEO at all. Google doesn't rank pages better because they are in a sitemap, nor will Google usually choose to index a page just because it is in the sitemap. See The Sitemap Paradox. The benefits of having a sitemap are mostly in getting better stats out of Google Search Console. You could also use it as one method of telling Google about your canonical URLs (but there are better ways, such as canonical tags). Because sitemaps aren't much use, if yours are giving you headaches, you can simply delete them and not worry about it. That won't hurt your site or its SEO.

  • My site is about games; if a user searches for "horse" it returns "best horse games" via a PHP GET parameter. Is having this indexed not good for SEO? Given that I don't have a horse-games category, and there are millions of other possibilities taken from my games API – Eder Leandro Oct 05 '19 at 00:15
  • I mean, these pages ranking for thousands of low-competition keywords will eventually make my site rank better... especially if the conversions generate any backlinks... or at least I think so. Is it right to think this way? – Eder Leandro Oct 05 '19 at 00:21
  • I will compress my sitemap with gzip... the sitemap has not been validated in Search Console so far... if anything changes I'll update here – Eder Leandro Oct 05 '19 at 00:25
  • My Apache server is already configured to compress XML... the compression you mention is not the server's gzip compression? Is it a manual gzip compression? If I do this, do I need to disable gzip? Because otherwise it would compress twice – Eder Leandro Oct 05 '19 at 00:30
  • Your server should be configured only to compress certain file types. If you make it .gz then it shouldn't recompress if your server is properly configured. – Stephen Ostermiller Oct 05 '19 at 02:53
  • If your pages look like search results, Google doesn't want them indexed. – Stephen Ostermiller Oct 05 '19 at 02:54
  • Is it OK to use this PHP script to compress? – Eder Leandro Oct 05 '19 at 12:16
  • I have sent GSC a compressed .gz sitemap and the same one in plain XML, deactivated gzip for .gz files, and removed the tags; that reduced the size from 7.8 MB to 3 MB (250 KB compressed), but both still show the "Can't fetch" error – Eder Leandro Oct 05 '19 at 12:25
  • Did you remove the extra spaces? The other thing I'm noticing now is that many of your URLs have spaces. Inside a URL, spaces should be encoded as %20 – Stephen Ostermiller Oct 06 '19 at 01:39
  • Replaced every space with %20 using Notepad++ and it instantly validated. Thanks for the help, God bless you – Eder Leandro Oct 06 '19 at 20:49
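
For reference, a minimal PHP sketch of the fix the comments converge on: percent-encoding the query value so spaces become %20 rather than appearing raw inside <loc> (the variable names are illustrative, and the search terms are taken from the question's example):

<?php
// rawurlencode() turns spaces into %20 (and commas into %2C),
// which is what the sitemap protocol expects inside <loc>.
$terms = 'vestido, maquiagem,';
$loc = 'https://www.ninjogos.com.br/resultados?pesquisa=' . rawurlencode($terms);
// Escape XML special characters before writing the tag.
echo '<loc>' . htmlspecialchars($loc) . '</loc>' . PHP_EOL;
// Prints: <loc>https://www.ninjogos.com.br/resultados?pesquisa=vestido%2C%20maquiagem%2C</loc>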