I'm building an Angular web application that periodically autogenerates sitemaps. Once it has finished generating all the sitemaps, it automatically informs Google of their location using the ping GET request.
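For reference, a minimal sketch of what that ping request might look like as an Angular service, assuming Google's sitemap ping endpoint; the service name and the sitemap index URL passed in are placeholders, not the poster's actual values:

```typescript
import { Injectable } from '@angular/core';
import { HttpClient } from '@angular/common/http';

@Injectable({ providedIn: 'root' })
export class SitemapPingService {
  constructor(private http: HttpClient) {}

  // Notify Google of the sitemap index location via the ping endpoint.
  // sitemapIndexUrl is a placeholder for the real, publicly reachable index file.
  pingGoogle(sitemapIndexUrl: string) {
    const pingUrl =
      'https://www.google.com/ping?sitemap=' + encodeURIComponent(sitemapIndexUrl);
    // Google answers with plain text/HTML, so request text rather than JSON.
    return this.http.get(pingUrl, { responseType: 'text' });
  }
}
```

Note that if this request is issued from the browser rather than from a server, it may be blocked by CORS, so in practice the ping is often sent from a backend or scheduled job instead.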
Today I tested this functionality for the first time and it all worked correctly. The sitemaps were generated from a set of test URLs and then uploaded to a bucket I created in Amazon S3. After that, the application successfully pinged Google with the URL of the sitemap index file.
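As an illustration of the upload step, here is a small sketch using the AWS SDK for JavaScript v3; the region, bucket name, and object key are assumptions for the example, not the poster's real values:

```typescript
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Region and bucket name are placeholders.
const s3 = new S3Client({ region: 'us-east-1' });

// Upload one generated sitemap file so it is reachable at
// https://<bucket>.s3.amazonaws.com/<key>, assuming the bucket policy allows public reads.
async function uploadSitemap(xml: string, key: string): Promise<void> {
  await s3.send(
    new PutObjectCommand({
      Bucket: 'my-sitemap-bucket',
      Key: key,                      // e.g. 'sitemap-index.xml'
      Body: xml,
      ContentType: 'application/xml',
    })
  );
}
```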
There is only one problem: the website itself is not yet live. The only thing that exists right now is the registered domain, and all the test URLs inside the sitemaps use this domain as their base URL.
So when the Google crawler downloads the sitemaps and discovers that none of the URLs are valid, will this affect the ranking of my future website? Or will Google simply ignore the sitemaps when it finds there is no website active on the domain?
Thank you
PS: I've removed all of the test sitemaps from the bucket after writing this post. Maybe I'm lucky and Google hasn't downloaded them yet.
When your sitemap is on an S3 bucket, Googlebot will ignore all the URLs in the sitemap until you add both the S3 bucket and the main site as separate properties in the same Google Search Console account.

According to the info in the sitemaps.org link you provided, all I have to do is point to the AWS S3 sitemap URL in robots.txt. It says "You can do this by modifying the robots.txt file on www.host1.com to point to the Sitemap on www.sitemaphost.com." The page makes no mention of also having to add it to the Google Search Console account. So won't Google find the sitemaps through robots.txt? – Maurice Jan 04 '22 at 18:58

What if I instead set up a subdomain such as sitemaps.example.com and have crawlers that visit that domain get redirected to the sitemap file I've stored in the S3 bucket? Would something like that be valid? – Maurice Jan 04 '22 at 19:12
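For illustration, the cross-host arrangement described in that sitemaps.org passage would look roughly like this; the domain and bucket URL below are placeholders:

```
# robots.txt served from the registered domain, e.g. https://www.example.com/robots.txt
# Points crawlers at the sitemap index hosted on the S3 bucket.
Sitemap: https://my-sitemap-bucket.s3.amazonaws.com/sitemap-index.xml
```

Under the sitemaps.org cross-submission rules, this robots.txt directive on the domain that owns the URLs is what authorizes a sitemap hosted elsewhere to list those URLs.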