
I'm troubleshooting a website whose email verification process has had a number of its tokens indexed by BingBot.

The process involves sending the user a confirmation link which contains a GUID associated with their email address.

The link directs to an account verification action similar to:

~/Account/Verify?email=user@provider.com&token=0000-1111-2222-9999

While trawling server logs to investigate an unrelated issue, I saw that BingBot was crawling these URLs.
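A quick way to quantify this is to tally the Bingbot requests to the verification path per day. The sketch below is a minimal Python example, assuming the logs use IIS's W3C extended format with a `#Fields:` header that lists `date`, `cs-uri-stem` and `cs(User-Agent)`, and assuming the action is served at `/Account/Verify` (the `VERIFY_PATH` constant is a placeholder to adjust). User-agent strings can be spoofed, so a match only means the client claimed to be Bingbot.

```python
from __future__ import annotations

import collections
import sys
from typing import Iterable

# Assumption: the verification action is served at this path (compared case-insensitively).
VERIFY_PATH = "/account/verify"


def count_bingbot_verify_hits(lines: Iterable[str]) -> dict[str, int]:
    """Count requests to VERIFY_PATH whose user agent mentions bingbot, per log date.

    Assumes IIS W3C extended logs, where a '#Fields:' directive names the columns.
    """
    fields: list[str] = []
    counts: collections.Counter = collections.Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # Column names follow the directive, separated by spaces.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blank lines, and rows before a header
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip malformed rows
        row = dict(zip(fields, values))
        # IIS writes spaces in the user agent as '+', so a substring check is enough.
        agent = row.get("cs(User-Agent)", "").lower()
        stem = row.get("cs-uri-stem", "").lower()
        if "bingbot" in agent and stem == VERIFY_PATH:
            counts[row.get("date", "unknown")] += 1
    return dict(counts)


if __name__ == "__main__":
    # Each command-line argument is treated as a path to an IIS log file.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            print(path, count_bingbot_verify_hits(f))
```

For example, `python count_hits.py u_ex220922.log` prints a per-date count for that file; the script and log file names are illustrative.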

The site had no robots.txt to keep crawlers away from the verification and resend-confirmation URLs. I've also noticed the site passes quite a few parameters in query strings.
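One way to check that a proposed robots.txt actually covers these URLs, query string included, is to run it through Python's standard `urllib.robotparser`. This is a minimal sketch: the rules and the `/Account/ResendConfirmation` path are hypothetical, and Python's parser only approximates how Bingbot interprets robots.txt. Note also that robots.txt controls crawling rather than indexing, and its paths are case-sensitive even though IIS is not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the resend-confirmation path is an assumed name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Account/Verify
Disallow: /Account/ResendConfirmation
"""


def is_blocked(url: str, agent: str = "bingbot") -> bool:
    """Return True if ROBOTS_TXT disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/Account/Verify?email=user@provider.com&token=0000-1111-2222-9999"
    print(url, "blocked" if is_blocked(url) else "allowed")
```

Disallow rules match by path prefix, so a single `Disallow: /Account/Verify` line also covers every `?email=...&token=...` variant.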

Even so, I can't understand how these links have ended up getting indexed.

I spoke to a colleague who suggested the links may have been security-scanned, and then indexed, as they passed through Microsoft mail hubs.

I'm unsure about this, and it doesn't explain the emails that come from non-Microsoft domains.

The only other thought I had was that the page gets indexed when the user hits it at confirmation, but colleagues have suggested this is unlikely: surely there must be a link to these verification URLs somewhere on the web for them to get crawled?

Some things to note:

Once email addresses have been confirmed, there's no real harm in these URLs getting hit again; the tokens are dead and nothing gets reset or updated.

I am able to put some sensible changes in place to mitigate this, but I'm mainly trying to understand how these URLs have come into Bing's domain.
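A commonly used mitigation is to send an `X-Robots-Tag: noindex` response header on the verification pages, which asks search engines that honor it, Bing included, not to list the page. The site appears to run ASP.NET on IIS, so the Python WSGI middleware below is only a framework-neutral sketch of the idea; the `NOINDEX_PREFIXES` paths are hypothetical. If a header like this is used, the same paths generally should not also be disallowed in robots.txt, because a crawler that is blocked from fetching a page never sees the header.

```python
from typing import Callable, Iterable

# Hypothetical lower-cased path prefixes that should never appear in search results.
NOINDEX_PREFIXES = ("/account/verify", "/account/resendconfirmation")


def noindex_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Compare case-insensitively, since IIS-hosted paths are case-insensitive.
        path = environ.get("PATH_INFO", "").lower()

        def start_response_with_header(status, headers, exc_info=None):
            if path.startswith(NOINDEX_PREFIXES):
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_header)

    return wrapped
```

Wrapping at the response-header level leaves the verification logic itself untouched, so already-used tokens keep behaving as they do now.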

Stephen Ostermiller
Dave
  • Are you sure you mean "indexed"? Are the URLs showing up in search results? What content is on these URLs? Or do you just mean that they are getting crawled (hit by bingbot)? – Stephen Ostermiller Sep 22 '22 at 10:21
  • Yes. Without giving too much away, if I search Bing for mysitebaseurl/Account/Verify it returns a page which contains a user's email and token in the query string. It's only one, but in the IIS logs there are about 30 of these links being hit by bingbot every day – Dave Sep 22 '22 at 10:26
  • How do you know if a domain is a "non-Microsoft domain" anyway? Microsoft is one of the biggest email system providers. Their Office 365 service is used by the domains of thousands of companies. Are you seeing these links crawled for domains that are known not to use Microsoft, for example gmail users? – Stephen Ostermiller Sep 22 '22 at 10:29
  • @StephenOstermiller our initial thought was that O365 was scanning the links for security and that's how they were getting crawled by bingbot, but we are also seeing log entries that correspond to Gmail and Yahoo accounts, so we're struggling to understand how this might be happening. Thanks for that link though, very helpful – Dave Sep 22 '22 at 10:35
  • You can access Yahoo and Gmail accounts through Outlook desktop app, which would then use the O365 security features on any URLs on top of any provided by Yahoo or Gmail. – GeoffAtkins Sep 28 '22 at 10:50

0 Answers