
I'm running a crawler on my website to test for broken links and such.

It starts by using a URL like www.domain.com.

One curious thing is that it is showing directories with no internal links. For example, directory /example_dir/ is showing up in the crawl tree, but I can't find any internal link to that directory within the pages.

How could this be happening and is there a way to prevent it?

edeneye

2 Answers


What tool are you using to crawl your site?

Crawlers typically find new pages by following links, so the odds are you have a link pointing to those directories. It may not be intentional — for example, a dynamic link that is pulling up bad data without throwing an error. If you aren't using Xenu's Link Sleuth, I recommend it: it will tell you which pages had the links that led it to crawl those directories.
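To see why every crawled page implies a link somewhere, here is a minimal sketch of how a link-following crawler works. The site contents are hypothetical and held in memory (a real crawler would fetch pages over HTTP); the point is that each discovered URL can be traced back to the page that linked to it:

```python
from collections import deque

# Hypothetical in-memory "site": URL path -> list of link targets on that page.
# A real crawler would fetch each page over HTTP and parse its HTML instead.
site = {
    "/": ["/about/", "/products/"],
    "/about/": ["/"],
    "/products/": ["/example_dir/"],   # the easily overlooked link
    "/example_dir/": [],
}

def crawl(start):
    """Breadth-first crawl: every page discovered was reached via some link."""
    found_via = {start: None}          # page -> page that linked to it
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in found_via:
                found_via[target] = page
                queue.append(target)
    return found_via

origins = crawl("/")
print(origins["/example_dir/"])  # '/products/' -- the page holding the link
```

A tool like Xenu does essentially this bookkeeping for you, which is why it can report which page led it into a directory.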

John Conde
    It was Netsparker, which is a web vulnerability scanner. It doesn't check for broken links like I thought.

    I grabbed Xenu and it works great, but it didn't show the directory in question.

    My guess is that Netsparker is picking it up from the robots.txt file, and that's the difference here.

    – edeneye Nov 03 '11 at 19:03
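That robots.txt theory is plausible: a Disallow rule names a path outright, so any tool that parses the file learns the directory exists even if no page links to it. A fragment like this (the path here is just illustrative) would be enough:

```
User-agent: *
Disallow: /example_dir/
```

Security scanners routinely read robots.txt for exactly this reason — disallowed paths are often the interesting ones — while a pure link checker like Xenu only follows links it finds in pages.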

My guess is that John is right: you must have a link somewhere. It might not show on the page, but the spider is finding it.

Don't forget that markup like this can happen: <a href="/my_dir/"></a>. Although it renders as nothing to the user, the spider will still follow it.
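You can confirm that an empty anchor still exposes its target with a few lines of Python's standard-library HTML parser — the href attribute is extracted regardless of whether the anchor has any visible text (this is a sketch, not how any particular crawler is implemented):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every href seen in <a> start tags, visible text or not."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
# The anchor below has no text content, yet its target is still found.
parser.feed('<p>Some text</p><a href="/my_dir/"></a>')
print(parser.links)  # ['/my_dir/']
```

Anything that walks start tags — which is how crawlers extract links — will pick this up, which is why a directory can appear in the crawl tree with no link you can see on the rendered page.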

Stephen Ostermiller
TheAlbear