7

I have been working on a website that uses #! (2minutecv.com), but even after six weeks of the site being up and running and conforming to Google's hashbang guidelines, Google still hasn't indexed the site. For example, if you use Google to search for "2MinuteCV.com benefits", it does not find this page, which is referenced from the homepage.

Can anyone tell me why Google isn't indexing this website?

Update:

Thanks for all the help with this question. Just to make sure I understand what is wrong: according to the answers, Google never actually indexes the pages after the JavaScript has run, so I need to create a "shadow site" for Google to index (which Google calls HTML snapshots). If I am right in thinking this, then I can pick a winner for the bounty.

Update 2:

Since this was posted, we have switched to having static HTML files on our server. This is simple, fast, and gets indexed properly by Google.

corn on the cob
yazz.com
  • On a side note, you might want to look at the W3C validation errors: http://validator.w3.org/check?uri=http://www.2minutecv.com/#!/en_us/home&charset=%28detect%20automatically%29&doctype=Inline&group=0 – Vince P Aug 01 '12 at 15:08
  • The question is: why aren't you simply using a normal URI? It doesn't seem to make sense here that you don't. – Konrad Rudolph Aug 07 '12 at 13:01
  • #! is not a normal URI, or else Google wouldn't need a special indexer for it! Have you set up sitemap.xml correctly according to the #! documentation? Have you tried the Fetch as Google tool (in Webmaster Tools) to help debug the issue? – Olly Hodgson Aug 08 '12 at 11:36
  • Also, off-topic for this question, but consider making use of the JS history API where available (a sketch follows these comments). You've broken the back/forward buttons. https://developer.mozilla.org/en-US/docs/DOM/Manipulating_the_browser_history – Olly Hodgson Aug 08 '12 at 11:41
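
A minimal sketch of the history API approach Olly mentions, assuming each path (e.g. /en_us/benefits) returns an HTML fragment and the page has a #content element; both are assumptions for illustration:

    // Hedged sketch: pushState-based navigation keeps back/forward working
    // without hash fragments.
    function loadPage(path) {
        fetch(path)
            .then(function (res) { return res.text(); })
            .then(function (html) {
                document.getElementById('content').innerHTML = html;
            });
    }

    function navigate(path) {
        loadPage(path);
        history.pushState({ path: path }, '', path); // update the address bar
    }

    // Re-render when the user presses the back/forward buttons.
    window.addEventListener('popstate', function (event) {
        if (event.state && event.state.path) {
            loadPage(event.state.path);
        }
    });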

6 Answers

10

The main reason your pages are not being indexed is that there are no HTML links. You're providing JavaScript links to the other pages, and while the #! denotes that it should be a different page, you're not upholding your end of Google's JavaScript crawling agreement:

An agreement between crawler and server

In order to make your AJAX application crawlable, your site needs to abide by a new agreement. This agreement rests on the following:

The site adopts the AJAX crawling scheme. For each URL that has dynamically produced content, your server provides an HTML snapshot, which is the content a user (with a browser) sees. Often, such URLs will be AJAX URLs, that is, URLs containing a hash fragment, for example www.example.com/index.html#key=value, where #key=value is the hash fragment. An HTML snapshot is all the content that appears on the page after the JavaScript has been executed. The search engine indexes the HTML snapshot and serves your original AJAX URLs in search results.

(quoted from developers.google.com, 17 February 2012)

Since you do not provide an HTML fallback from which the crawler can determine what is static and what is JavaScript, it will most likely refuse to crawl the content.
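
Concretely, under the scheme the crawler rewrites a pretty URL such as www.example.com/index.html#!key=value into www.example.com/index.html?_escaped_fragment_=key=value, and your server is expected to answer that rewritten request with the snapshot. A minimal sketch of the server side, assuming Node.js with Express; the renderSnapshot() helper is hypothetical and stubbed here:

    // Hedged sketch, not the site's actual code.
    const express = require('express');
    const path = require('path');
    const app = express();

    function renderSnapshot(fragment) {
        // Hypothetical: run the page's JavaScript server-side (e.g. with a
        // headless browser) and return the resulting HTML. Stubbed here.
        return '<html><body>Snapshot for ' + fragment + '</body></html>';
    }

    app.get('/', function (req, res) {
        const fragment = req.query._escaped_fragment_;
        if (fragment !== undefined) {
            // Crawler request: #!/en_us/benefits arrives here as
            // /?_escaped_fragment_=/en_us/benefits
            res.send(renderSnapshot(fragment));
        } else {
            // Normal browser request: serve the JavaScript application.
            res.sendFile(path.join(__dirname, 'index.html'));
        }
    });

    app.listen(80);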

Secondly, since the non-#! URLs all point to some 'youcaneat.at' page which bears no resemblance to your site, Google's bot is most likely to assume it's a 'spam' attack, which will definitely not improve your chances of getting your JavaScript content indexed.


Rule of thumb to keep in mind: stick with HTML when you can, because Google only promises that it might index JavaScript, at best.
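
As one way to follow that rule of thumb, ship ordinary anchors the crawler can follow and let JavaScript upgrade them; the class name and #content element below are assumptions for illustration:

    // Markup contains real links that work with JavaScript disabled, e.g.:
    //   <a class="nav" href="/en_us/benefits">Benefits</a>
    // JavaScript then intercepts clicks and upgrades them to AJAX loads.
    document.querySelectorAll('a.nav').forEach(function (link) {
        link.addEventListener('click', function (event) {
            event.preventDefault();
            // Fetch the same URL the anchor already points to.
            fetch(link.getAttribute('href'))
                .then(function (res) { return res.text(); })
                .then(function (html) {
                    document.getElementById('content').innerHTML = html;
                });
        });
    });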

Jochem.

jbokkers
  • +1 this answer. And just to be clear, so the OP understands: what jbokkers is saying is that you need to make the non-shebang links, the regular HTML versions, also work. For example, http://www.2minutecv.com/en_us/how_it_works needs to work and show content, not some completely different web site. Hope that clears it up. – Anthony Hatzopoulos Aug 08 '12 at 00:08
  • We do not use # as you described in your answer; we use #!, which is what Google specifically uses for AJAX indexing. – yazz.com Aug 08 '12 at 06:58
  • Frankly, it doesn't matter; you're still not following Google's AJAX app guide. Check out step 2: "Suppose you would like to get www.example.com/index.html#!key=value indexed. Your part of the agreement is to provide the crawler with an HTML snapshot of this URL." – Anthony Hatzopoulos Aug 08 '12 at 13:57
5

Do you have any other sites pointing to it? Ironically, the fact that you've added a link to it from this site will ensure it does get indexed (not 100% guaranteed, but I would put money on it).

Anyway, it is indexed:

Google Link

Also, your code is poor. For example, you have this markup (copied from your site):

<img src="/images/arrow_to_login2.png" style="z-index: 3; top:292px; left: 315px; position:absolute;"></img>

There is no closing </img> tag; <img> is a self-closing (void) element. This is just one example: if your site is not coded well, then Google may struggle, fail, or index it only in part. I strongly suggest you put your website URL into the W3C Markup Validator and correct the errors. This will help.
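
For reference, the same tag without the stray closing tag might look like this (the empty alt attribute is an assumption, on the guess that the image is decorative):

    <img src="/images/arrow_to_login2.png" alt="" style="z-index: 3; top: 292px; left: 315px; position: absolute;">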

Dave
  • Thanks. I am trying to understand the answer. I clicked on the Google link you provided. Where can I find the link to the benefits page (http://www.2minutecv.com/#!/en_us/benefits) there, or am I looking in the wrong place? – yazz.com Aug 02 '12 at 11:43
  • No, my point is that one of your pages IS indexed. This suggests Google knows you exist. The reasons the other pages are not indexed could be varied; however, correcting the code is a good place to start. – Dave Aug 03 '12 at 12:13
  • That is incorrect: Google knows that the HTML version of my home page is there, but they do NOT index the AJAX version of it. Google recently came out with AJAX indexing, where they wait for the AJAX of the page to load "before" they index it. – yazz.com Aug 03 '12 at 12:25
5

The reason Google is not following your shebang (#!) links is that when the page loads initially they do not exist; they are nowhere to be found in the source code. In other words, with JavaScript disabled you do not have a single <a> anchor tag in the HTML source of your page. The only thing that will be indexed is a blank page with a copyright notice. The Home, Benefits, How it works, and FAQ links get loaded via JavaScript. Disable JavaScript and you get this (which is what Google gets):

[Screenshot: nothing but a copyright notice and no links]

Google will not index what it cannot crawl. Neither will other search engines. Google can run JavaScript, but don't bank on it being used for crawling content (yet). It will parse some JavaScript and AJAX links; in your case, your page source has none.

So you need to add static <a> tags on your page linking to these #! pages, AND it wouldn't hurt if you added a sitemap.xml, which, by the way, is strangely pointing to another 'youcaneat.at' website altogether: http://www.2minutecv.com/sitemap.xml
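
As a sketch of the sitemap part: the AJAX crawling documentation suggests listing your "pretty" #! URLs in the sitemap (verify against the version of the docs you are following). A small Node.js script along these lines could generate one; the page names here are assumptions, not taken from the real site:

    // Hedged sketch: writes a sitemap.xml listing the site's #! pages.
    const fs = require('fs');
    const pages = ['home', 'benefits', 'how_it_works', 'faq']; // assumed

    const urls = pages.map(function (p) {
        return '  <url><loc>http://www.2minutecv.com/#!/en_us/' + p + '</loc></url>';
    }).join('\n');

    fs.writeFileSync('sitemap.xml',
        '<?xml version="1.0" encoding="UTF-8"?>\n' +
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
        urls + '\n' +
        '</urlset>\n');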

And if you can avoid it, stop asynchronously loading everything after page load. Your site is not fancy enough to need AJAX, and there is no real benefit in your case to employing the tactic.

Anthony Hatzopoulos
  • OK, so are you saying that the Google AJAX crawling scheme we are using, which is documented here, is not implemented yet? https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?hl=nl – yazz.com Aug 08 '12 at 06:55
  • 1) You have nothing on the page; view source: NO LINKS = NO CRAWL. You need some links for Google to crawl; AJAXing everything in later is worthless. 2) Your non-#! links do not work normally, and they need to return content. For example, http://www.2minutecv.com/en_us/how_it_works needs to return the same small piece of content that http://www.2minutecv.com/#!/en_us/how_it_works does. 3) You have not read that developers.google.com document properly and are not following the guidelines. Read it, understand it, and look at what jbokkers told you as well. – Anthony Hatzopoulos Aug 08 '12 at 13:49