On the Shepherd Interactive site, we have a dynamic navigation menu in Flash. In order to prevent it from having to reload each time a page is changed, I implemented Ajax loading so that the SWF only has to load once. This is similar to what Lala and Facebook do. So if your browser is Ajax-enabled, upon visiting:
http://shepherdinteractive.com/portfolio/interactive/
You will get redirected to site root /
with the old path supplied as the URL hash fragment which is then loaded in via JavaScript as the page content:
http://shepherdinteractive.com/#portfolio/interactive/
However, according to Google’s Making AJAX Applications Crawlable specification, Ajax pretty URLs are any containing a hash fragment beginning with
, for example:!
http://shepherdinteractive.com/#!portfolio/interactive/
The purpose of the !
is merely to inform Googlebot that such a URL is for an Ajax page whose content can be fetched via:
http://shepherdinteractive.com/?_escaped_fragment_=portfolio/interactive/
The problem I have with Google’s specification is that the pretty URL Ajax fragment prefix (!
) is mandated; it is not customizable. I should be able to tell Googlebot which fragment identifiers are for Ajax content and which are not. Therefore, instead of requiring authors to to conform to Google’s Ajax specification, I propose that Google adopt an extension to robots.txt which allows site owners to let Googlebot know what to look for. The current specification’s mandate for !
could be indicated via:
Ajax-Fragment-Prefix: !
Or it could be changed to anything else, such as “ajax:
“. If the Ajax fragment doesn’t have a prefix at all (as in the case of Shepherd Interactive’s website above), a regular expression pattern match could be specified in robots.txt, for example:
Ajax-Fragment-Pattern: .+/
This would tell Googlebot that a URL with a fragment containing a slash should be fetched via the _escaped_fragment_
query parameter, and that the Ajax URL itself (including the fragment identifier) should be indexed and returned verbatim in the search results.
It’s true that the Shepherd Interactive site implements Hijax (progressive enhancement with Ajax) techniques, so every Ajax URL has a corresponding non-Ajax URL; so in this sense, Google can still access all of the content. The problem, however, is with links to Ajax URLs from around the Web. I assume for Googlebot, every link to an Ajax URL without the obligatory prefix !
is interpreted as referring to the home page (site root /
):
http://shepherdinteractive.com/#portfolio/interactive/harmer-steel/
http://shepherdinteractive.com/#about-us/our-company/
http://shepherdinteractive.com/#services/web-development/
So Google then assigns no additional PageRank to our Ajax URLs since it gets assigned to the home page. If, however, Googlebot could be told that those are actually Ajax URLs, then the PageRank could be properly assigned. (My assumptions here could be incorrect.)
Thoughts?
Update 2010-04-22: “Cowboy” Ben Alman brought up the excellent point that customizable Ajax-crawlable URLs need to work on a per-path basis, as different single page web apps on the same server (or even in the same folder) might have different URI requirements.
I wonder if in addition to (or instead of) using robots.txt to tell Googlebot the format of the Ajax-crawlable URLs, why not allow that information to be placed in meta
tags? Their specification already includes:
<meta name="fragment" content="!">
Why not make Googlebot support additional meta
tags for recognizing the prefix (if any) or pattern that characterizes an Ajax-crawlable URL on a given page? For example:
- Current default behavior:
<meta name="crawlable-fragment-prefix" content="!">
- Starting with “/” like
…about/#/us/
: <meta name="crawlable-fragment-prefix" content="/">
- All fragments are crawlable:
<meta name="crawlable-fragment-prefix" content="">
- Any fragment ending with a slash:
<meta name="crawlable-fragment-pattern" content=".+/">
With these meta
tags, authors would be able to have complete page-level control over the structure of Ajax-crawlable URLs (hash fragments).
Leave a Reply