Canonicalization sounds like a process for recognizing
sainthood, or maybe a training course in aiming large projectile
weapons. But it’s actually one of the most important aspects of organic
SEO. Good canonicalization means search
engines crawl more pages of your site; it means that link authority and
PageRank get consolidated, so you have a stronger link profile; and it
means fewer broken links from other sites. Bad canonicalization gets you
all that stuff, but with the opposite effect.
Canonicalization defined
The Ian-Lurie-mangles-the-meaning-so-computer-geeks-cringe-definition
of canonicalization is: “every resource on your web site has a single
web address.”
Every resource means every page, every image, every video, etc.
Single web address means there’s only one Uniform Resource Locator (URL) for each page of content, image, video, etc.
A URL looks like this:
http://www.mysite.com/
Or, it could be: http://www.mysite.com/blah/foo.html.
Or, it could be: http://www.mysite.com/blah/foo.php?meh=123.
Or… Oh, you get the idea.
Note that I said ‘page of content’. That means that a single article,
product description or list of articles should appear at a single URL.
You should never have multiple URLs for, say, one product description,
or one article.
Some of the absurdly bloated content management systems and
e-commerce suites out there make canonicalization a challenge. But it’s
worth it.
Consequences of bad canonicalization
Here’s an example of ‘bad’ canonicalization: Let’s say I’ve opened a
games store: Ian’s Nerdvana (I owe Dave Barry for the term ‘nerdvana’).
My store’s home page lives at:
http://www.iansnerdvana.com/
But it also lives at
http://iansnerdvana.com/
and
http://www.iansnerdvana.com/index.html
So what? People will find the home page at all three versions. They
won’t know the difference, right? Well, yeah. But search engines will.
Googlebot sees the three above URLs as three different pages on the web.
That has two consequences that hurt SEO.
First, you lose link authority. If blogger 1 comes to
‘www.iansnerdvana.com’ and links to that page, blogger 2 lands on
‘iansnerdvana.com’ and links to that URL, and blogger 3 lands on
‘www.iansnerdvana.com/index.html’ and links to that page,
Googlebot sees three links to three different pages, and applies 1
‘vote’ to each one. These three links could have sent three
authoritative signals to Googlebot for my site’s home page. Instead,
they’re split into three weaker individual votes for three different
pages. It’s as if Ross Perot or Ralph Nader were sitting in front of my
site, siphoning off votes. It’s link love mayhem.
If I weren’t such a loser, I would’ve set up my site so that my home
page ‘lived’ at one unique URL – ‘www.iansnerdvana.com’. Then all 3
bloggers would have linked to that page, and Googlebot would instead
apply all three votes to a single page. If I care about link authority –
and who doesn’t, I ask you? – then that’s a far better outcome.
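To make the vote-splitting concrete, here is a minimal Python sketch. It tallies the three bloggers' links as-is, then again after mapping each home page variant onto one URL. The `canonical_home` helper and its rules are assumptions made up for this example, not how any search engine actually counts links.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Hypothetical canonical host for the example store.
CANONICAL_HOST = "www.iansnerdvana.com"


def canonical_home(url):
    """Map the home page variants from the article onto one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "iansnerdvana.com":
        host = CANONICAL_HOST
    path = parts.path
    if path in ("", "/index.html"):
        path = "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


# One inbound link from each of the three bloggers.
inbound_links = [
    "http://www.iansnerdvana.com/",
    "http://iansnerdvana.com/",
    "http://www.iansnerdvana.com/index.html",
]

# Without canonicalization: three URLs, one vote each.
split_votes = Counter(inbound_links)

# With canonicalization: one URL, three votes.
consolidated_votes = Counter(canonical_home(u) for u in inbound_links)

print(split_votes)
print(consolidated_votes)
```

The first counter holds three entries with one vote apiece. The second holds a single entry for ‘http://www.iansnerdvana.com/’ with all three.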
Secondly, search engines won’t crawl your site as deeply as they
might. Search engines allocate resources for each crawl. No one knows
exactly how, but it’s safe to say Googlebot won’t just wander around
your site until it’s found every page. At some point, it gives up and
leaves. If multiple pages on my site have multiple URLs, then visiting
search bots waste time tracking down all of those different versions.
That’s time they could spend crawling other unique pages, instead. So
fewer unique pages of my site end up in the search index, and I have
fewer chances to rank.
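Here is a toy model of that arithmetic in Python. It is only a sketch: nobody outside the search engines knows how crawl resources are really allocated, so the fixed fetch budget, the crawl order and the ‘?source=nav’ variant are all assumptions invented for illustration.

```python
def pages_found(crawl_queue, page_of, budget):
    """Count distinct pages reached before the fetch budget runs out."""
    fetched = crawl_queue[:budget]
    return len({page_of[url] for url in fetched})


# Six unique pages on a hypothetical site.
pages = [f"/page{n}.html" for n in range(1, 7)]

# Clean site: every page lives at exactly one URL.
clean_queue = list(pages)
clean_map = {url: url for url in pages}

# Messy site: every page is also reachable through two tracking variants.
messy_queue = []
messy_map = {}
for page in pages:
    for variant in (page, page + "?source=blog", page + "?source=nav"):
        messy_queue.append(variant)
        messy_map[variant] = page

BUDGET = 9  # assumed number of fetches the bot is willing to spend

clean_count = pages_found(clean_queue, clean_map, BUDGET)
messy_count = pages_found(messy_queue, messy_map, BUDGET)
print(clean_count, messy_count)
```

With the same budget of nine fetches, the clean site gets all six pages crawled, while the messy one burns the budget on duplicates and gets only three.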
Don’t feel bad, though. Even SEO agencies screw it up. Here’s one
with their home page at both ‘www.site.com/’ and
‘www.site.com/index.php’. Oops.
Best practices
You can avoid the heartbreak of bad canonicalization, or at least minimize it, by doing a few simple things:
Use 301 redirection to ensure that your home page is only found at one URL. If you don’t know how, read Stephan Spencer’s column about rewrites and redirects.
Link consistently to your home page from within your own site. Use a
single URL for your home page. Don’t mix in instances of
‘www.iansnerdvana.com/index.html’ with ‘www.iansnerdvana.com’. If you
aren’t doing this properly right now, a quick change may have a big
impact on SEO.
Don’t use tracking IDs in internal site navigation. A lot of sites
add stuff like ‘?source=blog’ in their navigation. That lets them use
their analytics reports to track user movement within, to and from their
site. Instead, learn to use your web analytics referrer and navigation
path reports. If you must use tracking IDs, change your software to use a
hash mark (a ‘#’ sign) instead of a question mark. Search engines
ignore everything after the hash, so you’ll avoid confusion.
Don’t use tracking IDs in organic links from other sites. If you get a
link on another site, and want it to help with your SEO, don’t put a
tracking ID in that, either.
Be careful with pagination. Many sites have pagination, where
visitors can click a 1, 2, 3 etc. to jump to later pages in search
results, product lists or articles. That’s fine, but make sure that
each page has a single URL. For example, if page 1 of the article is
‘www.iansnerdvana.com/article.html’ when I click the article link from
the home page, make sure that the number ‘1’ in the pagination takes me
there, too, instead of to ‘www.iansnerdvana.com/article.html?page=1’.
Set up preventative redirects. Make sure that ‘iansnerdvana.com’ 301 redirects to ‘www.iansnerdvana.com’.
Exclude ‘e-mail a friend’ pages. Most content management systems that
have ‘e-mail a friend’ options direct the user to a unique page that
has the same form and content. But every instance of that page has a
unique URL like ‘ID=123’, to tell the server which product or article to
forward. It’s canonical higgledy-piggledy. Use robots.txt and the meta robots tag to exclude these from search engine crawls.
Use common sense when building your site. Think, man/woman! If you
need to change the header, footer or other page element based on where
on your site the visitor came from, do it with cookies, or by sniffing
out the referring URL. Design to do this ahead of time.
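Several of the practices above boil down to one rule: work out the single correct URL for a request, and 301 redirect anything else to it. Here is a rough Python sketch of that decision. The function names, the list of index files and the ‘source’ tracking parameter are assumptions for this example. A real site would usually do this in its web server's rewrite rules or in application middleware.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed settings for the example store.
CANONICAL_HOST = "www.iansnerdvana.com"
INDEX_FILES = ("index.html", "index.php")
TRACKING_PARAMS = {"source"}


def canonical_url(url):
    """Return the one URL this request should live at."""
    parts = urlsplit(url)

    # Send the bare domain to the www version.
    host = parts.netloc.lower()
    if host == "iansnerdvana.com":
        host = CANONICAL_HOST

    # Treat /index.html and friends as the directory itself.
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Drop tracking IDs, and drop page=1 so page one has a single URL.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and (key, value) != ("page", "1")
    ]

    # Fragments never reach the server, so leave them out.
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


def redirect_for(url):
    """Return (301, target) if the request needs a redirect, else None."""
    target = canonical_url(url)
    return (301, target) if target != url else None


for requested in (
    "http://iansnerdvana.com/",
    "http://www.iansnerdvana.com/index.html",
    "http://www.iansnerdvana.com/article.html?page=1",
    "http://www.iansnerdvana.com/article.html?page=2",
):
    print(requested, "->", redirect_for(requested))
```

The first three requests get a 301 to their clean version. The last one is already canonical, so `redirect_for` returns `None` and the page is served as-is.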
What about rel=canonical?
The canonical tag is a neat little gadget that’s supposed to let you
tell search engines the correct URL for any page. So, by adding <link rel="canonical" href="http://www.iansnerdvana.com/">
to any page, I could tell visiting search bots to index just that
version, and to direct all link authority to that one URL. It sounds
ideal.
It’s not. First, Yahoo! and Bing don’t yet have confirmed support for
it. Second, you can’t rely on tags of this nature, as search engines
may change their minds later. Google’s done it.
So don’t stake your SEO strategy on it. Third, why not do it right the
first time? In addition to SEO benefits, a canonically clean site should
run faster, present fewer maintenance headaches and place less load on
server and bandwidth resources.
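If you do use the tag as a backstop, it helps to check what your pages actually declare. Here is a small Python sketch that pulls the canonical URL out of a page's HTML using the standard library parser. `CanonicalFinder` and `find_canonical` are names made up for this example, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = """<html><head>
<link rel="canonical" href="http://www.iansnerdvana.com/">
</head><body>Welcome to Ian's Nerdvana</body></html>"""

print(find_canonical(page))
```

Note the straight quotes around the attribute values. Typographic quotes pasted in from a word processor are not valid attribute delimiters in HTML.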
Let’s get canonical!
So, get out there and start cleaning up your site. Canonicalization
fixes are generally simple, have a broad impact and let you fix multiple
SEO problems at once. You’ll get more link authority, deeper site
crawls and better rankings. What’s not to love?
copied, originally posted on Jun 17, 2010 at 1:14pm ET by Ian Lurie