A canonical tag (or rel=canonical) is a small piece of HTML code that helps search engines to determine the “main” version of the page from the rest of the pages that are identical or very similar to it.
In SEO, canonical tags are used to let Google know which version of the page you want to appear in search results, to consolidate link equity from the duplicate pages as well as to improve crawling and indexing of your website.
Here’s what a canonical tag can look like on the webpage:
<link rel="canonical" href="https://mangools.com/blog/robots-txt/" />
The primary purpose of the canonical tag is to tell search engines which page is the main, original version and which are just duplicates that look the same.
Most websites contain at least some pages that are considered duplicates – they display the same content but under different URLs.
In these instances, Google has to decide which page to choose for indexing and ranking purposes – it won’t use all the pages as search results since they all look identical or just very similar.
For example, a product page is usually reachable under more than one main URL. It can also be displayed with various URL parameters (e.g. for sorting, currency, sizes, etc.):
https://www.randomshop.com/clothes/shirts.html
https://www.randomshop.com/clothes/shirts.html?Size=XL
https://www.randomshop.com/clothes/shirts.html?Size=XL&color=red
In this example, the product page can be displayed in the main category – /clothes/, but it can also be filtered and displayed with size and color parameters. Therefore it can be displayed as a search result under 3 different URLs.
This is where canonical tags become important – they indicate to Google that you want it to index the main URL in the /clothes/ category, use it as a search result and ignore the rest of the URLs.
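As a rough illustration, the three URLs above can be mapped to one canonical URL by dropping the query string. This is only a sketch: it assumes every URL parameter is a filter that does not change what the page is about, and the helper names `canonical_url` and `canonical_tag` are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Drop the query string and fragment (assumption: they only filter or sort)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element that a page variant would carry in its <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}" />'


variants = [
    "https://www.randomshop.com/clothes/shirts.html",
    "https://www.randomshop.com/clothes/shirts.html?Size=XL",
    "https://www.randomshop.com/clothes/shirts.html?Size=XL&color=red",
]

# Every variant ends up pointing at the same parameter-free URL.
for variant in variants:
    print(canonical_tag(variant))
```

On a real shop, some parameters may lead to genuinely different content, so a rule like this should be checked against the actual URL structure first.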
Note: Keep in mind that Google treats the canonical tag as a signal – not as a directive.
If there are valid reasons to choose another page for indexing and ranking purposes rather than the canonical one, the search engine might ignore the canonical tag altogether.
Or as Martin Splitt stated:
“All right, let’s start with the idea that it is a directive because it’s not.”
Besides the fundamental purpose of the canonical tag, there are also some important SEO benefits that come with it.
Canonical tags help to consolidate link equity (PageRank) from all duplicate pages into the one main, canonical page.
Duplicate pages can often obtain backlinks from various external sources – whether they are backlinks from random websites, users on social media, etc.
These pages therefore partially take over the link equity from the main version of the page – the one that you actually want to rank as a search result.
When you implement canonical tags on the duplicate pages, PageRank can be consolidated into a single URL, which can improve its overall ranking in Google Search.
Canonical tags can tell the search engine which website contains the original version of the content and which websites just republish (syndicate) it.
Many website owners use other websites for publishing their content (either for promotional or other purposes).
In this case, Google has to decide which website is the original source of this content and should be displayed as a search result and which websites just promote it.
Setting up canonical tags on these external websites helps to resolve this problem and promote the original, main version of the page in Google Search.
Canonical tags help search engines like Google to efficiently crawl the pages that you actually want crawled and indexed – as opposed to duplicates that do not need to be crawled as often.
Duplicate pages waste Google’s resources and time as they are not important for crawling or indexing purposes.
When you designate canonical pages, Google can focus more on the pages that matter the most and therefore save the “crawl budget”.
Or as Google officially stated:
“The canonical page will be crawled most regularly; duplicates are crawled less frequently in order to reduce Google crawling load on your site.”
Adding canonical tags to your pages is pretty easy – just go to any duplicate webpage and add a rel="canonical" tag to the <head> section of the page.
The link in the canonical tag should point to the main, original version.
Implementing canonical tags is best done on a page-by-page basis. However, this can consume a lot of time and resources or even be impossible on larger websites.
Fortunately, canonical tags can also be implemented automatically by using various plugins such as Yoast SEO (for WordPress).
Implementation of canonical tags via this plugin is pretty straightforward.
There are also a few other ways to indicate your canonical pages to Google.
Canonical tags can also be added in the HTTP header of the webpage.
This is especially useful for non-HTML documents such as PDFs – since they don’t contain any <head> section where you could add a standard canonical tag.
To implement a canonical in the HTTP header (on Apache servers), you need to access the .htaccess file of your site and configure it to send a Link response header that can look like this:
Link: <https://www.yoursite.com/random-document.pdf>; rel="canonical"
If you would like to learn more about adding canonical tags via HTTP header, check out this article about the implementation of canonicals.
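To check whether a response really carries such a header, its value can be built and read back with a few lines of Python. The following is a minimal sketch, not a full parser for the Link header syntax: it assumes simple values without commas or semicolons inside quoted parameters, and both helper names are hypothetical.

```python
import re

# Simplified pattern: "<target>" followed by zero or more ";param" pieces.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def canonical_link_header(canonical_url: str) -> str:
    """Build the value of a Link header that declares a canonical URL."""
    return f'<{canonical_url}>; rel="canonical"'


def canonical_from_link_header(value: str):
    """Return the first target whose rel contains 'canonical', or None."""
    for target, params in LINK_RE.findall(value):
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            rels = raw.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                return target
    return None


header_value = canonical_link_header("https://www.yoursite.com/random-document.pdf")
print("Link:", header_value)
print(canonical_from_link_header(header_value))
```

The header value would normally be taken from the response of whatever HTTP client you already use; that part is left out here to keep the example self-contained.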
Tip: There are also a few other ways to tell the search engine which pages you wish to be the canonical versions.
Although it is not mandatory, it is always a good practice to add a canonical tag on a page that points to itself – even if you did not use canonical tags on the rest of the duplicate pages.
rel=canonical on the main, original pages gives search engines like Google a clear signal that they are canonical versions:
“I recommend doing this kind of self-referential rel=canonical because it really makes it clear for us which page you want to have indexed or what this URL should be when it’s indexed.” (John Mueller).
Absolute URLs in canonical tags can help you avoid unintentional mistakes or bad interpretation of canonical URLs by a search engine (as opposed to the relative URLs).
Absolute URLs should also include www and trailing slashes (if possible).
Here is an example of the absolute URL in canonical tag:
<link rel="canonical" href="https://www.randomwebsite.com/randompage/" />
And here is an example of a relative URL:
<link rel="canonical" href="/randompage/" />
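A relative canonical is resolved against the URL of the page it appears on, which is where the risk comes from. The short sketch below uses Python's standard `urljoin` to show this; the staging hostname is a made-up example of a second host serving the same HTML.

```python
from urllib.parse import urljoin


def resolve_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href the way a relative reference is resolved."""
    return urljoin(page_url, href)


# On the live site, the relative href resolves to the intended URL.
print(resolve_canonical("https://www.randomwebsite.com/randompage/?ref=x", "/randompage/"))

# Served from another host (hypothetical staging copy), the same tag
# now declares a different canonical URL.
print(resolve_canonical("https://staging.randomwebsite.com/randompage/", "/randompage/"))

# An absolute href resolves to itself no matter where the page is served.
print(resolve_canonical("https://staging.randomwebsite.com/randompage/",
                        "https://www.randomwebsite.com/randompage/"))
```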
Search engines like Google can treat URLs that differ only in uppercase and lowercase letters as different URLs.
Using lowercase letters in canonical URLs can therefore help you stay consistent and avoid duplication issues in the eyes of search engines.
As a good practice, try to use lowercase URLs on your servers and apply the same form in the canonical tags.
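One possible way to enforce this when generating canonical tags is to lowercase the host and path before writing the URL out. This is a sketch under the assumption that your server really serves the lowercase path; the query string is left untouched because parameter values may be case-sensitive. The function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_canonical(url: str) -> str:
    """Lowercase scheme, host and path; keep query and fragment as they are."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.lower(),
        parts.query,
        parts.fragment,
    ))


print(lowercase_canonical("https://www.RandomWebsite.com/RandomPage/"))
```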
Canonical tags can also reference your main pages from other domains – not just from your website.
If your content is duplicated on a different website (e.g. a repurposed post on some news site), that page should carry a canonical tag pointing to the original version on your website.
Watch out for multiple canonical tags that might end up in the HTML of a page by accident.
Although rare, having more than 1 canonical tag on a page can create confusion for the search engine and result in ignoring this canonical signal.
Or as Google officially stated:
“In cases of multiple declarations of rel=canonical, Google will likely ignore all the rel=canonical hints. Any benefit that a legitimate rel=canonical might have offered will be lost.”
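A quick way to spot this problem is to count the canonical declarations in a page's HTML. The sketch below uses Python's built-in `html.parser`; the class and function names are made up for this example, and it only looks at `<link>` elements in the markup (not at HTTP headers).

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.hrefs.append(attributes.get("href"))


def find_canonicals(html: str) -> list:
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


page = """<html><head>
<link rel="canonical" href="https://www.randomwebsite.com/randompage/" />
<link rel="canonical" href="https://www.randomwebsite.com/otherpage/" />
</head><body></body></html>"""

found = find_canonicals(page)
if len(found) > 1:
    print("Multiple canonical tags found:", found)
```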
Always make sure that the content on the duplicate pages and the main version of the page is either identical or at least very similar when applying canonical tags.
Canonical tags on pages that are completely different might confuse search engines or be ignored completely.
Or as Martin Splitt explained:
“… if the content is completely different or different enough for the algorithms to decide that this is not a duplication, then the canonical is pointless.”
Paginated pages contain fragmented content across several different pages (e.g. comment section on the website divided into pages “1”, “2”, “3”).
In this instance, you should always use self-referencing canonical tags on every individual page – and not refer to page “1” from the rest of the paginated pages:
“The main thing to avoid, since this post is about canonicalization, is to use the rel=canonical on page 2 pointing to page 1. Page 2 isn’t equivalent to page 1, so the rel=canonical like that would be incorrect.” (John Mueller)
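In practice this means each paginated URL declares itself as canonical. Here is a small sketch that generates such tags; the `?page=N` URL scheme and the function name are assumptions for illustration only, so adapt them to however your site builds its pagination URLs.

```python
def self_canonical_tags(base_url: str, page_count: int) -> list:
    """Return one self-referencing canonical tag per paginated URL."""
    tags = []
    for page in range(1, page_count + 1):
        # Assumption: page 1 lives at the base URL, later pages use ?page=N.
        url = base_url if page == 1 else f"{base_url}?page={page}"
        tags.append(f'<link rel="canonical" href="{url}" />')
    return tags


for tag in self_canonical_tags("https://www.randomwebsite.com/comments/", 3):
    print(tag)
```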
You should never use the robots.txt file to block URLs that carry canonical tags.
Robots.txt will prevent Google from crawling the duplicate pages – therefore it will be unable to see the canonical tag referencing the main version of the page.
Furthermore, blocking URLs that contain canonical tags will also prevent PageRank from being transferred to your main versions.
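To check whether a duplicate URL is blocked, you can test it against your robots.txt rules. The sketch below uses Python's standard `urllib.robotparser`, which only understands simple path prefixes and may not match Google's own parser in every detail; the `/print/` path and the helper name are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = """User-agent: *
Disallow: /print/
"""

# Hypothetical duplicate (print version) that carries a canonical tag.
duplicate = "https://www.randomshop.com/print/shirts.html"
if blocked_by_robots(robots_txt, duplicate):
    print("Blocked: the canonical tag on this page cannot be seen by the crawler.")
```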
Canonical tags should always be placed in the <head> section of your pages – not anywhere else in the HTML document.
Google will simply ignore canonical tags in the <body> section or in any other place.
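A simple placement check can be sketched with Python's built-in `html.parser`. Note the simplification: this version relies on explicit `<head>` and `</head>` tags being present in the markup, which a real browser or crawler does not require. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalPlacement(HTMLParser):
    """Sort canonical tags by whether they appear inside <head> or elsewhere."""

    def __init__(self):
        super().__init__()
        self.inside_head = False
        self.in_head = []
        self.outside_head = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.inside_head = True
        elif tag == "link":
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                bucket = self.in_head if self.inside_head else self.outside_head
                bucket.append(attributes.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.inside_head = False


def check_placement(html: str):
    """Return (hrefs found in <head>, hrefs found anywhere else)."""
    parser = CanonicalPlacement()
    parser.feed(html)
    parser.close()
    return parser.in_head, parser.outside_head


page = (
    '<html><head><link rel="canonical" href="https://www.randomwebsite.com/randompage/" /></head>'
    '<body><link rel="canonical" href="https://www.randomwebsite.com/otherpage/" /></body></html>'
)
in_head, misplaced = check_placement(page)
print("In <head>:", in_head)
print("Misplaced:", misplaced)
```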
You should always try to use canonical tags that point directly to the main page in order to avoid canonical chains and loops (similar to redirect chains and loops).
For example, using a canonical tag from page A to page B and then from page B to page C will create a canonical chain that can confuse search engines and waste their resources and time.
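If you already have a mapping of pages to the canonical URLs they declare (for instance from a site crawl), chains and loops can be detected by following the references. This is a minimal sketch; the mapping, the page paths and the function name are hypothetical.

```python
def follow_canonicals(start: str, canonical_of: dict):
    """Follow canonical references from `start`.

    Returns (path, is_loop): the list of URLs visited, and whether the
    references came back to a URL that was already visited.
    """
    path = [start]
    while True:
        current = path[-1]
        # A page with no entry is treated as canonical for itself.
        target = canonical_of.get(current, current)
        if target == current:
            return path, False
        if target in path:
            return path, True
        path.append(target)


# Page A points to B, B points to C, C is self-referencing: a chain.
chain = {"/page-a/": "/page-b/", "/page-b/": "/page-c/", "/page-c/": "/page-c/"}
path, is_loop = follow_canonicals("/page-a/", chain)
if len(path) > 2:
    print("Canonical chain:", " -> ".join(path))

# Two pages pointing at each other: a loop.
loop = {"/page-x/": "/page-y/", "/page-y/": "/page-x/"}
print(follow_canonicals("/page-x/", loop))
```

A path longer than two URLs indicates a chain; in that case page A should point straight to page C.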
This post was last modified on January 21, 2022 7:00 pm