
September 20, 2009

For years there has been a lot of debate and controversy over the issue of duplicate content.

  • Is there a penalty imposed by the search engines for having material on your website that is identical to material on another website?

  • If so, how much of the content can be duplicated without penalty?

  • What is the penalty?

Five days ago, Google addressed this issue by posting a video by Greg Grothaus on its Webmaster Central Blog, entitled Duplicate content and multiple site issues. Greg dispels the “Duplicate Content Penalty Myth” by essentially repeating what Google has been saying for years: there is no penalty.

( Here is a link to an eBook written over a year ago, The Duplicate Content Debate, which has a good discussion of the issues. )

[Image: Buzzword Bingo: Duplicate Content (CC BY-SA 2.0)]

Mr. Grothaus goes on to discuss examples of legitimate duplicate content, and explains that Google will try to choose the best version of the content and display that version for each search query. It rarely happens, but if Google decides that duplicate content is being used to manipulate rankings, it will “make appropriate adjustments”. However, this is not a penalty for duplicate content; it is a penalty for spam.

There are several issues ( not penalties ) that may result from duplicate content, even if it IS legitimate:

  • Link popularity may become diluted by having backlinks point to several versions of the content.

  • Google may inadvertently pick the wrong content to use in search results.

  • The extra time taken to crawl the duplicate content means that there is less time available for Google to discover new content.

To prevent such issues, there is a new link element available to tell Google which of several webpages is the one you want indexed. This element, commonly called the canonical tag, is a link element with rel="canonical"; it may be used on duplicate pages to point to the page that you want Google to index. Using the example from the video, suppose that you want Google to ignore this URL:

http://www.example.com/page.html?sid=asdf314159265

Then you would put the following in the HEAD section of that page ( IMPORTANT – This means it goes on the page(s) that you DON’T want Google to index! ):

<link rel="canonical" href="http://example.com/page.html" />
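On a site with many pages, the link element would normally be generated rather than typed by hand. Here is a minimal sketch in Python, assuming that the only thing separating the duplicate URL from the preferred one is a session parameter named sid, as in the example above. The names SESSION_PARAMS, canonical_url and canonical_link are made up for this illustration and do not come from Google or from the video.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only track a session and never change the content.
SESSION_PARAMS = {"sid"}


def canonical_url(url, session_params=SESSION_PARAMS):
    """Return the URL with session-only query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in session_params
    ]
    # Scheme, host and path are left untouched; only the query string is filtered.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link(url):
    """Build the link element for the HEAD section of the page served at this URL."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


print(canonical_link("http://www.example.com/page.html?sid=asdf314159265"))
# prints: <link rel="canonical" href="http://www.example.com/page.html" />
```

Note that this sketch leaves the host name alone, so it keeps the www prefix. If you prefer the version without www, as in the link element above, that choice would have to be made separately.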

So it seems that the debate over duplicate content has now been laid to rest.

Conclusion: If you’re not trying to be “black hat”, don’t worry about it — but let Google know which of your pages you want them to index.

Update – November 14, 2014: I recently ran across a page in Google Webmaster Tools Help that contradicts my “IMPORTANT” comment above. It says: “Mark up the canonical page and any other variants with a rel=’canonical’ link element.” Greg Grothaus evidently changed his mind, and decided that you could also put the link on the canonical page itself.
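If the link element goes on the canonical page and on every variant, it is easy to lose track of which pages actually carry it. The following is a rough sketch, using only Python's standard html.parser module, that reads a page's HTML and reports the canonical URL it declares. The class name CanonicalFinder and the function find_canonical are hypothetical names chosen for this example, and the sample markup is invented to match the example in this post.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, in any letter case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html_text):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Invented sample page, modeled on the example URL in this post.
sample_page = (
    "<html><head>"
    '<link rel="canonical" href="http://example.com/page.html" />'
    "</head><body></body></html>"
)
print(find_canonical(sample_page))
# prints: http://example.com/page.html
```

Running a check like this over the canonical page and each of its variants should return the same URL every time. A result of None means the page is missing the link element.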


3 Comments »

  1. Excellent post on explanation of duplicate content problem. Thanks.

    Comment by M. Jamil — August 14, 2010 @ 6:08 am

  2. The duplicate content issue is a pain, but it comes by default with WordPress pagination comments and tag pages, so it is nice to hear from Google that there is no penalty for such issues, as they’ll try to choose the best version of the content for SERPs. But I would like to know if there is any WordPress plugin that can fix this duplicate content issue by putting a rel=canonical tag in the head section, as you have suggested here. If there’s one such plugin that works with WordPress 3.0.5, then please post it here. Thanks

    Comment by bobby — February 23, 2011 @ 12:09 pm

  3. Since no one else has answered your query, I will do it myself. The plugin, “SEO No Duplicate”, might do the job. However, it has only been tested for compatibility up to WP 2.9.2.

    Comment by Professor — December 16, 2011 @ 12:53 am
