
Find Duplicate Content Issues on Your Website and Fix Them

It may not be obvious to the average Internet user, but websites often make the same content accessible through several different URLs. Search engines call this duplicate content, and they don’t like it: the same content appears in more than one place on the Internet, so their objection is understandable.
There are different types of duplicate content. For example, when someone steals your web copy or articles and publishes them on another website, they create duplicates. You can fight these content thieves with the duplicate content checker PlagSpotter, which scans your pages automatically and can keep monitoring them in the future. Here, though, we’ll talk about a different type: duplicate content on your own website. We leave out ecommerce websites, which raise quite specific issues of their own.

Duplicate content and canonical URLs

Google says “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin”.
And here’s what they say on canonicalization: “To gain more control over how your URLs appear in search results, and to consolidate properties, such as link popularity, we recommend that you pick a canonical (preferred) URL as the preferred version of the page. You can indicate your preference to Google in a number of ways. We recommend them all, though none of them is required (if you don't indicate a canonical URL, we'll identify what we think is the best version).”

So how does all this apply to your website?

Your website has potential duplicate content issues if its pages are accessible through different URLs. Suppose you can reach the About page on your website from http://yourwebsite.com/AboutPage, http://yourwebsite.com/aboutpage, http://yourwebsite.com/aboutpage.html, and so on. You can implement 301 redirects to send users to the version you prefer, or tell Google which version to index, and so prevent the issue from affecting your whole domain. Google cannot know how you prefer your URLs unless you tell it: whether the links should include the ‘www’ part, a trailing slash, the file name, uppercase or lowercase letters, etc.
Sometimes duplicate content problems also arise because the CMS creates different dynamic URLs on top of the original ones. Unless the search engines are told otherwise, they will index these as individual pages even though they are not.
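
To make the idea of a preferred URL concrete, here is a minimal Python sketch that maps the About page variants mentioned above onto one form. The function name `canonical_url` and the rules it applies (drop ‘www’, lowercase the path, strip `.html` and the trailing slash) are assumptions chosen for illustration; your own preferences may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, prefer_www=False):
    """Map URL variants of one page onto a single preferred form.

    The rules below are one possible policy, not a standard.
    """
    parts = urlsplit(url)

    # Host names are case-insensitive, so lowercasing is always safe.
    host = parts.netloc.lower()
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    elif not prefer_www and host.startswith("www."):
        host = host[len("www."):]

    # Assumption: the site treats paths case-insensitively.
    path = parts.path.lower()
    for suffix in (".html", ".htm"):
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break

    # Drop the trailing slash, but keep "/" for the home page.
    path = path.rstrip("/") or "/"

    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))
```

Every variant of the About page then collapses to http://yourwebsite.com/aboutpage, which is the URL you would redirect to or declare as canonical.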

Are there duplicate content problems on your website?

You can easily find out whether there’s duplicate content on your website. Here are a few ways to do so:
  • To check for these issues at the domain level, you can use the free tool Redirect Check, or you can search Google for a specific page on your site. If the results show more than one link, look into them to find out why. To check how a certain type of file is indexed by the search engines, search for site:yourwebsite.com filetype:pdf (or any other file type you want to look up).
  • Check your sitemap to see whether you are telling search engines to index multiple URLs for the same pages on your website. Your sitemap should include only the pages you want search engines to know about, but because of issues with dynamic URL parameters and preferred domains, the engines often index more pages than that. For a quick check, search Google for site:yourwebsite.com inurl: followed by the dynamic URL, and see whether Google has indexed it.
  • Examine the URLs that already send organic traffic to your website. Log into your Analytics account to check where your visitors come from, and write down every URL you see but don’t want the engines to index, so you can fix them later.
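
The sitemap check in the list above can also be sketched in Python. The snippet below reads a sitemap in the standard sitemaps.org XML format and reports groups of URLs that seem to point at the same page. The helper names `page_key` and `duplicate_groups`, and the rules for deciding that two URLs are "the same page", are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlsplit

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def page_key(url):
    """Reduce a URL to a rough 'same page' key (illustrative rules only)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.lower().rstrip("/")
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return host + (path or "/")


def duplicate_groups(sitemap_xml):
    """Return {key: [urls]} for every key listed more than once in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    groups = defaultdict(list)
    for loc in root.iter(SITEMAP_NS + "loc"):
        url = (loc.text or "").strip()
        if url:
            groups[page_key(url)].append(url)
    return {key: urls for key, urls in groups.items() if len(urls) > 1}
```

Any group this returns is a candidate for cleanup: keep the URL you prefer in the sitemap and drop the others.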

Free tools for quick and easy solutions

To start solving the problem you must first understand it. To find out exactly which duplicate content issues your website faces, get a list of all the pages on your website that Google has indexed. You can either use the Screaming Frog SEO Spider tool or follow these instructions:
  • Download and enable the SEO Quake browser plugin.
  • Turn off Instant results in Google Preferences and set the Results per page to 100.
  • Search site:yourwebsite.com on Google. The results will only show pages from your website.
  • Check the information and icons under the search box and click Show as CSV or Save to download all indexed web pages.
  • Open the file in Excel and sort out the URLs you want and don’t want in the search results.
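
If you would rather not sort the list by hand in Excel, a short Python script can do a first pass. This is only a sketch: the column name `Url` and the comma delimiter are guesses about the exported file, so check your own download and adjust the `url_column` and `delimiter` arguments. The rule for flagging a URL (it has a query string or uppercase letters in the path) is likewise just an illustrative heuristic.

```python
import csv
import io
from urllib.parse import urlsplit


def split_indexed_urls(csv_file, url_column="Url", delimiter=","):
    """Split exported URLs into (keep, review) lists.

    A URL goes to 'review' if it carries a query string or has uppercase
    letters in its path, two common signs of a duplicate variant.
    """
    keep, review = [], []
    for row in csv.DictReader(csv_file, delimiter=delimiter):
        url = (row.get(url_column) or "").strip()
        if not url:
            continue
        parts = urlsplit(url)
        suspicious = bool(parts.query) or parts.path != parts.path.lower()
        (review if suspicious else keep).append(url)
    return keep, review


# In-memory stand-in for the exported file; a real export would be
# opened with open("export.csv", newline="") instead.
sample = io.StringIO(
    "Url\n"
    "http://yourwebsite.com/aboutpage\n"
    "http://yourwebsite.com/AboutPage\n"
    "http://yourwebsite.com/aboutpage?sessionid=1\n"
)
keep, review = split_indexed_urls(sample)
```

The `review` list is still only a starting point. You decide which of those URLs should actually disappear from the search results.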

Now, let’s fix the issues.

You should first decide which domain URL you prefer and set it in Webmaster Tools. Then redirect all other versions to the preferred one with 301 redirects; if you aren’t certain how, ask your hosting provider to assist you.
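
Most sites set up 301 redirects in the web server or CMS configuration, but the logic is easy to show in code. The sketch below assumes, purely for illustration, that the site runs as a Python WSGI application. The wrapper name `redirect_to_preferred` and the preferred host `yourwebsite.com` are hypothetical, and the rule "preferred host, lowercase path" is just one possible policy.

```python
def redirect_to_preferred(app, preferred_host="yourwebsite.com"):
    """Wrap a WSGI app so non-preferred URL variants get a 301 redirect."""

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        path = environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        wanted_path = path.lower()

        if host.lower() != preferred_host or path != wanted_path:
            scheme = environ.get("wsgi.url_scheme", "http")
            location = f"{scheme}://{preferred_host}{wanted_path}"
            if query:
                location += "?" + query
            # 301 tells browsers and search engines the move is permanent.
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        # Already the preferred URL: serve the page as usual.
        return app(environ, start_response)

    return middleware
```

With this in place, a request for http://www.yourwebsite.com/AboutPage is answered with a 301 pointing at http://yourwebsite.com/aboutpage, while the preferred URL itself is served normally.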

To remove the web pages you selected earlier in your Excel sheet, log into your Google Webmaster Tools account and go to Optimization/Remove URLs. Enter the URLs you want to remove, select “Remove page from search result and cache” and submit the request. Repeat this for every web page you don’t want to see in the search results. On the same page you can also track the status of each removal request, so you know when the submitted URLs have been deleted.

You can tell Google to keep ignoring certain dynamic pages by using parameter handling. Check the URL Parameters Google has already picked up for you and edit the options where needed. In addition, implement canonicalization on your website to make sure Google understands which version is the correct one. Unlike Joomla, WordPress has the canonical tag integrated automatically, but if you want full control over the canonical tag for individual posts you can try sh404SEF or RSSeo for Joomla, or Yoast’s SEO plugin for WordPress.
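
If your CMS does not add the canonical tag for you, the tag itself is simple to generate. The Python sketch below strips parameters that do not change the page content and builds the `<link rel="canonical">` element that goes in the page’s `<head>`. The function name `canonical_link_tag` and the parameter names in `IGNORED_PARAMS` are hypothetical examples; use the parameters your own site actually produces.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_link_tag(url, ignored=IGNORED_PARAMS):
    """Build a <link rel="canonical"> tag for the given page URL."""
    parts = urlsplit(url)

    # Keep only the parameters that really select different content.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in ignored
    ]
    clean = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )

    # Escape the URL so characters like & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'
```

For a URL such as http://yourwebsite.com/aboutpage?sessionid=42&page=2, the session parameter is dropped and the `page` parameter is kept, so every session-specific variant of the page points search engines at the same address.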
About The Author
Laura-May Zvolinska is Project Coordinator of the duplicate content checker PlagSpotter. You may contact her on Twitter or G+.

6 comments:

  1. Plag Spotter is really funny..I have tried with my blog address and it is showing that Blogger.com are copying my article. People are connecting with blogger.com for spreading their article. Plag Spotter is useless.

  2. It is meaningless to use duplicate content on your website, because it will create a negative impact in the eyes of common readers. Furthermore, the giant search engine, Google is much concerned about originality of content, and it will be against its set rules to plagiarize while writing web content.

  3. I appreciate your work. This information is really cool and lot informative. Keep this work up and make us knowledgeable. express-link-building

  4. I just came across your blog and reading your beautiful words. I thought I would leave my first comment but I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.Premium WordPress Plugin's soup



Copyright @ 2013 BLOGGER HEROE. Designed by BloggerHeroe | Love for Blogger Heroe