What exactly has Google said about Duplicate Content?

The issue of duplicate content has long been a topic of discussion among SEO content writers.

While some dismiss it as a myth, others maintain that it matters to search rankings. Today, the question is, “What exactly is the deal with Google and duplicate content?”

We are happy to say that in this post, we will settle this issue once and for all. DailyPosts will answer all your questions and overcome any objections regarding the topic. After reading this article, we guarantee you will never need to ask about duplicate content again.

What is duplicate content?

For clarity, it is important to start by defining the meaning of duplicate content. According to Google, duplicate content is the term used to describe substantial blocks of content, within or across domains, that either completely match other content or are appreciably similar to it.

Contrary to popular opinion, this type of content does not always seek to be deceptive in nature. Examples of harmless duplicate content may include:

  • Store items shown or linked through several separate URLs
  • Discussion forums that produce both regular and stripped-down pages aimed at mobile devices
  • Printer-only versions of web pages

Sometimes, it may even be a quote copied from your favourite blog post with no malicious purpose intended. No matter how hard you try to achieve 100% unique content, it is almost impossible if you use secondary research materials.

Are you worried about what this could mean for your site?

Here is what Google had to say as recently as September 2017 (in paraphrase):

“The lowest rating is appropriate if all, or almost all, of the main content on the page is reproduced with little or no time, effort, expertise, manual curation, or added value for users. Such pages should attract the lowest possible rating, even if the content has been credited to another source.”

However, it is important to understand the difference between “duplicate content” and “copied content”. While the former is usually non-malicious (as explained), copied content has been deliberately lifted to be re-used for deceptive reasons.

Putting semantics aside, the two evidently receive different treatment. Copied content will often be penalised, either manually or algorithmically by Panda. Duplicate content will not be penalised manually, but it is often not an optimal setting for pages. It is therefore important to be careful when you ‘spin’ words to make them unique.

The confusion with duplicate content

Although Google has admitted that it does not penalise websites for duplicate content, webmasters and copywriters are still concerned about how Google’s search engine perceives their duplicate content, and whether it still affects their rankings in a negative way.

Can an algorithm tell the difference between malicious and harmless duplicate content? As a matter of fact, it can. If Google places your ‘malicious’ duplicate content in one of three categories:

  • Thin content
  • Manipulative boiler plate
  • Near duplicate spun content

Then it means your content violates Google’s requirements for unique content and must be cleaned up immediately. This is important if you want to perform better in search rankings.

In a recent announcement, Google made clear that near-duplicate ‘spun’ or manipulative boilerplate content is different from ‘duplicate content’. The statement made a case for copied content which has been changed from the original.

Where content has been copied with slight variations from the original, it becomes harder to determine the exact matching original source. Such content may have a few words substituted for others in the text, or specific sentences changed. In some cases, a “search and replace” amendment has been made. This kind of content, according to Google, is called “copied with minimal alteration”.
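Google has not published how it identifies content “copied with minimal alteration”, but the general idea of scoring similarity between two texts can be sketched with Python’s standard difflib module. This is an illustration only, not Google’s method; the sample sentences and the 0.7 threshold are made up for the example:

```python
# Toy near-duplicate check using difflib's SequenceMatcher.
# Illustration only -- Google's detection is proprietary and far
# more sophisticated than a word-level similarity ratio.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two word sequences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

original = "The quick brown fox jumps over the lazy dog"
spun     = "The quick brown fox leaps over the idle dog"  # search-and-replace edits

score = similarity(original, spun)
# A high ratio despite a few swapped words suggests the text was
# copied with minimal alteration. The threshold is arbitrary here.
is_near_duplicate = score > 0.7
```

Even with two words swapped, the ratio stays high, which is exactly why simple “spinning” fails to make copied content look original.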

Simply put by Google’s John Mueller, “We do not have a duplicate content penalty. However, some things around duplicate content may be worthy of a penalty.”

Boilerplate content: another cause for concern

We mentioned boilerplate content previously, but didn’t explain, so you might be wondering how it fits into this whole concept.

Boilerplate content is simply content that appears across different parts or web pages of your site. According to Ann Smarty, it can often be found in:

  • Specific areas including links, such as a blogroll or navigation bar
  • Global navigation such as ‘about us’, ‘home’ and so on
  • Markup (CSS id/class names like footer, header, javascript)

A standard site often has a header, sidebar and a footer. In addition to these features, most content management systems let you display your most recent posts or the most popular ones on the homepage too. When Google search algorithms crawl your site, they index these separately, so they turn out to be duplicate content. But this sort of duplicate content does not impact your SEO negatively. Search engine algorithms are designed to recognise it, so you are safe.

However, if this boilerplate content is very common, and forms part of the primary content of several pages on your site, Google can easily sift through and either ignore or penalise your site, depending on the perceived intention.

It is therefore smart to heed Google’s warnings and reduce, or at least vary, the occurrence of boilerplate content from page to page on your website.

The danger of boilerplate content

Be aware that thin content worsens spun boilerplate issues on a site, because thin content keeps creating more pages that can only be filled with boilerplate text.

For example:

If a sales item has 15 URLs, one for each design or colour, the title, meta description and product description will usually rely on boilerplate techniques to spin their content, thereby creating more duplicate content across the pages. Copywriters should take note of this.

John Mueller emphasises that the practice of making your content ‘more unique’ with low-quality methods is counterproductive to your website’s search rankings. A website with too many similar pages gives Google’s algorithms the problem of deciding which page to rank highest. They may end up ranking a non-primary page highest. To prevent this, you can use canonical tags. We will discuss canonicalization further in this post.

How can you prevent duplicate content from harming your site?

Fixing your duplicate content issues all boils down to one common idea: helping Google algorithms determine which content is the “primary” one.

Whenever content on a site has several URLs, it should be canonicalized for SEO purposes. You can do this in various ways:

  1. Using a 301 redirect

One of the best ways to correct a duplicate content issue is to configure a 301 redirect so that people or bots landing on the ‘duplicate’ page will be redirected to the original content page.

When several pages that have the potential to rank favourably are merged into a single page, they will not only stop competing among themselves, they will also become more relevant and have a stronger popularity signal. This instantly boosts the primary page’s ability to rank highly on search engine result pages (SERPs).
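Server configurations differ (Apache, nginx and so on), but the logic of a 301 redirect can be sketched in a few lines of Python. The paths in the redirect map below are hypothetical examples:

```python
# Sketch of 301-redirect logic: any request for a known duplicate URL
# is answered with a permanent redirect to the canonical page.
# The paths below are hypothetical; in practice this mapping lives in
# your web server or CMS configuration.
REDIRECTS = {
    "/blog/duplicate-post": "/blog/original-post",
    "/index.php":           "/",
}

def handle_request(path: str):
    """Return (status_code, headers) for an incoming request path."""
    if path in REDIRECTS:
        # 301 = Moved Permanently: browsers and crawlers follow it,
        # and search engines consolidate ranking signals on the target.
        return 301, {"Location": REDIRECTS[path]}
    return 200, {}

status, headers = handle_request("/blog/duplicate-post")
```

The key point is the permanence of the 301 status: it tells crawlers the duplicate URL is gone for good, so signals flow to the canonical page.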

  2. Rel=“canonical”

The rel=canonical feature, or canonicalization, is another option that helps you deal appropriately with ‘duplicate content’. It tells the search engine which page should be treated as the original. The duplicate page is treated as a copy of the specified URL, and all the links, “ranking strength”, and content metrics that search engines would apply to it are passed to that URL instead.

The rel=“canonical” element sits in the HTML head of a webpage, and it should be added to the head of each duplicate version of a page, with its href attribute pointing to the primary (canonical) page.
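For illustration, here is a small Python helper that builds the tag a duplicate page would carry in its head. The example URL is a placeholder, not a real site:

```python
# Build a rel="canonical" <link> element for the HTML <head> of a
# duplicate page, pointing at the primary version of the content.
# The URL below is a placeholder for your own canonical page.
from html import escape

def canonical_tag(primary_url: str) -> str:
    """Return the <link> element to place in each duplicate page's head."""
    return '<link rel="canonical" href="%s" />' % escape(primary_url, quote=True)

tag = canonical_tag("https://www.example.com/original-page")
# tag goes inside <head> ... </head> of every duplicate version.
```

Escaping the URL keeps the attribute valid even if it contains characters such as `&` in query strings.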

  3. The parameter handling tool in Google Search Console

With Google Search Console, you can configure the preferred domain of your website. That is, instead of https://www.mysite.com, you can choose https://mysite.com, and indicate whether the Google search algorithm should index other URL parameters separately (this is known as parameter handling).

Depending on the URL structure of your webpage, and the origin of your duplicate content problems, you may be able to solve the whole issue by simply configuring your preferred domain.
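Search Console applies parameter handling on Google’s side, but the underlying idea of collapsing URL variants into one canonical form can be sketched with Python’s standard urllib.parse. Which parameters are safe to ignore is a site-specific judgement; the names below are common tracking examples:

```python
# Sketch of the idea behind parameter handling: URLs that differ only
# in "ignorable" parameters (session ids, tracking tags, etc.) are
# collapsed to one canonical form. The ignorable set is a site-specific
# choice; these are common tracking-parameter examples.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

IGNORABLE = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalise(url: str) -> str:
    parts = urlsplit(url)
    # Drop ignorable parameters and sort the rest for a stable form.
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORABLE)
    # Collapse the www / non-www split to one preferred host.
    host = parts.netloc.removeprefix("www.")
    return urlunsplit((parts.scheme, host, parts.path, urlencode(query), ""))

a = normalise("https://www.mysite.com/page?utm_source=x&id=7")
b = normalise("https://mysite.com/page?id=7&sessionid=abc")
# a and b now match, so the two variants count as one page.
```

Once both variants normalise to the same string, a crawler (or your own auditing script) can treat them as a single page rather than duplicates.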

Additional ways to manage your duplicate content issues

  • Be consistent when linking internally throughout your website
  • When syndicating content, ensure that the syndicating website includes a link pointing back to the original content, and not a different version of the URL
  • For extra protection against content scrapers who copy your content for SEO benefits, add a self-referencing rel=canonical link to your existing pages
  • Understand your content management system; if you are unaware of how content is displayed on your site, you may not know when it is being presented in multiple formats. This should be one of the first things you check with your CMS

To recap the discussion so far, the latest advice from Google is:

  1. There is no penalty for duplicate content (however, search algorithms may impact such sites)
  2. Google rewards websites that are highly unique, and whose signals are representative of added value
  3. Google filters out duplicated content
  4. Duplicate content can affect the speed at which Google finds new content
  5. Reduce boilerplate content
  6. Duplicate content won’t necessarily harm your website
  7. You can control duplicate content by using 301 redirects, canonicalization, and Google Search Console

From now, as you go about content creation and SEO, you should be able to tell which strategies are in the best interests of your site’s search ranking.
