
How to Fix Duplicate Content

Duplicate content splits ranking signals between URLs that should be one. Google typically picks one version to show and filters the others out of results. The fix depends on the duplicate type: exact duplicates from CMS quirks need canonical tags, accidental duplicates need 301s, and intentional near-duplicates need merging or rewriting. This guide walks through the full triage and the platform-specific patterns that prevent recurrence.

1. Generate the duplicates report

Step 1
Run the Site Crawler with similarity scoring
Run a fresh Site Crawler crawl with the Detect near-duplicates option enabled. The crawler scores every page pair for content similarity and groups pages above the threshold (default 80%) into clusters. Export the cluster CSV.
Step 2
Sort by similarity
Highest-similarity clusters (95-100%) are exact duplicates from CMS quirks, so fix them first. Clusters at 80-95% are near-duplicates that need editorial decisions. Pages below 80% are generally fine unless their search intent overlaps significantly.
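
The sorting step can be scripted once the CSV is exported. The sketch below is illustrative only: the column names cluster_id, url and similarity are assumptions about the export format, not documented Site Crawler fields, and a score of exactly 95 is arbitrarily placed in the "exact" band.

```python
import csv
import io


def band(similarity: float) -> str:
    """Map a similarity percentage to a triage band (95 counts as exact here)."""
    if similarity >= 95:
        return "exact"
    if similarity >= 80:
        return "near"
    return "below-threshold"


def triage(csv_text: str) -> dict:
    """Sort rows by similarity, highest first, and bucket them by band.

    Assumes hypothetical columns: cluster_id, url, similarity.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["similarity"]), reverse=True)
    buckets = {"exact": [], "near": [], "below-threshold": []}
    for row in rows:
        buckets[band(float(row["similarity"]))].append(row)
    return buckets
```

Work through the "exact" bucket first, then hand the "near" bucket to whoever owns the editorial decisions.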

2. Classify each cluster

Type A: Same URL, different parameters

Example: /products/widget, /products/widget?utm_source=email, /products/widget?sort=price. The content is identical; only tracking or sort params differ.

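Fix: Self-referencing canonical on the primary URL, so that every parameter variant rendered from the same template points back to it. Avoid blocking the parameter URLs in robots.txt, because Google cannot read a canonical tag on a page it is not allowed to crawl. Search Console's URL Parameters tool was retired in 2022, so it is no longer an option for query-string handling.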
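
If your templates build the canonical URL in code, stripping the non-canonical parameters could look like the sketch below. The parameter lists are assumptions taken from the example above (utm_* and sort); adjust them to whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed lists for illustration: parameters that never change page content.
NON_CANONICAL_PARAMS = {"sort"}
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return the URL with tracking and sort parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Parameters that do change the content (a colour or size filter, for example) are kept, so only add a parameter to the lists once you are sure it is cosmetic.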

Type B: Multiple URLs, same content

Example: /blog/post-title and /posts/post-title serving identical content because the CMS exposes the post under two routes.

Fix: 301 redirect from the secondary to the primary. Update internal links.

Type C: Near-duplicates competing for the same query

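Example: /best-running-shoes-2025 and /top-running-shoes-uk with roughly 70% similar content targeting overlapping search intent. At that similarity the pair sits below the default 80% clustering threshold, so it counts as a duplicate because of the intent overlap, not the score.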

Fix: Editorial decision — merge into one comprehensive page, differentiate intent clearly, or rewrite the lower-performer with a different angle.

Type D: Variant pages (intentional duplicates)

Example: Multiple language versions, mobile/AMP variants, country-specific pages.

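Fix: hreflang tags for language/country variants, and AMP-specific canonicals (the AMP page canonicalises to the standard page, which links back with rel="amphtml"). Never 301 these: each variant serves a distinct audience.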
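
For language and country variants, every page in the set should carry the same list of hreflang links, including one pointing at itself. A minimal sketch for generating that list follows; the URLs and language codes in the usage line are placeholders, and the function name is hypothetical.

```python
from html import escape
from typing import Optional


def hreflang_tags(variants: dict, default: Optional[str] = None) -> list:
    """Build the <link rel="alternate"> tags shared by every variant page.

    variants maps an hreflang code (e.g. "en-gb") to that variant's absolute URL.
    default, if given, is emitted as the x-default fallback.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(code, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for code, url in sorted(variants.items())
    ]
    if default:
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(default, quote=True)}">'
        )
    return tags
```

For example, hreflang_tags({"en-gb": "https://yourdomain.com/uk/", "en-us": "https://yourdomain.com/us/"}) yields two tags that go in the <head> of both pages.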

3. Apply canonical tags

Step 1
Add self-referencing canonical to chosen primary
In the <head> of the primary URL:
<link rel="canonical" href="https://yourdomain.com/products/widget">
Self-referencing canonicals are best practice on every indexable page. They prevent accidental duplication from tracking parameters and similar URL variants.
Step 2
Add cross-referencing canonical on duplicates
If duplicate URLs must stay accessible (e.g. parameter URLs needed for filtering), add a canonical on each duplicate pointing at the primary:
<!-- on /products/widget?utm_source=email -->
<link rel="canonical" href="https://yourdomain.com/products/widget">
⚠️ Canonical tags are hints, not directives. Google may ignore them if the suggested canonical isn't actually similar to the canonicalising page. For non-negotiable consolidation, use 301.
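
Before relying on the tags, it is worth checking what each page actually emits. The sketch below pulls canonical hrefs out of an HTML string using only the standard library; fetching the page is left out, and the function name is hypothetical. A page that returns zero or more than one canonical deserves a closer look.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens and is case-insensitive.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    """Return all canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```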

4. Apply 301 redirects

Step 1
Decide the winner per duplicate cluster
For each cluster, the winner is the URL with the most external backlinks (check the Search Console Links report) or the most recent Search Console clicks. If the candidates are close on both, choose the cleaner URL or the one matching your current URL conventions.
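
One way to apply this rule consistently across many clusters is sketched below. It is an interpretation, not a fixed formula: it ranks by backlinks first, then clicks, and uses the shorter URL as a rough stand-in for "cleaner". The Candidate fields are assumed to be filled in by hand from Search Console exports.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    backlinks: int  # external links, e.g. from the Search Console Links report
    clicks: int     # recent clicks from Search Console


def pick_winner(cluster: list) -> Candidate:
    """Pick the URL to keep: most backlinks, then most clicks, then shortest URL."""
    return max(cluster, key=lambda c: (c.backlinks, c.clicks, -len(c.url)))
```

Treat the result as a suggestion and override it where the "winner" breaks your URL conventions.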
Step 2
301 the losers to the winner
Apply 301 via your platform — Redirection plugin (WordPress), URL Redirects (Shopify), nginx rewrite, Apache Redirect 301. See the redirect chains guide for syntax per platform.
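
If you manage redirects in nginx and have many loser URLs, the rules can be generated from a mapping instead of typed by hand. This is a sketch under two assumptions: the paths are plain (no spaces or characters that need quoting in nginx), and no winner is itself listed as a loser, which would create a redirect chain. The function name is hypothetical.

```python
def nginx_redirects(redirects: dict) -> list:
    """Turn {loser_path: winner_path} into exact-match nginx 301 rules."""
    rules = []
    for source, target in sorted(redirects.items()):
        if target in redirects:
            # The target is itself redirected elsewhere: refuse to emit a chain.
            raise ValueError(f"{source} -> {target} would create a redirect chain")
        rules.append(f"location = {source} {{ return 301 {target}; }}")
    return rules
```

Paste the generated lines inside the relevant server block and reload nginx after testing the config.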
Step 3
Update internal links
Internal links still pointing at the loser URL now go through a 301 hop. Update them to point directly at the winner. Use bulk find-and-replace techniques from the broken links guide.
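
Where content lives in files or a database export, the same loser-to-winner mapping can drive the link update. The sketch below rewrites href values that match a loser URL exactly; it is deliberately simple (a regular expression, exact matches only) and will not catch absolute URLs or links with query strings unless those are added to the mapping.

```python
import re


def rewrite_links(html: str, redirects: dict) -> str:
    """Replace href values that exactly match a redirected URL with its target."""

    def swap(match):
        quote, href = match.group(1), match.group(2)
        return f"href={quote}{redirects.get(href, href)}{quote}"

    # Matches href="..." or href='...' and keeps the original quote style.
    return re.sub(r"""href=(["'])(.*?)\1""", swap, html)
```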

5. Use noindex for pages that must exist but shouldn't rank

Some pages are duplicates by design — search-result pages, faceted-nav variants, tag archives, paginated lists. They need to exist for users but shouldn't compete in the index.

Step 1
Add noindex meta
In the <head> of these pages:
<meta name="robots" content="noindex, follow">
follow lets Google still discover links from these pages; noindex keeps them out of search results.
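
To confirm the tag is present on the pages that need it (and absent from the ones you want indexed), a small check like the following can run over saved HTML. It only reads the robots meta tag; an X-Robots-Tag HTTP header, if your server sends one, is not covered by this sketch. The function name is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives
```

A page is excluded from the index when "noindex" is in the returned set.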

6. Platform-specific recurrence prevention

WordPress

Shopify

Headless / custom CMS

7. Re-run the audit

After consolidating, run the Site Crawler again. Duplicate clusters should drop significantly. Remaining clusters are usually intentional variants (language, AMP) or pages still being merged editorially.
