How to Fix Duplicate Content
Duplicate content splits ranking signals between URLs that should be one. Google typically picks one version to show and filters the others out of results, and the one it picks is not always the one you want. The fix depends on the duplicate type: exact duplicates from CMS quirks need canonical tags, accidental duplicates need 301s, intentional near-duplicates need merging or rewriting. This guide walks through the full triage and the platform-specific patterns that prevent recurrence.
1. Generate the duplicates report
Step 1
Run the Site Crawler with similarity scoring
Run a fresh Site Crawler crawl with the Detect near-duplicates option enabled. The crawler scores every page pair for content similarity and groups pages above the threshold (default 80%) into clusters. Export the cluster CSV.
Step 2
Sort by similarity
Highest-similarity clusters (95-100%) are usually exact duplicates from CMS quirks; fix these first. Clusters between 80% and 95% are near-duplicates that need editorial decisions. Pairs below 80% are generally fine unless their search intent overlaps significantly.
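The sorting step can be scripted once the CSV is exported. The sketch below is only illustrative: the column names (cluster_id, url, similarity) and the sample rows are assumptions, not the Site Crawler's documented export format, so adjust them to the real headers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout: one row per page, with its cluster id and the
# cluster's similarity score (0-100). Replace with the real CSV headers.
SAMPLE_CSV = """cluster_id,url,similarity
1,/products/widget,100
1,/products/widget?utm_source=email,100
2,/best-running-shoes-2025,86
2,/top-running-shoes-uk,86
"""

def triage(rows):
    """Group URLs by cluster, then bucket clusters by similarity band."""
    clusters = defaultdict(list)
    scores = {}
    for row in rows:
        cid = row["cluster_id"]
        clusters[cid].append(row["url"])
        scores[cid] = float(row["similarity"])

    buckets = {"exact": [], "near": [], "below_threshold": []}
    # Highest similarity first, so the fix-first clusters come out on top.
    for cid in sorted(clusters, key=lambda c: scores[c], reverse=True):
        score = scores[cid]
        if score >= 95:
            band = "exact"
        elif score >= 80:
            band = "near"
        else:
            band = "below_threshold"
        buckets[band].append((cid, score, clusters[cid]))
    return buckets

buckets = triage(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for band, items in buckets.items():
    for cid, score, urls in items:
        print(band, cid, score, urls)
```

To run it against a real export, pass csv.DictReader(open(path, newline="")) to triage instead of the in-memory sample.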
2. Classify each cluster
Type A: Same URL, different parameters
Example: /products/widget, /products/widget?utm_source=email, /products/widget?sort=price. The content is identical; only tracking or sort params differ.
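Fix: Self-referencing canonical on the primary URL. Because the parameter variants render the same page, they carry the same tag and so point back at the clean URL. Avoid blocking the parameter URLs in robots.txt: Google cannot read a canonical on a page it is not allowed to crawl. Search Console's URL Parameters tool was retired in 2022, so it is no longer an option for query strings that shouldn't be indexed.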
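To spot Type A duplicates in a URL list, one approach is to reduce every URL to its parameter-free form and group on that. This is a minimal sketch; the IGNORED_PARAMS set is an assumption and should list only parameters that never change page content on your site.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of parameters that never change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort"}

def primary_url(url):
    """Drop ignored query parameters and the fragment; keep everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_variants(urls):
    """Map each primary URL to every crawled variant that collapses onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[primary_url(url)].append(url)
    return dict(groups)

crawled = [
    "/products/widget",
    "/products/widget?utm_source=email",
    "/products/widget?sort=price",
]
print(group_variants(crawled))
```

Any group with more than one member is a candidate for a canonical pointing at the key of that group.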
Type B: Multiple URLs, same content
Example: /blog/post-title and /posts/post-title serving identical content because the CMS exposes the post under two routes.
Fix: 301 redirect from the secondary to the primary. Update internal links.
Type C: Near-duplicates competing for the same query
Example: /best-running-shoes-2025 and /top-running-shoes-uk with 70% similar content targeting overlapping search intent.
Fix: Editorial decision — merge into one comprehensive page, differentiate intent clearly, or rewrite the lower-performer with a different angle.
Type D: Variant pages (intentional duplicates)
Example: Multiple language versions, mobile/AMP variants, country-specific pages.
Fix: hreflang tags for language/country variants. On AMP pages, set the canonical to the non-AMP page and link back from the non-AMP page with rel="amphtml". Never 301 these.
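For Type D, hreflang annotations need to be reciprocal: every variant lists all variants, itself included. The sketch below generates that block from a mapping of language codes to URLs. The function name, the example URLs, and the choice of x-default are illustrative assumptions.

```python
from html import escape

def hreflang_links(variants, default_lang=None):
    """Build the <link> block to place in the <head> of every variant.

    variants maps hreflang codes (e.g. 'en-gb') to absolute URLs.
    default_lang, if given, is also emitted as the x-default fallback.
    """
    lines = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(variants.items())
    ]
    if default_lang in variants:
        fallback = escape(variants[default_lang])
        lines.append(f'<link rel="alternate" hreflang="x-default" href="{fallback}">')
    return "\n".join(lines)

block = hreflang_links(
    {"en-gb": "https://yourdomain.com/uk/", "de": "https://yourdomain.com/de/"},
    default_lang="en-gb",
)
print(block)
```

The same output goes on every page in the set, which keeps the annotations symmetric without per-page logic.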
3. Apply canonical tags
Step 1
Add self-referencing canonical to chosen primary
In the <head> of the primary URL:
<link rel="canonical" href="https://yourdomain.com/products/widget">
Self-referencing canonicals are best practice on every indexable page. They prevent accidental duplication from tracking parameters and similar URL variants.
Step 2
Add cross-referencing canonical on duplicates
If duplicate URLs must stay accessible (e.g. parameter URLs needed for filtering), add canonical on each duplicate pointing at the primary:
<!-- on /products/widget?utm_source=email -->
<link rel="canonical" href="https://yourdomain.com/products/widget">
⚠️ Canonical tags are hints, not directives. Google may ignore them if the suggested canonical isn't actually similar to the canonicalising page. For non-negotiable consolidation, use 301.
4. Apply 301 redirects
Step 1
Decide the winner per duplicate cluster
For each cluster, the winner is the URL with the most external backlinks (check Search Console Links report) or the most current Search Console clicks. If both are similar, choose the cleaner URL or the one matching your current URL conventions.
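The winner rule can be expressed as a sort key. This sketch makes two assumptions that go beyond the text: backlinks are compared first with clicks as the tiebreaker, and "cleaner URL" is approximated by the shorter URL. The field names are hypothetical and would come from your own Search Console exports.

```python
def pick_winner(cluster):
    """Return the URL to keep from one duplicate cluster.

    cluster is a list of dicts with 'url', 'backlinks' and 'clicks' keys.
    Ordering (an assumption): most backlinks, then most clicks, then the
    shortest URL as a rough stand-in for the cleanest one.
    """
    best = max(
        cluster,
        key=lambda page: (page["backlinks"], page["clicks"], -len(page["url"])),
    )
    return best["url"]

cluster = [
    {"url": "/blog/post-title", "backlinks": 12, "clicks": 40},
    {"url": "/posts/post-title", "backlinks": 3, "clicks": 90},
]
print(pick_winner(cluster))
```

When backlinks and clicks disagree this sharply, it is worth reviewing the cluster by hand rather than trusting the ordering.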
Step 2
301 the losers to the winner
Apply the 301 via your platform: Redirection plugin (WordPress), URL Redirects (Shopify), nginx rewrite, Apache Redirect 301. See the redirect chains guide for syntax per platform.
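If the loser-to-winner mapping lives in a spreadsheet or script, the server rules can be generated rather than typed. The sketch below first flattens the mapping so no rule points at a URL that is itself redirected, then emits nginx exact-match location blocks. The helper names and sample paths are illustrative; the output is meant to be reviewed before it goes into a server config.

```python
def flatten(redirects):
    """Resolve each source to its final target, so no redirect chains remain."""
    flat = {}
    for src in redirects:
        seen = {src}
        target = redirects[src]
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[src] = target
    return flat

def nginx_rules(redirects):
    """One exact-match location block per loser URL."""
    return [
        f"location = {src} {{ return 301 {dst}; }}"
        for src, dst in sorted(flatten(redirects).items())
    ]

mapping = {
    "/posts/post-title": "/blog/post-title",
    "/old/post-title": "/posts/post-title",  # would chain without flattening
}
print("\n".join(nginx_rules(mapping)))
```

For Apache, the equivalent line per pair would be of the form Redirect 301 /old-path /new-path.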
Step 3
Update internal links
Internal links still pointing at the loser URL now go through a 301 hop. Update them to point directly at the winner. Use bulk find-and-replace techniques from the broken links guide.
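A bulk replace over stored HTML might look like the following. It is a deliberately simple sketch that matches href values exactly with a regular expression, which is an assumption about how links are written in your content; absolute URLs or unquoted attributes would need extra handling.

```python
import re

HREF_PATTERN = re.compile(r'href=(["\'])(.*?)\1')

def rewrite_links(html, redirects):
    """Point hrefs that exactly match a redirected path at the winner URL."""
    def swap(match):
        quote, href = match.group(1), match.group(2)
        return f"href={quote}{redirects.get(href, href)}{quote}"
    return HREF_PATTERN.sub(swap, html)

html = '<a href="/posts/post-title">Read</a> <a href="/about">About</a>'
print(rewrite_links(html, {"/posts/post-title": "/blog/post-title"}))
```

Run it on a copy of the content first and diff the result, since a regex pass cannot tell a navigation link from an href inside a code sample.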
5. Use noindex for pages that must exist but shouldn't rank
Some pages are duplicates by design — search-result pages, faceted-nav variants, tag archives, paginated lists. They need to exist for users but shouldn't compete in the index.
Step 1
In the <head> of these pages:
<meta name="robots" content="noindex, follow">
follow lets Google still discover links from these pages; noindex keeps them out of search results.
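In a templated site, the decision of which pages receive this tag is usually made from the URL. The sketch below shows one way to do that; the path prefixes and parameter names are assumptions about a hypothetical site, not a recommended list.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed URL patterns for pages that exist for users but should not rank.
NOINDEX_PREFIXES = ("/search", "/tag/")      # crude prefix match on the path
NOINDEX_PARAMS = {"color", "size", "page"}   # facet and pagination parameters

def robots_meta(url):
    """Return the noindex tag for by-design duplicates, else an empty string."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if parts.path.startswith(NOINDEX_PREFIXES) or NOINDEX_PARAMS.intersection(params):
        return '<meta name="robots" content="noindex, follow">'
    return ""

for url in ("/search?q=shoes", "/shoes?color=red", "/products/widget"):
    print(url, "->", robots_meta(url) or "(indexable)")
```

Keeping the patterns in one place makes it easier to audit which sections of the site are excluded from the index.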
6. Platform-specific recurrence prevention
WordPress
- Yoast / Rank Math: enable canonical auto-generation for posts and products
- Disable tag archives and date archives if you don't actively use them
- Set noindex on author pages unless authors are part of your E-E-A-T strategy
Shopify
- Products auto-canonicalise to /products/{handle} regardless of which collection link the visitor came through
- Avoid smart collections that overlap heavily with manual collections
- For variants: use product-level canonical, not variant-level URLs in nav
Headless / custom CMS
- Build canonical tag generation into your routing layer, not per-template
- Strip tracking parameters server-side or via JavaScript to standardise URL display
- Add CI checks: any new template must emit a canonical tag
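The CI check in the last bullet could be as small as the following sketch. It assumes the build can render each template to an HTML string and hand the results over as a dictionary; that harness, and the names used here, are hypothetical.

```python
from html.parser import HTMLParser

class CanonicalCounter(HTMLParser):
    """Counts <link rel="canonical"> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        rel = (dict(attrs).get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.count += 1

def templates_failing_canonical(rendered):
    """rendered maps template name -> rendered HTML.

    Returns the names that emit zero canonical tags or more than one.
    """
    failures = []
    for name, html in sorted(rendered.items()):
        counter = CanonicalCounter()
        counter.feed(html)
        if counter.count != 1:
            failures.append(name)
    return failures

rendered = {
    "product.html": '<head><link rel="canonical" href="https://yourdomain.com/products/widget"></head>',
    "landing.html": "<head><title>New landing page</title></head>",
}
print(templates_failing_canonical(rendered))
```

In a pipeline, a non-empty result would fail the build, for example by exiting with a non-zero status.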
7. Re-run the audit
After consolidating, run the Site Crawler again. Duplicate clusters should drop significantly. Remaining clusters are usually intentional variants (language, AMP) or pages still being merged editorially.
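To quantify the drop, the clusters from the two crawls can be compared at URL level. This is a rough sketch that assumes each audit has been loaded as a list of URL sets, one set per cluster; the sample data is invented for illustration.

```python
def audit_delta(before, after):
    """Compare two audits, each given as a list of URL sets (one per cluster)."""
    urls_before = set().union(*before)
    urls_after = set().union(*after)
    return {
        "clusters_before": len(before),
        "clusters_after": len(after),
        "resolved": urls_before - urls_after,          # no longer in any cluster
        "still_duplicated": urls_before & urls_after,  # intentional or pending
        "newly_duplicated": urls_after - urls_before,  # regressions to look into
    }

before = [
    {"/products/widget", "/products/widget?sort=price"},
    {"/en/pricing", "/de/pricing"},
]
after = [{"/en/pricing", "/de/pricing"}]
print(audit_delta(before, after))
```

Anything under newly_duplicated after a cleanup usually points at a template or routing change made since the first crawl.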