A canonical URL is one thing — the URL you want cited and ranked. When the signals disagree (rel=canonical points one way, og:url another, sitemap a third), AI engines and search engines pick arbitrarily and your citations split across URL variants. This guide covers identifying every canonical signal on your pages and unifying them.
1. <link rel="canonical" href="..."> in <head> 2. <meta property="og:url" content="..."> in <head> 3. <url><loc>...</loc></url> in sitemap.xml 4. Internal link href values throughout site 5. Schema.org url property in JSON-LD 6. HTTP Link header in response headers 7. 301 redirects from variant URLs
All seven must agree. One disagreement degrades the signal.
Decide once, document, enforce:
# Canonical URL policy
1. Protocol: https (never http)
2. Host: non-www (example.com, not www.example.com)
— OR www, but pick one and stick
3. Trailing slash: yes on directories (/blog/)
no on pages (/blog/post-1)
— OR consistent the other way
4. Parameters: stripped unless they change content
(?ref=, ?utm_= stripped, ?id= kept)
5. Fragments: stripped (#section never canonical)
6. Case: lowercase (/Products/ → /products/)
<!-- /blog/post-1 -->
<link rel="canonical" href="https://example.com/blog/post-1" />
<meta property="og:url" content="https://example.com/blog/post-1" />
<!-- Schema -->
<script type="application/ld+json">
{
"@type": "Article",
"url": "https://example.com/blog/post-1",
"mainEntityOfPage": "https://example.com/blog/post-1"
}
</script>
<!-- WordPress + Yoast default --> <link rel="canonical" href="https://example.com/post-slug/" /> <meta property="og:url" content="https://example.com/post-slug" /> <!-- Trailing slash mismatch — fix template -->
<!-- Page header --> <link rel="canonical" href="https://example.com/products/widget" /> <!-- sitemap.xml --> <url> <loc>https://example.com/products/widget?utm_source=feed</loc> </url> <!-- Sitemap should match canonical, no tracking params -->
# nginx config return 301 https://www.example.com$request_uri; # But canonical in HTML says: <link rel="canonical" href="https://example.com/..." /> # After redirect, www is canonical per server but non-www per HTML # Result: AI engines cite mix of URLs
<!-- /Products/Widget responds 200 with --> <link rel="canonical" href="https://example.com/products/widget" /> <!-- Good: canonical points to lowercase --> <!-- Better: 301 /Products/Widget → /products/widget -->
<!-- Syndicated copy on partner.com --> <link rel="canonical" href="https://example.com/original-article" /> <meta property="og:url" content="https://example.com/original-article" /> <!-- Tells AI engines: cite example.com, not partner.com --> <!-- partner.com still indexes; ranking authority flows to example.com -->
# Fetch and grep URL="https://example.com/blog/post-1" curl -sI "$URL" | grep -i "^link:" # HTTP Link header curl -s "$URL" | grep -oE 'rel="canonical" href="[^"]+"' curl -s "$URL" | grep -oE 'property="og:url" content="[^"]+"' curl -s "$URL" | grep -A1 '"@type":' | grep -oE '"url":"[^"]+"' curl -s https://example.com/sitemap.xml | grep -A1 "$URL"
Don't fix one page at a time. Patch the layout / theme so every page emits consistent signals:
<!-- Layout template, single source of truth -->
{# Compute canonical from current request, applying policy #}
{% set canon = build_canonical_url(request) %}
<link rel="canonical" href="{{ canon }}" />
<meta property="og:url" content="{{ canon }}" />
<script type="application/ld+json">
{
"@type": "WebPage",
"url": "{{ canon }}"
}
</script>
<!-- Sitemap generator uses same function -->
# Test URL variants all resolve correctly for u in \ "http://example.com/Blog/Post-1/" \ "http://www.example.com/blog/post-1?utm_source=test" \ "https://example.com/blog/post-1/" \ "https://example.com/blog/post-1"; do echo "=== $u ===" curl -sIL "$u" | grep -iE "^(location|link):" curl -sL "$u" | grep -oE 'rel="canonical" href="[^"]+"' done # All should redirect (301) or resolve to the same canonical URL
build_canonical_url() helper used by every template, sitemap generator, and redirect rule. Single source of truth eliminates conflicts permanently and saves hours of per-page audits later.