Why must schema URLs be absolute?

Schema is processed out-of-context by AI engines and Google. Relative URLs (/products/widget) have no resolution context — the parser doesn't always know which domain. Absolute URLs (https://example.com/products/widget) work unambiguously regardless of where the schema is fetched, syndicated, or cached.

What URL fields need fixing?

url, mainEntityOfPage, image, logo, sameAs, contentUrl, embedUrl, offers.url, author.url, publisher.url, and any property typed as URL or ImageObject. AI generators sometimes output these as relative, sometimes as full but pointing at staging domain.

My image URL points at S3 — is that OK?

Yes, third-party image hosts are fine as long as the URL is absolute, resolves 200, and the image is publicly accessible (no auth required). CDN URLs, S3 buckets, image proxies all work. Just make sure the URL is stable — AI engines cache and reference it.

Should schema url match rel=canonical?

Yes — both should point at the same canonical URL. Mismatch signals conflict. See the canonical conflicts fix for the full pattern of unifying canonical signals across rel=canonical, og:url, sitemap, and schema.

How to Fix AI Schema URL Errors

AI schema generators sometimes output relative URLs, broken targets, or URLs from staging domains. These break rich-result eligibility and confuse AI engines that follow URLs to fetch related entities. The fix: validate every URL field, ensure all are absolute and resolvable, and align with your canonical URLs.

1. URL fields in schema

Every schema type has URL-typed properties. Common ones:

"@id"                  → unique identifier (URL form)
"url"                  → primary URL of the entity
"mainEntityOfPage"     → page where entity is primarily described
"image"                → image URL or ImageObject
"logo"                 → organisation logo URL
"sameAs"               → array of external profile URLs
"author.url"           → author's profile page
"publisher.url"        → publisher homepage
"offers.url"           → buy/checkout URL
"contentUrl"           → direct file URL (for MediaObject)
"embedUrl"             → iframe-embeddable URL (for VideoObject)
"telephone"            → not URL but treated similarly

2. Bad patterns AI generators produce

Pattern 1: Relative URLs

{
  "@type": "Product",
  "name": "Widget",
  "image": "/images/widget.jpg",          ❌ relative
  "url": "/products/widget",              ❌ relative
  "offers": {
    "url": "/products/widget#buy"         ❌ relative
  }
}

Pattern 2: Staging or wrong-environment URLs

{
  "url": "https://staging.example.com/products/widget"   ❌ staging
}

{
  "image": "http://localhost:3000/images/widget.jpg"     ❌ localhost
}

Pattern 3: Protocol-relative URLs (legacy)

{
  "image": "//cdn.example.com/widget.jpg"   ❌ protocol-relative
}
<!-- Worked in browsers; fails in schema parsers -->

Pattern 4: URLs with tracking params

{
  "url": "https://example.com/products/widget?utm_source=schema"   ❌ tracked
}
<!-- Schema URL should be canonical, no UTM -->

Pattern 5: Broken or redirected URLs

{
  "url": "https://example.com/old-product-url"    ❌ 301 redirect
}
{
  "image": "https://example.com/deleted-image.jpg"   ❌ 404
}

3. The correct pattern

{
  "@context": "https://schema.org",
  "@type": "Product",
  "@id": "https://example.com/products/widget#product",
  "name": "Blue Widget",
  "url": "https://example.com/products/widget",
  "image": [
    "https://example.com/images/widget-1200x630.jpg",
    "https://example.com/images/widget-1080x1080.jpg"
  ],
  "mainEntityOfPage": "https://example.com/products/widget",
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/products/widget#buy",
    "price": "29.00",
    "priceCurrency": "GBP"
  },
  "brand": {
    "@type": "Brand",
    "name": "Acme",
    "url": "https://example.com/brand/acme"
  }
}

All URLs absolute, HTTPS, canonical (no UTM), pointing at the live domain.

4. Fix the generator config

If your AI schema generator outputs the wrong URLs, the fix is usually at config-time, not post-edit:

// Pass base URL to generator
const schema = await generateSchema({
  content: pageContent,
  baseUrl: 'https://example.com',  // ← always set
  currentUrl: req.canonicalUrl,    // ← the page being annotated
  environment: 'production'        // ← strip staging URLs
});

// Generator can then resolve all relative URLs from baseUrl
// and validate that produced URLs match production domain

5. Post-validation pass

// Run this on every generator output before deploy
function validateSchemaUrls(schema, expectedDomain) {
  const errors = [];
  
  function walk(obj, path = '') {
    if (typeof obj !== 'object' || obj === null) return;
    for (const [key, value] of Object.entries(obj)) {
      const p = path ? `${path}.${key}` : key;
      if (typeof value === 'string' && looksLikeUrlField(key)) {
        // Must be absolute
        if (!value.startsWith('http://') && !value.startsWith('https://')) {
          errors.push(`${p}: relative URL "${value}"`);
        }
        // Must be HTTPS
        if (value.startsWith('http://')) {
          errors.push(`${p}: HTTP not HTTPS "${value}"`);
        }
        // Must be production domain
        if (!value.includes(expectedDomain)) {
          errors.push(`${p}: wrong domain "${value}"`);
        }
      } else if (Array.isArray(value) || typeof value === 'object') {
        walk(value, p);
      }
    }
  }
  
  walk(schema);
  return errors;
}

const URL_FIELDS = ['url', 'image', 'logo', 'sameAs', 'contentUrl', 'embedUrl', '@id'];
function looksLikeUrlField(key) {
  return URL_FIELDS.includes(key) || key.endsWith('Url');
}

6. URL resolution check

Step 1

Extract URLs from schema

# From a deployed page
curl -s https://example.com/products/widget | \
  python3 -c "
import json, re, sys
html = sys.stdin.read()
for m in re.finditer(r'<script type=\"application/ld\+json\">(.+?)</script>', html, re.DOTALL):
    data = json.loads(m.group(1))
    # Walk and print every URL value
    def walk(o, path=''):
        if isinstance(o, dict):
            for k,v in o.items():
                walk(v, f'{path}.{k}')
        elif isinstance(o, list):
            for i,v in enumerate(o):
                walk(v, f'{path}[{i}]')
        elif isinstance(o, str) and o.startswith('http'):
            print(f'{path}: {o}')
    walk(data)
"

Step 2

Test each URL responds 200

# Extract URLs, curl each, report non-200
while read url; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  [ "$code" != "200" ] && echo "BAD $code $url"
done < schema-urls.txt

7. CI integration

Make URL validation part of your build. Failing schema URLs should block deploy:

# .github/workflows/schema-check.yml
- name: Validate schema URLs
  run: |
    npm run build
    node scripts/extract-schema-urls.js > urls.txt
    while read url; do
      code=$(curl -sIo /dev/null -w "%{http_code}" "$url")
      if [ "$code" != "200" ]; then
        echo "::error::Schema URL broken: $code $url"
        exit 1
      fi
    done < urls.txt

💡 If you fix only one type of URL bug, fix relative URLs. They cause the most silent failures: schema validates against schema.org, looks fine in the rich results test, but AI engines and Google can't resolve them so the entity never builds correctly. Convert every URL field to absolute and most schema URL issues disappear.

🤖 Re-run AI Schema Generator

Validate URLs in fresh output.

Run AI Schema Generator →