Hreflang declares "this URL is an indexable locale variant". If the target is blocked by robots.txt, carries a noindex directive, or sits behind auth, Google can't honour the annotation, and the cluster signal weakens or fails entirely. Each finding has two valid fixes: either the page should be indexable (remove the block) or it shouldn't be in the cluster (remove it from the hreflang config). Either fix works; the wrong outcome is leaving the two signals contradictory.
| Block | Effect |
|---|---|
| robots.txt Disallow | Google can't crawl, can't validate hreflang |
| noindex meta | Google crawls but won't index — cluster signals lost |
| X-Robots-Tag header | Same as noindex meta |
| HTTP basic auth | Google sees 401 — can't validate |
| IP allowlist / geo block | Googlebot crawls mostly from US IP addresses, so it may be unable to reach the page |
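```bash
# For each hreflang URL, check whether robots.txt allows Googlebot to fetch it
URL="https://example.com/fr/about"
ROBOTS_URL="https://example.com/robots.txt"
# Pass the values as arguments instead of interpolating them into Python source
python3 - "$ROBOTS_URL" "$URL" <<'PY'
import sys
from urllib.robotparser import RobotFileParser

robots_url, url = sys.argv[1], sys.argv[2]
rp = RobotFileParser()
rp.set_url(robots_url)
rp.read()  # fetches and parses robots.txt
# Note: urllib.robotparser uses prefix matching and may not mirror Google's * and $ wildcard handling
print('Crawlable:' if rp.can_fetch('Googlebot', url) else 'BLOCKED:', url)
PY
```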
```bash
# Check meta robots and X-Robots-Tag header
curl -sI https://example.com/fr/about | grep -i "x-robots-tag"
# -i also matches uppercase markup such as <META NAME="ROBOTS">
curl -s https://example.com/fr/about | grep -oiE '<meta[^>]*robots[^>]*>'
# Both should be absent or set to "index, follow"
# If "noindex" appears, target is blocked
```
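The two checks above handle one URL at a time. A minimal sketch of running both over a whole cluster might look like the following. It works on already-fetched data so it needs no network access. The `audit` function, the `RobotsMeta` class, and the sample cluster are hypothetical names for illustration, and the header parsing is simplified: it does not handle user-agent-prefixed values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

NOINDEX = {"noindex", "none"}  # "none" is shorthand for "noindex, nofollow"


class RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= {d.strip().lower() for d in attr.get("content", "").split(",")}


def audit(robots_lines, pages, user_agent="Googlebot"):
    """Return {url: [blocks]} for every page that has at least one block.

    pages maps url -> (X-Robots-Tag header value or None, HTML body).
    """
    rp = RobotFileParser()
    rp.parse(robots_lines)  # in-memory rules, no network fetch
    findings = {}
    for url, (header, html) in pages.items():
        blocks = []
        if not rp.can_fetch(user_agent, url):
            blocks.append("robots.txt")
        meta = RobotsMeta()
        meta.feed(html)
        header_directives = {d.strip().lower() for d in (header or "").split(",")}
        if NOINDEX & (header_directives | meta.directives):
            blocks.append("noindex")
        if blocks:
            findings[url] = blocks
    return findings


# Hypothetical cluster data, for illustration only
robots_txt = ["User-agent: *", "Disallow: /fr/internal/"]
pages = {
    "https://example.com/en/about": (None, '<meta name="robots" content="index, follow">'),
    "https://example.com/fr/about": (None, '<meta name="robots" content="noindex">'),
    "https://example.com/de/about": ("noindex", "<html></html>"),
    "https://example.com/fr/internal/roadmap": (None, "<html></html>"),
}

for url, blocks in audit(robots_txt, pages).items():
    print(url, "->", ", ".join(blocks))
```

In a real audit, a robots.txt-blocked page usually cannot be fetched at all, so its noindex state would be unknown rather than clean.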
Per blocked target, ask: should this URL be indexed by Google?
```js
// BAD: production page references staging
'fr': 'https://staging.example.com/fr/about'  // robots-blocked

// FIX: update cluster to production URL
'fr': 'https://example.com/fr/about'
```
```
# BAD: robots.txt blocks legitimate hreflang targets
User-agent: *
Disallow: /fr/internal/   # OK
Disallow: /fr/            # TOO BROAD: catches everything

# FIX: tighten the rule
User-agent: *
Disallow: /fr/internal/
# Public /fr/* now crawlable
```
```html
<!-- Some sites add noindex to all pages in a soft-launch locale -->
<meta name="robots" content="noindex">

<!-- If launched, remove the noindex -->
<!-- Until launched, keep noindex AND remove variant from hreflang config -->
```
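The per-target question maps onto exactly two consistent fixes, which can be sketched as a small helper. The `Finding` record and `recommend` function are hypothetical names for illustration; the `should_index` flag stands for the editorial answer to "should Google index this URL?", which no script can decide for you.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    locale: str
    url: str
    block: str          # e.g. "robots.txt", "noindex", "auth"
    should_index: bool  # editorial decision, supplied by a human


def recommend(finding):
    """Map a blocked hreflang target to one of the two consistent fixes."""
    if finding.should_index:
        return f"{finding.locale}: remove the {finding.block} block on {finding.url}"
    return f"{finding.locale}: remove {finding.url} from the hreflang cluster"


# Hypothetical findings, mirroring the examples above
findings = [
    Finding("fr", "https://example.com/fr/about", "robots.txt", should_index=True),
    Finding("fr", "https://staging.example.com/fr/about", "robots.txt", should_index=False),
]

for f in findings:
    print(recommend(f))
```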
When launching a new locale, the right sequence:
This avoids the window where blocked variants are advertised. Hreflang only references indexable URLs.
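One way to enforce "hreflang only references indexable URLs" is to gate hreflang output on a per-variant flag, so a locale is advertised only after its block has been lifted. This is a minimal sketch: the `hreflang_links` function and the `indexable` field are assumptions for illustration, not part of any particular CMS or framework.

```python
def hreflang_links(variants):
    """Build <link rel="alternate"> tags only for variants flagged as indexable."""
    return [
        f'<link rel="alternate" hreflang="{v["locale"]}" href="{v["url"]}">'
        for v in variants
        if v["indexable"]
    ]


# Hypothetical cluster: "es" is still in soft launch (noindex), so it is left out
variants = [
    {"locale": "en", "url": "https://example.com/en/about", "indexable": True},
    {"locale": "fr", "url": "https://example.com/fr/about", "indexable": True},
    {"locale": "es", "url": "https://example.com/es/about", "indexable": False},
]

for tag in hreflang_links(variants):
    print(tag)
```

Flipping `indexable` to `True` for a locale only after its noindex or robots.txt block is removed keeps the two signals in step.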
```
// WPML offers "complete translation" workflow.
// Posts marked draft/incomplete shouldn't appear in hreflang.
// Confirm WPML setting: "Include only complete translations in hreflang"
```
```js
// Conditionally include locales based on env
const locales = process.env.NODE_ENV === 'production'
  ? ['en', 'fr', 'de']
  : ['en', 'fr', 'de', 'es-beta'];

module.exports = {
  i18n: { locales }
};
// Beta locale doesn't appear in production hreflang
```