
How to Fix Every Robots & Sitemap Finding

The Robots & Sitemap Tester validates your robots.txt and XML sitemaps, the two files that control how search engines crawl your site. Misconfigurations here can silently keep entire sections out of search results. A single line in robots.txt can block thousands of pages from being crawled. This index covers every finding the tester raises.

New here?
Start with the Robots & Sitemap Guide for the protocols, or the example report.

By finding type

Findings fall into these categories. Pick yours:

🚫 Fix accidental robots.txt blocks PLANNED
Disallow: / blocks the entire site. Staging environments left in production. Disallow: /wp-admin/ patterns that accidentally include public content. How to audit before disaster.
⚠️ Fix robots.txt syntax errors PLANNED
Wildcards in the wrong place, User-agent without trailing colon, comments mixed with directives. Google's robots.txt parser is forgiving but quietly ignores broken rules. The validator patterns.
πŸ—ΊοΈ Fix missing or wrong sitemap declaration PLANNED
Sitemap: directive in robots.txt should point at the absolute URL of your sitemap or sitemap index. Common bugs: relative URL, missing protocol, pointing at sitemap.html instead of .xml.
🔗 Fix 404 URLs in sitemap PLANNED
A sitemap should contain only live, indexable URLs. 404s, redirects, and noindex pages all waste crawl budget. The CI-pipeline check that catches dead URLs before they reach the sitemap.
📚 Fix oversized sitemap files PLANNED
A sitemap may hold at most 50,000 URLs and 50 MB uncompressed. Anything larger needs a sitemap index referencing multiple sitemaps. Split by content type, by date, or by section. The patterns for ecommerce vs publishing vs SaaS sites.
📅 Fix lastmod and priority abuse PLANNED
lastmod must reflect an actual content change, not file regeneration. Google ignores priority, and inflated values look spammy. What's worth setting and what isn't.
🚷 Fix crawler-trap patterns PLANNED
Infinite faceted URLs, calendar pages going 100 years back, parameter combinations exploding crawl space. Use robots.txt patterns to cap exploration. Search Console's URL Parameters tool was retired in 2022, so it is no longer an option.
🤖 Fix conflicting noindex vs disallow PLANNED
Disallow stops the crawl but doesn't deindex. noindex deindexes, but only if the crawler can read the meta tag. Common bug: with disallow and noindex on the same URL, Google can't see the noindex, so the page can stay indexed as a bare URL with no description.
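
As a rough illustration of the "audit before disaster" idea, here is a minimal Python sketch that checks which URLs a robots.txt body blocks, and which noindex pages are also disallowed. The function names blocked_urls and noindex_conflicts and the example.com URLs are hypothetical, not part of the tester. It relies on the standard library's urllib.robotparser, which does simple prefix matching and may not mirror Google's wildcard or longest-match handling, so treat the results as a first pass only.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that this robots.txt body disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


def noindex_conflicts(robots_txt: str, pages: dict[str, bool], user_agent: str = "*") -> list[str]:
    """Return URLs that carry noindex AND are disallowed.

    pages maps URL -> True when the page is known to carry a noindex
    directive (an assumed input, gathered by some other crawl step).
    """
    flagged = [url for url, has_noindex in pages.items() if has_noindex]
    return blocked_urls(robots_txt, flagged, user_agent)
```

Running blocked_urls against a list of pages you expect to rank, for example in a pre-deploy check, is one way to notice a stray Disallow: / before it ships.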

By platform

Where these files live varies by stack:

📰 Fix robots/sitemap in WordPress PLANNED
Yoast / Rank Math sitemap generation, robots.txt via plugin vs file, multisite considerations, and the dynamic vs static sitemap trade-off.
🛒 Fix robots/sitemap in Shopify PLANNED
Auto-generated sitemap-index, the limited robots.txt customisation, the apps that add custom routes, and the platform-level constraints.
βš›οΈ Fix robots/sitemap in Next.js PLANNED
app/sitemap.ts and app/robots.ts Metadata API, dynamic generation, the SSG-build-time sitemap pattern for large sites.

What our Robots & Sitemap Tester checks

The tester fetches your robots.txt, validates syntax, identifies blocked content, fetches every declared sitemap, validates XML, checks each URL responds 200, flags lastmod inconsistencies, and tests crawler-trap risk. For the full reference, see the Robots & Sitemap Guide.
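
A few of these checks can be approximated offline. The sketch below is not the tester's implementation; the helper names (sitemap_declarations, is_absolute_http_url, check_sitemap) are hypothetical. It assumes you already have the robots.txt text and the sitemap bytes in hand, and it skips the per-URL HTTP status check because that needs network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured uncompressed


def sitemap_declarations(robots_txt: str) -> list[str]:
    """Return the value of every Sitemap: line in a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing '#').
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def is_absolute_http_url(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return a list of human-readable problems found in one sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"invalid XML: {exc}"]
    locs = [(el.text or "").strip() for el in root.iter(f"{SITEMAP_NS}loc")]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    for loc in locs:
        if not is_absolute_http_url(loc):
            problems.append(f"not an absolute URL: {loc!r}")
    return problems
```

An empty list from check_sitemap means none of these particular checks fired; it does not prove every listed URL responds with 200.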

🤖 Test your crawl config first

Run the tester. One bad line in robots.txt can cut off crawling for entire site sections. Confirm yours doesn't.

Run Robots & Sitemap Tester →
Related Guides: Robots & Sitemap Guide · Example Report · Site Crawler Guide · Site Audit Guide