
Robots.txt and XML Sitemap: Setup and Validation Guide

Robots.txt tells crawlers such as Googlebot which URLs they may not crawl. Your XML sitemap lists the URLs you want Google to discover and crawl. Getting either wrong can quietly keep your pages out of Google's search results. Here is how to check and fix both.


The two most common robots.txt mistakes

1. Accidentally blocking your entire site

The directive Disallow: / under User-agent: * tells every compliant crawler not to crawl any page on your site. Google can no longer read your content, so pages lose their rankings and may drop out of the index, or linger only as bare URLs with no description. Always test robots.txt changes using the Robots & Sitemap Tester before deploying.

Never do this on a live site: Disallow: / under User-agent: * tells every crawler to crawl nothing on your site.
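As a rough illustration of the check described above, the following sketch uses Python's standard-library robots.txt parser to detect a site-wide block before a file is deployed. The function name blocks_entire_site and the example.com URL are hypothetical, chosen only for this example, and the standard-library parser does not reproduce every detail of how Google interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if this robots.txt text forbids fetching the site root.

    Hypothetical helper for illustration; it only tests the root path.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder; only the path part of the URL matters here.
    return not parser.can_fetch(user_agent, "https://example.com/")


bad = "User-agent: *\nDisallow: /\n"
good = "User-agent: *\nDisallow: /admin/\n"

print(blocks_entire_site(bad))   # True
print(blocks_entire_site(good))  # False
```

Running a check like this against the new file before it goes live is one way to catch the mistake early; it does not replace testing the deployed file.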

2. Blocking CSS and JavaScript

Google renders pages with a headless Chromium browser before indexing them. If your robots.txt blocks the CSS and JavaScript files that build your page layout, Google sees a broken, unstyled page, may miss content that JavaScript inserts, and may judge the page to be low quality. Allow crawling of all CSS, JS and font files.
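One way to spot blocked rendering resources is to run a list of asset URLs through a robots.txt parser. The sketch below assumes a hypothetical helper named blocked_assets and made-up example.com asset paths. Python's standard-library parser matches plain path prefixes only, so rules that rely on wildcards such as * or $ may be evaluated differently than Google would evaluate them.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that the given robots.txt text disallows.

    Hypothetical helper for illustration; prefix matching only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(user_agent, url)]


# Illustrative robots.txt that blocks the directory holding CSS and JS.
robots = """User-agent: *
Disallow: /assets/
Disallow: /admin/
"""

assets = [
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
    "https://example.com/img/logo.png",
]

for url in blocked_assets(robots, assets):
    print("blocked:", url)
```

With the rules above, the stylesheet and script under /assets/ are reported as blocked, while the image under /img/ is not.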

What a correct XML sitemap looks like

A valid sitemap is an XML file listing your canonical, indexable URLs. Every URL in the sitemap should return HTTP 200, have a canonical tag pointing to itself, and carry no noindex robots directive. Include accurate lastmod dates so Google knows when each page last changed; Google relies on lastmod only when the values are consistently accurate.
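To make the structure concrete, here is a minimal sketch that parses a small sample sitemap and flags two structural problems: a loc that is not an absolute URL and a missing lastmod. The function name sitemap_problems and the sample URLs are assumptions for this example. It only handles a urlset file (not a sitemap index), and it does not fetch pages, so HTTP status, canonical tags and noindex still need a separate check.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_problems(xml_text: str) -> list[str]:
    """Return a list of structural problems found in a urlset sitemap."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"unexpected root element: {root.tag}"]

    problems = []
    for url_el in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = (url_el.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"loc is not an absolute URL: {loc!r}")
        if url_el.find(f"{{{SITEMAP_NS}}}lastmod") is None:
            problems.append(f"missing lastmod: {loc}")
    return problems


# Sample sitemap: the first entry is fine, the second has two problems.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>/about</loc>
  </url>
</urlset>
"""

for problem in sitemap_problems(sample):
    print(problem)
```

The first url entry in the sample shows the shape a valid entry takes: an absolute loc plus a lastmod date.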

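Quick check: Visit https://yoursite.com/sitemap.xml in a browser. If it displays XML content, the sitemap is accessible. If it shows a 404 or a blank page, create a sitemap or fix its configuration. If your sitemap lives at a different URL, check robots.txt for a Sitemap: line that declares its location.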
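The same quick check can be scripted. The sketch below is one possible approach using only Python's standard library; the names classify_sitemap_response and check_sitemap, the User-Agent string, and the wording of the result messages are all assumptions made for this example. The network call is left commented out so the snippet runs without touching a real site.

```python
import urllib.error
import urllib.request


def classify_sitemap_response(status: int, body: bytes) -> str:
    """Describe a sitemap response in the terms of the quick check."""
    if status != 200:
        return f"not accessible (HTTP {status})"
    text = body.strip()
    if not text:
        return "empty response"
    # A sitemap is either a urlset or an index of other sitemaps.
    if b"<urlset" in text or b"<sitemapindex" in text:
        return "looks like a sitemap"
    return "returned content, but not sitemap XML"


def check_sitemap(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL and classify the response (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "sitemap-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_sitemap_response(response.status, response.read())
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx and 5xx responses, including 404.
        return classify_sitemap_response(exc.code, b"")
    except urllib.error.URLError as exc:
        return f"request failed: {exc.reason}"


# Uncomment to run against your own site:
# print(check_sitemap("https://yoursite.com/sitemap.xml"))
```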
