⭐ Beginner — No coding experience needed
Robots.txt and Sitemap.xml: Your Site's Map for Google
Learn what robots.txt and sitemap.xml do, how they work together, and how to set both up correctly so Google indexes every page you want and skips the ones you don't.
What you will learn in this guide
- What robots.txt does and where it lives
- The difference between Disallow and noindex
- What a sitemap.xml is and how to generate one
- How to submit your sitemap to Google
- Common mistakes that block your whole site by accident
1 What is robots.txt?
robots.txt is a plain text file at the root of your domain (yoursite.com/robots.txt) that tells crawlers which URLs they may and may not fetch. It is the first file most search engines request when they visit your site.
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
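To see how a crawler might read these rules, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only an approximation of real crawler behaviour: this parser applies the first matching rule in file order, while Google picks the most specific matching rule. For this sample file both approaches give the same answers. The test URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from this guide, held as a string so that
# no network request is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set we can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical URLs, chosen only to exercise each rule.
for url in (
    "https://yoursite.com/blog/hello",
    "https://yoursite.com/admin/settings",
    "https://yoursite.com/cart",
):
    verdict = "allowed" if rules.can_fetch("*", url) else "blocked"
    print(f"{url}: {verdict}")

# The Sitemap line is exposed separately (Python 3.8 or newer).
print(rules.site_maps())
```

Rules are prefix matches, so `Disallow: /cart` also covers `/cart/checkout` and `/cart-items`.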
Important: robots.txt prevents crawling, not indexing. If another site links to a blocked URL, Google can still index that URL without ever seeing its content. To keep a page out of search results, leave it crawlable and add a noindex meta tag or an X-Robots-Tag: noindex HTTP header.
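A page's noindex signal can live in the HTML or in the response headers. The sketch below checks both, using only the Python standard library. The function name `has_noindex` is hypothetical, and the check is deliberately simple: it looks only at `<meta name="robots">` and the `X-Robots-Tag` header, not at crawler-specific variants such as `<meta name="googlebot">`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _RobotsMetaFinder(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def has_noindex(html: str, headers: Optional[Dict[str, str]] = None) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    if any("noindex" in directive for directive in finder.directives):
        return True
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
print(has_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))
```

A crawler can only read either signal if it is allowed to fetch the page, which is why a noindex page should not also be disallowed in robots.txt.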
2 What is sitemap.xml?
A sitemap.xml is a list of every URL on your site you want indexed, in a format Google understands. It helps Google find pages that internal links might miss.
| Element | Required? | Purpose |
|---|---|---|
| `<loc>` | Yes | The full URL |
| `<lastmod>` | No | When the page was last meaningfully changed |
| `<changefreq>` | No | How often the page updates (Google ignores this now) |
| `<priority>` | No | Relative importance 0.0-1.0 (Google ignores this) |
Most modern sites generate sitemaps automatically. WordPress with Yoast or Rank Math creates them at /sitemap_index.xml. Custom sites can use packages like sitemap-generator or build them in the CMS.
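For a custom site without a plugin, a sitemap can be produced with a few lines of code. The following is a minimal sketch using Python's standard library, not the output of any particular plugin or package. The function name `build_sitemap`, the page list, and the dates are assumptions for illustration; it writes only the required URL element and the optional last-modified date.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified date or None) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        if last_modified is not None:
            # lastmod uses the W3C date format, e.g. 2024-01-15.
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace with the URLs you want indexed.
sitemap_xml = build_sitemap([
    ("https://yoursite.com/", date(2024, 1, 15)),
    ("https://yoursite.com/blog/hello", None),
])
print(sitemap_xml)
```

Saving the returned string as sitemap.xml in the site root gives a file that matches the table above.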
3 How to set both up
- 1. Generate a sitemap. Most CMSs do this automatically. If yours does not, use a sitemap generator and upload sitemap.xml to your site root.
- 2. Reference it in robots.txt. Add a Sitemap: line to robots.txt (the bottom is a common place) with the full URL. This lets search engines you have not submitted to directly discover the sitemap.
- 3. Submit to Google Search Console. In Search Console → Sitemaps, paste the sitemap URL. Google usually fetches it soon after submission, though indexing of individual pages can take longer.
- 4. Audit weekly. Use the audit tool to confirm every URL in the sitemap returns 200, is indexable, and has no noindex tag. Mismatches confuse Google.
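Part of the audit step can be scripted. The sketch below is one possible approach, assuming the sitemap and robots.txt contents are already loaded as strings: it lists sitemap URLs that robots.txt blocks or that do not return 200. The names `audit`, `sitemap_urls`, and `http_status` are hypothetical, the sample data is invented, and the noindex check is left out to keep the example short.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> List[str]:
    """Extract every <loc> value from a sitemap."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def http_status(url: str) -> int:
    """Look up the live status code (urlopen follows redirects)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def audit(sitemap_xml: str, robots_txt: str,
          get_status: Callable[[str], int]) -> Dict[str, List[str]]:
    """Map each problematic sitemap URL to a list of its issues."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    problems: Dict[str, List[str]] = {}
    for url in sitemap_urls(sitemap_xml):
        issues = []
        if not rules.can_fetch("*", url):
            issues.append("blocked by robots.txt")
        status = get_status(url)
        if status != 200:
            issues.append(f"returns HTTP {status}")
        if issues:
            problems[url] = issues
    return problems


# Invented sample data so the sketch runs without network access.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yoursite.com/</loc></url>
  <url><loc>https://yoursite.com/cart</loc></url>
  <url><loc>https://yoursite.com/old-page</loc></url>
</urlset>"""
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /cart\n"
FAKE_STATUSES = {"https://yoursite.com/old-page": 404}

report = audit(SAMPLE_SITEMAP, SAMPLE_ROBOTS,
               lambda url: FAKE_STATUSES.get(url, 200))
for url, issues in report.items():
    print(url, "->", ", ".join(issues))
```

Passing `http_status` instead of the lambda would run the same audit against live pages. The status lookup is injected as a parameter so the logic can be tried offline first.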
4 The 5 most dangerous robots.txt mistakes
| Mistake | What happens | Fix |
|---|---|---|
| Disallow: / | Blocks entire site from crawling | Remove the slash; use specific paths |
| Blocking CSS or JS | Google can't render the page properly; rankings drop | Allow /wp-includes/, /assets/, etc. |
| Disallow on noindex pages | Google can't see the noindex; URL stays indexed | Allow crawling, use noindex meta tag instead |
| No sitemap reference | Crawlers may miss new pages for weeks | Add Sitemap: line |
| Old test directives left in | Production blocking dev paths or vice versa | Audit on deploy; never copy staging robots.txt to prod |
Test before saving: Even a single typo in robots.txt can block crawling of your entire site. Google Search Console's robots.txt report shows the file Google last fetched and any errors or warnings it found, so check it after every change.
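Some of the mistakes in the table above can be caught by a small pre-deploy check. The following is a rough sketch rather than a full validator: the function name `lint_robots_txt` and the list of asset hints are assumptions, it ignores which User-agent group a rule belongs to, and the CSS/JS check is a simple substring heuristic that can produce false positives.

```python
from typing import List

# Hypothetical path fragments that usually indicate CSS or JS assets.
ASSET_HINTS = ("/wp-includes/", "/assets/", ".css", ".js")


def lint_robots_txt(text: str) -> List[str]:
    """Return human-readable warnings for common robots.txt mistakes."""
    warnings: List[str] = []
    has_sitemap = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            has_sitemap = True
        elif field == "disallow":
            if value == "/":
                warnings.append(
                    f"line {number}: 'Disallow: /' blocks the whole site")
            elif any(hint in value.lower() for hint in ASSET_HINTS):
                warnings.append(
                    f"line {number}: '{value}' may block CSS or JS")
    if not has_sitemap:
        warnings.append("no Sitemap: line found")
    return warnings


# An invented file that triggers all three warnings.
risky = "User-agent: *\nDisallow: /\nDisallow: /assets/\n"
for warning in lint_robots_txt(risky):
    print(warning)
```

Running a check like this in the deploy pipeline is one way to notice a staging robots.txt before it reaches production.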