How to Fix Robots.txt and Sitemap in WordPress

WordPress generates robots.txt and an XML sitemap dynamically (core has served /wp-sitemap.xml since version 5.5), but the defaults are often suboptimal: crawl budget wasted on archive pages, missing exclusions for admin and cart paths, and incomplete or stale sitemaps. This guide covers WordPress-specific robots.txt and sitemap management with Yoast, Rank Math, and AIOSEO.

Step 1: Audit current robots.txt

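Visit yoursite.com/robots.txt and check three things: the User-agent groups, the Disallow paths, and the Sitemap line. WordPress serves a virtual robots.txt that is permissive by default; SEO plugins extend it. If a physical robots.txt file exists in the site root, the web server returns that file and the virtual one is never used.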
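The audit in this step can be scripted. The following is a minimal sketch, not a full robots.txt parser: `summarize_robots` is a hypothetical helper name, `example.com` is a placeholder, and the sample text is only modeled on what a default WordPress install typically returns.

```python
from collections import defaultdict

# Sample modeled on a default WordPress virtual robots.txt (assumption).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
"""


def summarize_robots(text: str) -> dict:
    """Group Allow/Disallow rules by user-agent and collect Sitemap lines."""
    groups = defaultdict(list)  # user-agent -> [(directive, path), ...]
    sitemaps = []
    current_agents = []         # agents of the group currently being read
    reading_agents = False      # True while consecutive User-agent lines continue

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                current_agents = []  # a new group starts here
            current_agents.append(value)
            groups[value]            # register the agent even if it has no rules
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            for agent in current_agents:
                groups[agent].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)
    return {"groups": dict(groups), "sitemaps": sitemaps}


summary = summarize_robots(SAMPLE_ROBOTS)
for agent, rules in summary["groups"].items():
    print(agent, rules)
print("Sitemaps:", summary["sitemaps"])
```

To audit a live site, fetch the text of yoursite.com/robots.txt (for example with `urllib.request.urlopen`) and pass it to the same function.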

Step 2: Edit robots.txt via SEO plugin

Yoast → Tools → File editor → robots.txt. Rank Math → General Settings → Edit robots.txt. AIOSEO → Tools → Robots.txt editor. Add Disallow rules for low-value paths.

Step 3: Configure useful disallows

Block: /wp-admin/ (already the default), /wp-login.php, /?s= (internal search), /tag/ (only if you want tag archives left uncrawled; do not pair this with a noindex tag, for the reason given in the FAQ below), and /cart/, /checkout/, /my-account/ (for WooCommerce). Allow: /wp-admin/admin-ajax.php (needed for legitimate AJAX).
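Before saving rules like these in the plugin editor, it can help to test them offline. The sketch below uses Python's standard `urllib.robotparser`; the rule set mirrors the list above, while `is_allowed` and `example.com` are illustrative names. One caveat: Python's parser applies the first matching rule, whereas Google applies the most specific one, so the Allow line is placed first to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

# Candidate rules to paste into the plugin's robots.txt editor.
ROBOTS_TXT = """\
User-agent: *
# Allow comes first because Python's parser uses the first matching rule.
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php",
             "/cart/", "/blog/hello-world/"):
    print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

Treat the result as a sanity check only; crawlers differ in how they handle wildcards and query strings, so confirm important paths with the search engine's own testing tools.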

Step 4: Configure AI bot policy

Decide whether to allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended (a typical permissive policy) or to block training-only crawlers. Add a User-agent group with Allow or Disallow rules for each bot, and document each choice with a # comment.
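One way to keep the policy documented is to generate the rules from a single table. This is a sketch under assumptions: the allow/block choices in `AI_BOT_POLICY` are placeholders rather than recommendations, and `render_ai_rules` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Placeholder policy: True allows the whole site, False blocks it.
AI_BOT_POLICY = {
    "GPTBot": True,
    "ClaudeBot": True,
    "PerplexityBot": True,
    "Google-Extended": True,
    "CCBot": False,
}


def render_ai_rules(policy: dict[str, bool]) -> str:
    """Render one commented User-agent group per bot."""
    groups = []
    for agent, allowed in policy.items():
        verdict = "allowed" if allowed else "blocked"
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(
            f"# {agent}: {verdict} by site policy\n"
            f"User-agent: {agent}\n"
            f"{rule}\n"
        )
    return "\n".join(groups)  # blank line between groups


rules = render_ai_rules(AI_BOT_POLICY)
print(rules)

# Parse the generated text back to confirm it says what the table says.
parser = RobotFileParser()
parser.parse(rules.splitlines())
print("CCBot allowed:", parser.can_fetch("CCBot", "https://example.com/"))
```

Paste the printed groups below the existing `User-agent: *` group in the plugin editor.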

Step 5: Audit current sitemap

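Visit /sitemap_index.xml (the Yoast and Rank Math location; AIOSEO typically serves /sitemap.xml, and WordPress core without a plugin serves /wp-sitemap.xml). Confirm the sub-sitemaps exist: post-sitemap, page-sitemap, category-sitemap, author-sitemap (if used). Check the URLs, the lastmod dates, and the priorities where the plugin outputs them.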
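The sub-sitemap check can be sketched with the standard library's XML parser. The sample index below is invented for illustration, and `list_sub_sitemaps` and `missing_sub_sitemaps` are hypothetical helper names; only the sitemap namespace URI comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sample of a plugin-style sitemap index.
SAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/post-sitemap.xml</loc>
    <lastmod>2024-01-15T10:00:00+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/page-sitemap.xml</loc>
  </sitemap>
</sitemapindex>
"""


def list_sub_sitemaps(index_xml: str) -> list[tuple[str, Optional[str]]]:
    """Return (loc, lastmod) for every <sitemap> entry; lastmod may be None."""
    root = ET.fromstring(index_xml)
    entries = []
    for node in root.findall("sm:sitemap", SITEMAP_NS):
        loc = node.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = node.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def missing_sub_sitemaps(entries, expected=("post-sitemap.xml", "page-sitemap.xml")):
    """Return the expected file names that no entry's URL ends with."""
    locs = [loc for loc, _ in entries]
    return [name for name in expected
            if not any(loc.endswith("/" + name) for loc in locs)]


entries = list_sub_sitemaps(SAMPLE_INDEX)
for loc, lastmod in entries:
    print(loc, lastmod or "(no lastmod)")
print("Missing:", missing_sub_sitemaps(entries, ("post-sitemap.xml", "category-sitemap.xml")))
```

`ET.fromstring` also accepts the raw bytes of a fetched response, so the same functions work on a live /sitemap_index.xml.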

Step 6: Configure sitemap exclusions

Plugin Settings → Sitemap. Exclude: media items (usually irrelevant), tag archives (if low-value), author archives (single-author sites), date archives. Include all genuine content URLs.
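After changing the exclusions, a quick scan of the sitemap URLs can confirm that no archive pages remain. The sketch below assumes WordPress's default `/tag/` and `/author/` bases and date-based archive paths such as `/2024/05/`; the patterns and the `find_leaked_urls` name are illustrative and need adjusting to your permalink structure.

```python
import re
from urllib.parse import urlparse

# Assumed archive URL shapes; change these if you use custom bases.
EXCLUDED_PATTERNS = {
    "tag archive": re.compile(r"^/tag/"),
    "author archive": re.compile(r"^/author/"),
    # /2024/, /2024/05/ or /2024/05/12/ with nothing after the date.
    "date archive": re.compile(r"^/\d{4}/(\d{2}/)?(\d{2}/)?$"),
}


def find_leaked_urls(urls: list[str]) -> list[tuple[str, str]]:
    """Return (url, reason) for sitemap URLs that match an excluded pattern."""
    leaks = []
    for url in urls:
        path = urlparse(url).path
        for label, pattern in EXCLUDED_PATTERNS.items():
            if pattern.search(path):
                leaks.append((url, label))
                break  # one reason per URL is enough
    return leaks


sample_urls = [
    "https://example.com/hello-world/",
    "https://example.com/tag/news/",
    "https://example.com/2024/05/",
    "https://example.com/2024/05/12/my-post/",
]
for url, reason in find_leaked_urls(sample_urls):
    print(f"{url} looks like a {reason}")
```

Feed it the `<loc>` values collected from each sub-sitemap; an empty result suggests the exclusions took effect.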

Step 7: Submit to Search Console

Search Console → Sitemaps → Add new sitemap → enter /sitemap_index.xml. Verify it processes successfully. Repeat for Bing Webmaster Tools.

Frequently Asked Questions

Why is my WordPress sitemap returning 404?
Most common cause: permalinks not flushed (Settings → Permalinks → Save). Second cause: an SEO plugin conflict (two plugins both trying to manage the sitemap). Third: missing server-level rewrite rules (some shared hosts don't honour WordPress's .htaccess). Check by visiting /sitemap.xml and /sitemap_index.xml; at least one should return XML.
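The check in that last sentence can be sketched as a small probe. The path list adds /wp-sitemap.xml, the WordPress core location, to the two paths named above; `candidate_urls`, `probe`, and the `sitemap-probe/0.1` user-agent string are hypothetical names.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

CANDIDATE_PATHS = ["/sitemap_index.xml", "/sitemap.xml", "/wp-sitemap.xml"]


def candidate_urls(base_url: str) -> list[str]:
    """Build the sitemap URLs worth checking for a site."""
    return [base_url.rstrip("/") + path for path in CANDIDATE_PATHS]


def probe(url: str, timeout: float = 10.0):
    """Return (status, detail): an HTTP status and Content-Type, or None and the error."""
    request = Request(url, headers={"User-Agent": "sitemap-probe/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type", "")
    except HTTPError as err:   # the server answered with 4xx or 5xx
        return err.code, err.headers.get("Content-Type", "")
    except URLError as err:    # DNS failure, refused connection, timeout
        return None, str(err.reason)
```

Calling `probe(url)` for each entry of `candidate_urls("https://yoursite.com")` shows which location answers with status 200 and an XML content type. If every path returns 404, work through the three causes above in order.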
Should I block AI bots in WordPress robots.txt?
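This is a policy decision. Allow AI bots if you want visibility in AI citations (most service businesses do). Block training-only bots (CCBot is one) if you have copyright concerns about your content being used for training. Allow citation bots (PerplexityBot, ClaudeBot) regardless if you want to appear in AI search results. Bot roles differ by vendor and change over time, so confirm each user agent's purpose in the vendor's documentation before writing rules.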
How often does WordPress update the sitemap?
Yoast, Rank Math, and AIOSEO regenerate the sitemap when content changes (publish, update, delete). The sitemap_index.xml may be cached briefly (typically 1 hour) for performance. To trigger a manual regeneration, visit /sitemap_index.xml; the plugin rebuilds it if the cached copy is stale.
Do I need both robots.txt and noindex meta tags?
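They do different jobs. robots.txt prevents crawling (Google won't fetch the page), although a blocked URL can still appear in the index without its content if other pages link to it. A noindex meta tag prevents indexing (Google fetches the page but doesn't index it). Don't apply both noindex and Disallow to the same URL: Google can't see the noindex if it is disallowed from crawling the page. Use one or the other per URL pattern.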
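A short script can catch this conflict. The sketch assumes you already have a list of paths that carry a noindex tag (from a crawl or a plugin export); `find_hidden_noindex` and the sample data are illustrative, and the check inherits the limits of Python's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def find_hidden_noindex(robots_txt: str, noindex_paths: list[str],
                        agent: str = "Googlebot") -> list[str]:
    """Return noindexed paths that robots.txt also blocks for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in noindex_paths
            if not parser.can_fetch(agent, "https://example.com" + path)]


# Sample data: /tag/ is disallowed and one tag page is also noindexed.
robots_txt = "User-agent: *\nDisallow: /tag/\n"
noindex_paths = ["/tag/news/", "/thank-you/"]

for path in find_hidden_noindex(robots_txt, noindex_paths):
    print(f"{path}: noindex is never seen because crawling is disallowed")
```

For each path it reports, either remove the Disallow rule so the noindex can be read, or drop the noindex and rely on the crawl block alone.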
Why does WordPress's default robots.txt allow /wp-admin/admin-ajax.php?
Many WordPress features (front-end forms, AJAX search, dynamic content) use admin-ajax.php legitimately. Blocking it breaks these features for crawlers, so Googlebot may see broken pages. The default is correct: allow admin-ajax.php while blocking the rest of /wp-admin/.
