
How to Fix Robots.txt Blocks

An accidental Disallow in robots.txt is one of the fastest ways to lose search visibility. A single line, Disallow: /, stops crawlers from fetching any page on the site, and rankings fall as search engines lose access to the content. The Site Crawler flags URLs it couldn't access because robots.txt blocked them. This guide walks through finding accidental blocks, fixing them safely, and the staging-environment leak behind most production robots.txt disasters.

1. Audit your current robots.txt

Step 1
Fetch the file
curl https://yourdomain.com/robots.txt
Read every Disallow line. For each, ask: what URLs does this block, and was that intentional?
Step 2
Check for the catastrophic patterns
Look for these red flags:
# Blocks everything from all crawlers — disaster on production
User-agent: *
Disallow: /

# Blocks Google specifically — usually a development leak
User-agent: Googlebot
Disallow: /

# Blocks everything except homepage — common staging artefact
User-agent: *
Disallow: /
Allow: /$
If any of these appear in production, fix them immediately.
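The audit above can be partly automated. The following is a minimal sketch, not a full robots.txt parser: it groups rules by user agent and reports which agents sit under a blanket Disallow: / (or the equivalent /*). The names parseRobots and findBlanketBlocks are hypothetical helpers introduced for illustration, and the sample text is made up.

```typescript
// A rule group: one or more user agents plus the rules that apply to them.
interface RobotsGroup {
  userAgents: string[];
  disallow: string[];
  allow: string[];
}

// Parse robots.txt text into groups. Comments, blank lines, and
// directives other than User-agent/Allow/Disallow are skipped.
function parseRobots(text: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (current === null || !lastWasAgent) {
        current = { userAgents: [], disallow: [], allow: [] };
        groups.push(current);
      }
      current.userAgents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (current !== null && field === "disallow") {
      // An empty Disallow value blocks nothing, so it is not recorded.
      if (value !== "") current.disallow.push(value);
      lastWasAgent = false;
    } else if (current !== null && field === "allow") {
      if (value !== "") current.allow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

// Return the user agents whose group contains a blanket block.
function findBlanketBlocks(text: string): string[] {
  const blocked: string[] = [];
  for (const group of parseRobots(text)) {
    const blanket = group.disallow.some((p) => p === "/" || p === "/*");
    if (blanket) blocked.push(...group.userAgents);
  }
  return blocked;
}

// Made-up sample: an intentional admin block plus a leaked Googlebot block.
const sample = [
  "User-agent: *",
  "Disallow: /admin/",
  "",
  "User-agent: Googlebot",
  "Disallow: /   # staging leftover",
].join("\n");

const flagged = findBlanketBlocks(sample); // ["googlebot"]
```

A non-empty result on production is the signal to stop and fix the file before doing anything else. The sketch only checks for blanket blocks, so narrower rules still need a manual read.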

2. Cross-reference with Site Crawler findings

Step 1
Run the crawler with robots-respect enabled
Run the Site Crawler. It reports any URLs it discovered but couldn't fetch because robots.txt blocked them. Compare these to your intent: which were meant to be blocked (admin, internal endpoints), and which were unintentionally blocked?
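One way to do this comparison is to sort the blocked URLs against a list of path prefixes you meant to block. The sketch below assumes you have the blocked URLs as a plain list of strings; the Site Crawler's actual export format is not specified here, and triageBlockedUrls and the example prefixes are hypothetical.

```typescript
interface Triage {
  intentional: string[];
  accidental: string[];
}

// Extract the path from an absolute URL, dropping origin, query, and fragment.
function pathOf(url: string): string {
  const withoutOrigin = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, "");
  const path = withoutOrigin.replace(/[?#].*$/, "");
  return path === "" ? "/" : path;
}

// Split blocked URLs into those under an intended prefix and everything else.
function triageBlockedUrls(blockedUrls: string[], intendedPrefixes: string[]): Triage {
  const result: Triage = { intentional: [], accidental: [] };
  for (const url of blockedUrls) {
    const path = pathOf(url);
    const intended = intendedPrefixes.some((prefix) => path.indexOf(prefix) === 0);
    (intended ? result.intentional : result.accidental).push(url);
  }
  return result;
}

// Hypothetical inputs: prefixes you meant to block, and URLs reported as blocked.
const intendedPrefixes = ["/admin/", "/wp-admin/"];
const reportedBlocked = [
  "https://yourdomain.com/admin/users",
  "https://yourdomain.com/blog/launch-post?ref=home",
  "https://yourdomain.com/wp-admin/options.php",
];

const triage = triageBlockedUrls(reportedBlocked, intendedPrefixes);
// triage.accidental holds the URLs that need a rule change.
```

Anything that lands in the accidental list is a candidate for the scope fixes in section 4.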

3. The staging-config leak

⚠️ Most common production disaster: a development team adds Disallow: / to staging robots.txt to keep crawlers out. A deployment pipeline copies the staging robots.txt to production. Site goes invisible overnight.
Step 1
Verify production robots.txt is the right one
Compare your production robots.txt against the version in your repo. If they don't match, find the deployment step that's copying the wrong file. Common culprits: a CI pipeline that copies robots.staging.txt regardless of environment, or a Docker container that runs a staging entrypoint script in production.
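The comparison itself can be scripted. This sketch takes the two files as strings (fetching the live file and reading the repo copy are left to the caller) and reports directives present in one but not the other. diffRobots is a hypothetical helper, and the comparison ignores comments, blank lines, and rule order, so it will not catch a rule that moved to a different user-agent group.

```typescript
interface RobotsDiff {
  missingFromLive: string[];   // in the repo, absent in production
  unexpectedInLive: string[];  // in production, absent in the repo
}

// Reduce a robots.txt file to its meaningful lines.
function normalize(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim())
    .filter((line) => line !== "");
}

// Compare the repo copy against the file production actually serves.
function diffRobots(repoText: string, liveText: string): RobotsDiff {
  const repo = normalize(repoText);
  const live = normalize(liveText);
  return {
    missingFromLive: repo.filter((line) => live.indexOf(line) === -1),
    unexpectedInLive: live.filter((line) => repo.indexOf(line) === -1),
  };
}

// Made-up example: the repo allows crawling, production serves a staging file.
const repoCopy = [
  "User-agent: *",
  "Allow: /",
  "Sitemap: https://yourdomain.com/sitemap.xml",
].join("\n");
const liveCopy = ["User-agent: *", "Disallow: /  # keep crawlers out"].join("\n");

const drift = diffRobots(repoCopy, liveCopy);
// drift.unexpectedInLive contains "Disallow: /"
```

Running a check like this as a post-deploy step would surface a leaked staging file before crawlers act on it.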
Step 2
Environment-aware robots.txt
Best practice: serve robots.txt dynamically based on environment. Example for Next.js:
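// app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  // VERCEL_ENV is set by Vercel ('production', 'preview', or 'development').
  // On other hosts, substitute your own environment variable.
  const isProd = process.env.VERCEL_ENV === 'production';
  return isProd
    ? { rules: [{ userAgent: '*', allow: '/' }], sitemap: 'https://yourdomain.com/sitemap.xml' }
    : { rules: [{ userAgent: '*', disallow: '/' }] };
}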
Production allows; staging/preview blocks. No file-copy gymnastics required.

4. Fix accidental scope errors

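Common error 1: wildcards that match too much
# Intended to block ?print=true, but actually blocks
# every URL containing a query string
Disallow: /*?

# Fix: name the parameter. A trailing * is redundant because rules are prefix matches.
Disallow: /*?print=
# Also cover print= when it is not the first parameter
Disallow: /*&print=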
Common error 2: blocking CSS/JS
# Old-school pattern that no longer makes sense
Disallow: /wp-includes/

# This stops Google from rendering your pages; it needs CSS/JS to evaluate layout and UX.
# Fix: remove the block, or add explicit allows:
Allow: /wp-includes/*.css
Allow: /wp-includes/*.js
Common error 3: case sensitivity
Paths in robots.txt rules are case-sensitive (directive names such as Disallow are not). Disallow: /Admin/ doesn't block /admin/. If your CMS routes are case-insensitive, you may think you've blocked something that's actually open.
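The three errors above are easier to see with a small matcher. This is a sketch of the matching behaviour described in Google's documentation and RFC 9309 as commonly summarised: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. Other crawlers may behave differently, and patternToRegExp and isAllowed are hypothetical helpers, not part of any library.

```typescript
interface Rule {
  type: "allow" | "disallow";
  pattern: string;
}

// Convert a robots.txt path pattern into a case-sensitive RegExp.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.charAt(pattern.length - 1) === "$";
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Decide whether a path (including any query string) may be crawled.
function isAllowed(pathAndQuery: string, rules: Rule[]): boolean {
  let best: Rule | null = null;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(pathAndQuery)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieForAllow =
      best !== null &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieForAllow) best = rule;
  }
  // No matching rule means the URL is crawlable.
  return best === null || best.type === "allow";
}

const tooBroad: Rule[] = [{ type: "disallow", pattern: "/*?" }];
const scoped: Rule[] = [{ type: "disallow", pattern: "/*?print=" }];
const mixedCase: Rule[] = [{ type: "disallow", pattern: "/Admin/" }];
const wpAssets: Rule[] = [
  { type: "disallow", pattern: "/wp-includes/" },
  { type: "allow", pattern: "/wp-includes/*.css" },
];

const checks = {
  broadRuleBlocksPaging: !isAllowed("/products?page=2", tooBroad),
  scopedRuleAllowsPaging: isAllowed("/products?page=2", scoped),
  scopedRuleBlocksPrint: !isAllowed("/article?print=true", scoped),
  lowercaseAdminStillOpen: isAllowed("/admin/login", mixedCase),
  cssStillCrawlable: isAllowed("/wp-includes/css/style.css", wpAssets),
};
// Every value in checks is true.
```

Keeping a handful of such checks next to your robots.txt gives you a quick way to confirm that a rule change does what you intended.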

5. Scope blocks to what really should be blocked

Step 1
A clean production robots.txt template
User-agent: *
Allow: /

# Block administrative paths
Disallow: /admin/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Block search results and parameter combinations
Disallow: /search?
Disallow: /*?session=
Disallow: /*&session=

# Block crawl traps
Disallow: /calendar/*/*

Sitemap: https://yourdomain.com/sitemap.xml

6. Test in Search Console

Step 1
Use the robots.txt report
Open Search Console → Settings → robots.txt to see the file Google last fetched and any parsing problems (the legacy robots.txt Tester has been retired). To check a specific URL, use the URL Inspection tool: paste an affected URL and look at the "Crawl allowed?" line. It should say "Yes". If it says "No: blocked by robots.txt", review the rule that's blocking it.
Step 2
Request re-crawl
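Once your robots.txt is fixed, use Search Console's URL Inspection tool on key pages and click Request Indexing. Google generally caches robots.txt for up to 24 hours, so allow time for the new file to be picked up. Re-crawls typically follow within 24-72 hours, though the timing is not guaranteed.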

7. Re-run the Site Crawler

Verify the blocked-URL count has dropped. Remaining blocks should all be intentional.
