An accidental Disallow in robots.txt is the fastest way to deindex content. A single line — Disallow: / — can de-rank an entire site. The Site Crawler flags URLs it couldn't access because robots.txt blocked them. This guide walks through finding accidental blocks, fixing them safely, and the staging-environment leak that causes 80% of production robots.txt disasters.
curl https://yourdomain.com/robots.txt

Read every Disallow line. For each, ask: what URLs does this block, and was that intentional?
# Blocks everything from all crawlers: disaster on production
User-agent: *
Disallow: /

# Blocks Google specifically: usually a development leak
User-agent: Googlebot
Disallow: /

# Blocks everything except homepage: common staging artefact
User-agent: *
Disallow: /
Allow: /$

If any of these are in production, fix them immediately.
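The patterns above can also be caught mechanically. The following is a minimal sketch, not a full robots.txt parser: `findBlanketBlocks` and the `BlanketBlock` type are hypothetical names, and the check only looks for the literal rule `Disallow: /` (it would miss equivalents such as `Disallow: /*`).

```typescript
// Hypothetical helper: report every user-agent group containing a bare "Disallow: /".
interface BlanketBlock {
  userAgents: string[]; // agents named in the group
  line: number;         // 1-based line number of the Disallow rule
}

function findBlanketBlocks(robotsTxt: string): BlanketBlock[] {
  const findings: BlanketBlock[] = [];
  let agents: string[] = [];
  let inRules = false; // true once the current group has seen a rule line

  robotsTxt.split(/\r?\n/).forEach((raw, index) => {
    const line = raw.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) return; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line that follows rules starts a new group.
      if (inRules) {
        agents = [];
        inRules = false;
      }
      agents.push(value);
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      if (field === 'disallow' && value === '/') {
        findings.push({ userAgents: [...agents], line: index + 1 });
      }
    }
  });
  return findings;
}

const sample = [
  'User-agent: *',
  'Disallow: /admin/',
  '',
  'User-agent: Googlebot',
  'Disallow: /  # development leak',
].join('\n');

console.log(findBlanketBlocks(sample));
```

For the sample, the only finding is the Googlebot group on line 5. A check like this could run in CI against the robots.txt that is about to be deployed, failing the build when the result is non-empty for a production target.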
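The staging leak usually goes like this: someone adds Disallow: / to the staging robots.txt to keep crawlers out. A deployment pipeline copies the staging robots.txt to production. The site goes invisible overnight. Typical culprits include a build step that copies robots.staging.txt regardless of environment and a Docker container using a staging entrypoint script in production.

A safer approach is to generate robots.txt from the deployment environment, as in this Next.js route: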
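// app/robots.ts
import type { MetadataRoute } from 'next';

// Serve a permissive robots.txt only on the production deployment.
export default function robots(): MetadataRoute.Robots {
  // On Vercel, VERCEL_ENV is 'production', 'preview', or 'development'.
  const isProd = process.env.VERCEL_ENV === 'production';
  return isProd
    ? { rules: [{ userAgent: '*', allow: '/' }], sitemap: 'https://yourdomain.com/sitemap.xml' }
    : { rules: [{ userAgent: '*', disallow: '/' }] };
}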
Production allows; staging/preview blocks. No file-copy gymnastics required.
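One way to make this switch easier to unit test is to pull the decision into a pure function. The sketch below is framework-free: `buildRobots`, its parameters, and the local types are hypothetical stand-ins rather than part of Next.js.

```typescript
// Local stand-in types so the sketch runs without Next.js installed.
type RobotsRule = { userAgent: string; allow?: string; disallow?: string };
type RobotsConfig = { rules: RobotsRule[]; sitemap?: string };

// Hypothetical helper: the environment decision as a pure function.
// Only an exact 'production' value allows crawling; everything else blocks.
function buildRobots(env: string | undefined, siteUrl: string): RobotsConfig {
  if (env === 'production') {
    return {
      rules: [{ userAgent: '*', allow: '/' }],
      sitemap: `${siteUrl}/sitemap.xml`,
    };
  }
  return { rules: [{ userAgent: '*', disallow: '/' }] };
}

console.log(buildRobots('production', 'https://yourdomain.com'));
console.log(buildRobots('preview', 'https://yourdomain.com'));
console.log(buildRobots(undefined, 'https://yourdomain.com'));
```

Blocking on anything other than an exact 'production' match keeps previews out of crawlers' reach, but it also means a production host that never sets the variable would be blocked. Which variable to check therefore depends on the hosting platform.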
# Intended to block ?print=true
Disallow: /*?
# Actually blocks every URL containing a query string

# Fix:
Disallow: /*?print=*
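To see why the broad pattern overreaches, it can help to translate robots.txt wildcards into regular expressions and test a few URLs. This is a rough sketch of the matching semantics (`*` matches any run of characters, a trailing `$` anchors the end), with hypothetical helper names and sample URLs; it is not the code any crawler actually runs.

```typescript
// Convert a robots.txt path pattern into a RegExp.
// '*' matches any run of characters; a trailing '$' anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('.*');
  return new RegExp('^' + source + (anchored ? '$' : ''));
}

// True when the pattern matches the start of the path (including the query string).
function matches(pattern: string, pathAndQuery: string): boolean {
  return patternToRegExp(pattern).test(pathAndQuery);
}

const urls = ['/pricing', '/pricing?print=true', '/blog?page=2', '/blog?page=2&print=true'];
for (const url of urls) {
  console.log(url, {
    broad: matches('/*?', url),
    narrow: matches('/*?print=', url),
  });
}
```

The broad pattern matches all three URLs that carry a query string, while the narrow one matches only `/pricing?print=true`. The narrow pattern also misses `/blog?page=2&print=true`, because `print` is not the first parameter there; if that case matters, an additional rule for `&print=` may be needed.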
# Old-school pattern that no longer makes sense
Disallow: /wp-includes/
# This blocks Google from rendering your pages; it needs CSS/JS to evaluate UX.

# Fix: remove the block, or add explicit allows:
Allow: /wp-includes/*.css
Allow: /wp-includes/*.js
Paths in robots.txt are case-sensitive: Disallow: /Admin/ doesn't block /admin/. If your CMS routes are case-insensitive, you may think you've blocked something that's actually open.
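A quick way to audit for this is to compare each Disallow prefix against the paths the site actually serves, once case-sensitively and once not. The helper below is a hypothetical sketch that assumes plain prefix rules without wildcards; the path list is made up for illustration.

```typescript
// Hypothetical audit helper: list paths that a Disallow prefix misses
// only because of letter case.
function caseOnlyMisses(disallowPrefix: string, paths: string[]): string[] {
  const lowerPrefix = disallowPrefix.toLowerCase();
  return paths.filter(
    (path) =>
      // robots.txt comparison is case-sensitive, so this path is NOT blocked...
      !path.startsWith(disallowPrefix) &&
      // ...yet a case-insensitive router would treat it as the same section.
      path.toLowerCase().startsWith(lowerPrefix),
  );
}

const served = ['/Admin/users', '/admin/users', '/ADMIN/settings', '/about'];
console.log(caseOnlyMisses('/Admin/', served));
```

Here the rule `/Admin/` leaves `/admin/users` and `/ADMIN/settings` open. Any path this reports is a candidate for an extra Disallow line or for a redirect to one canonical casing.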
User-agent: *
Allow: /
# Block administrative paths
Disallow: /admin/
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Block search results and parameter combinations
Disallow: /search?
Disallow: /*?session=
# Block crawl traps
Disallow: /calendar/*/*

Sitemap: https://yourdomain.com/sitemap.xml
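The file above relies on rule precedence: under the Robots Exclusion Protocol as Google applies it, the longest matching pattern wins, and Allow wins a tie. The sketch below illustrates that resolution for the rules in this file. The names `PathRule`, `toRegExp`, and `isAllowed` are hypothetical, the test paths are made up, and measuring specificity by raw pattern length is a simplification.

```typescript
type PathRule = { type: 'allow' | 'disallow'; pattern: string };

// '*' matches any run of characters; a trailing '$' anchors the end of the URL.
function toRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith('$');
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + source + (anchored ? '$' : ''));
}

// Longest matching pattern wins; on a tie, allow wins. No match means allowed.
function isAllowed(rules: PathRule[], path: string): boolean {
  let best: PathRule | undefined;
  for (const rule of rules) {
    if (rule.pattern === '' || !toRegExp(rule.pattern).test(path)) continue;
    if (
      !best ||
      rule.pattern.length > best.pattern.length ||
      (rule.pattern.length === best.pattern.length && rule.type === 'allow')
    ) {
      best = rule;
    }
  }
  return best ? best.type === 'allow' : true;
}

const rules: PathRule[] = [
  { type: 'allow', pattern: '/' },
  { type: 'disallow', pattern: '/admin/' },
  { type: 'disallow', pattern: '/wp-admin/' },
  { type: 'allow', pattern: '/wp-admin/admin-ajax.php' },
  { type: 'disallow', pattern: '/search?' },
  { type: 'disallow', pattern: '/*?session=' },
  { type: 'disallow', pattern: '/calendar/*/*' },
];

const probes = ['/blog/post', '/wp-admin/options.php', '/wp-admin/admin-ajax.php', '/calendar/2024/05'];
for (const path of probes) {
  console.log(path, isAllowed(rules, path) ? 'allowed' : 'blocked');
}
```

With these rules, `/wp-admin/admin-ajax.php` stays crawlable even though `/wp-admin/` is blocked, because the Allow pattern is longer than the Disallow pattern that also matches it.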
After deploying the fix, re-run the Site Crawler and verify the blocked-URL count has dropped. Remaining blocks should all be intentional.