The two most common deindexing tools act on different stages of the crawl-index pipeline. Disallow in robots.txt blocks crawling, so Google never reads the page. noindex in a meta tag (or an X-Robots-Tag header) drops the page from the index, but Google has to crawl the page first to see it. Combine them on an already-indexed URL and you get the worst outcome: Google can't crawl the page to see the noindex, so the URL stays indexed indefinitely with an "Indexed, though blocked by robots.txt" warning. This guide covers the right sequence.
| Method | What it does | When to use |
|---|---|---|
| Disallow | Prevents crawling | Stop crawl budget waste on URLs not yet indexed |
| noindex meta | Removes from index after crawl | Deindex pages already in Google's index |
| X-Robots-Tag header | Same as noindex meta but for non-HTML | PDFs, images, files |
| 404 / 410 | Signals page gone | Permanent removal of deleted content |
| URL Removal Tool | Temporary index suppression (~6 months) | Urgent removal while waiting for noindex/410 |
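
The table can also be read as a small decision procedure. The sketch below is one possible encoding of it; the function name `chooseMethods` and the fields of its argument are hypothetical names introduced for illustration, not part of any Google tool or API.

```javascript
/**
 * Suggest deindexing methods for a URL, following the table above.
 * All field names are illustrative assumptions:
 *   indexed - the URL is already in Google's index
 *   deleted - the content has been permanently removed
 *   isHtml  - the resource can carry a <meta> tag
 *   urgent  - the URL must disappear from results right away
 */
function chooseMethods({ indexed, deleted, isHtml, urgent }) {
  const methods = [];
  // The removal tool only suppresses temporarily, so it is always paired
  // with one of the permanent methods below.
  if (urgent && indexed) methods.push("URL Removal Tool");
  if (deleted) {
    methods.push("404 / 410");
    return methods;
  }
  if (!indexed) {
    methods.push("Disallow");
    return methods;
  }
  methods.push(isHtml ? "noindex meta" : "X-Robots-Tag header");
  return methods;
}
```

An indexed PDF that needs urgent removal, for example, gets both the temporary suppression and the header that makes the removal stick.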
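```
# robots.txt
User-agent: *
Disallow: /private/

<!-- /private/page meta -->
<meta name="robots" content="noindex">
```
What happens: Googlebot obeys the Disallow and never fetches /private/page, so it never sees the noindex tag. If the URL is already indexed, it stays indexed and Search Console reports "Indexed, though blocked by robots.txt".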
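
The interaction between the two directives can be written down as a tiny truth table. This is a sketch of the reasoning in this guide, not a model of Google's actual pipeline; `predictOutcome` and its field names are hypothetical.

```javascript
/**
 * Predict what happens to a URL given its robots.txt and noindex state.
 * Field names are illustrative assumptions, not a Google API.
 */
function predictOutcome({ disallowed, noindex, alreadyIndexed }) {
  if (disallowed) {
    // Googlebot never fetches the page, so a noindex on it is invisible.
    return alreadyIndexed
      ? "stays indexed, blocked by robots.txt"
      : "not crawled";
  }
  if (noindex) return "removed from index after next crawl";
  return "crawled and eligible for indexing";
}
```

Note that the `noindex` flag has no effect on the result whenever `disallowed` is true, which is exactly the trap described above.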
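For URLs that have never been indexed, use Disallow alone. It stops Google from crawling them in the first place. A blocked URL can still appear as a URL-only result if other sites link to it, so use noindex instead when a page must never show up.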
```
User-agent: *
Disallow: /never-public/
```
```
# If currently:
User-agent: *
Disallow: /to-deindex/
# Remove that line so Google can crawl
```
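
Before moving on, it can help to confirm that the live robots.txt no longer blocks the section. The function below is a deliberately simplified sketch: it only reads `User-agent: *` groups, uses plain prefix matching, and ignores `Allow` rules and wildcards, all of which a real robots.txt parser handles. The name `isDisallowedForAll` is hypothetical.

```javascript
/**
 * Simplified check: does a "User-agent: *" group disallow this path?
 * Assumptions: prefix matching only; Allow rules and * / $ wildcards
 * are ignored, so this is not a full robots.txt implementation.
 */
function isDisallowedForAll(robotsTxt, path) {
  let inStarGroup = false;
  let previousWasUserAgent = false; // consecutive User-agent lines share a group
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    if (line === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      const isStar = value === "*";
      inStarGroup = previousWasUserAgent ? inStarGroup || isStar : isStar;
      previousWasUserAgent = true;
      continue;
    }
    previousWasUserAgent = false;
    // An empty Disallow value means "allow everything", so skip it.
    if (inStarGroup && field === "disallow" && value !== "" && path.startsWith(value)) {
      return true;
    }
  }
  return false;
}
```

Run it against the robots.txt text before and after the change; the path should flip from blocked to crawlable once the line is removed or commented out.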
```html
<!-- HTML meta -->
<meta name="robots" content="noindex, follow">
<!-- "follow" tells Google to follow outbound links from the page -->
<!-- Use "noindex, nofollow" only if you also don't want link signals to propagate -->
```
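
A quick way to check that a template really emits the tag is to extract the directives from the rendered HTML. The sketch below uses a regular expression for brevity and assumes the `name` attribute comes directly before `content`, as in the snippet above; a real audit would use an HTML parser. Both function names are hypothetical.

```javascript
/**
 * Extract robots directives from a page's HTML.
 * Assumption: the tag is written as <meta name="robots" content="...">
 * with name immediately before content; other layouts are not matched.
 */
function metaRobotsDirectives(html) {
  const match = html.match(/<meta\s+name=["']robots["']\s+content=["']([^"']*)["']/i);
  if (match === null) return [];
  return match[1]
    .split(",")
    .map((directive) => directive.trim().toLowerCase())
    .filter((directive) => directive !== "");
}

/** True when the page asks search engines not to index it. */
function hasNoindex(html) {
  return metaRobotsDirectives(html).includes("noindex");
}
```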
To monitor progress, run site:example.com inurl:/to-deindex/ in Google Search. The number of results should fall as Google recrawls the pages.
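```
# Once URLs are confirmed deindexed (zero results in site: query):
User-agent: *
Disallow: /to-deindex/
# Now prevents future crawl budget waste
```
Pages generally stay deindexed, and crawl budget isn't wasted on re-checking them. Keep the noindex meta in place so it still applies if the pages are ever crawled again. Once the Disallow is back, Google can no longer see the noindex, so a URL that picks up new external links can reappear as a URL-only result; if that matters more than crawl budget, leave the section crawlable.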
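
The four-step sequence can be sketched as a function that reports the next action for a section. The inputs are assumptions: `indexedCount` stands for whatever count you read off the site: query, and `nextStep` is a hypothetical helper, not an existing tool.

```javascript
/**
 * Next action in the deindex sequence for one site section.
 * Inputs are illustrative: indexedCount is the number of results you
 * observed for the section in a site: query.
 */
function nextStep({ disallowed, servesNoindex, indexedCount }) {
  if (indexedCount > 0) {
    // Step 1: Google must be able to crawl before it can see noindex.
    if (disallowed) return "remove Disallow so Google can crawl";
    // Step 2: the pages must actually serve the directive.
    if (!servesNoindex) return "add noindex";
    // Step 3: nothing to do until Google recrawls.
    return "wait for recrawl";
  }
  // Step 4: only once nothing is indexed is it safe to block crawling.
  return disallowed ? "done" : "re-add Disallow";
}
```

The ordering of the checks is the point: the Disallow is lifted before anything else and restored only after the index count reaches zero.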
PDFs, images, and videos can't carry meta tags. Use the X-Robots-Tag HTTP header instead:
```nginx
# nginx
location ~* \.(pdf|doc|docx)$ {
    # Only documents whose request URI contains /internal/ get the header
    if ($request_uri ~* "/internal/") {
        add_header X-Robots-Tag "noindex, nofollow" always;
    }
}
```
```apache
# Apache (requires mod_headers)
<FilesMatch "\.(pdf|doc|docx)$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```
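```javascript
// Express (assumes an existing `app` created with express())
app.use('/api/', (req, res, next) => {
  // Keep every response under /api/ out of the index
  res.setHeader('X-Robots-Tag', 'noindex');
  next();
});
```
```typescript
// Next.js: middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  // Mark everything under /staging/ as non-indexable
  if (request.nextUrl.pathname.startsWith('/staging/')) {
    const response = NextResponse.next();
    response.headers.set('X-Robots-Tag', 'noindex, nofollow');
    return response;
  }
}
```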
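
The path rules spread across the server snippets above can also live in one pure function, which makes them easy to unit test before wiring them into any server. This is only a sketch: `robotsTagFor` is a hypothetical helper, and it matches on the pathname alone, whereas the nginx rule matches the full request URI including the query string.

```javascript
/**
 * Return the X-Robots-Tag value for a pathname, or null for no header.
 * Mirrors the example rules in this guide; adjust the prefixes and
 * extensions to your own site.
 */
function robotsTagFor(pathname) {
  if (pathname.startsWith("/staging/")) return "noindex, nofollow";
  if (pathname.startsWith("/api/")) return "noindex";
  const isDocument = /\.(pdf|doc|docx)$/i.test(pathname);
  if (isDocument && pathname.toLowerCase().includes("/internal/")) {
    return "noindex, nofollow";
  }
  return null;
}
```

In an Express or Next.js handler you would call it with the request path and set the header only when the result is not null.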
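For pages with sensitive content (accidentally exposed PII, leaked drafts), use Search Console's URL Removal Tool for fast suppression. The suppression is temporary (about 6 months), so pair it with noindex or a 410 to make the removal permanent.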
When a page is permanently deleted (not moved), serve 410 instead of 404. A 410 signals "this is gone forever" more clearly.
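```javascript
// Express (assumes an existing `app` created with express())
app.get('/old-discontinued-page', (req, res) => {
  // 410 Gone: the page was removed on purpose and will not return
  res.status(410).send('Gone');
});
```
```nginx
# nginx
location = /old-discontinued-page {
    return 410;
}
```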
Google tends to deindex 410s somewhat faster than 404s (roughly 2-4 weeks vs 4-8 weeks), though the actual timing depends on how often the URL is recrawled.
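
When many pages are retired over time, the 410-versus-404 choice can be driven by a list of intentionally removed paths. The sketch below assumes you maintain such a list yourself; `retiredPaths` and `statusFor` are hypothetical names, and the single entry is the example path used above.

```javascript
// Paths that were removed on purpose and will not come back (illustrative).
const retiredPaths = new Set(["/old-discontinued-page"]);

/**
 * Pick the HTTP status for a request path.
 * knownPaths is assumed to be the set of paths the site currently serves.
 */
function statusFor(pathname, knownPaths) {
  if (retiredPaths.has(pathname)) return 410; // deliberately gone
  return knownPaths.has(pathname) ? 200 : 404; // live page or plain not-found
}
```

Keeping the retired list explicit means a mistyped URL still gets an ordinary 404 rather than a 410.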
```
# BEFORE (wrong):
robots.txt: Disallow: /wp-admin/
HTML: <meta name="robots" content="noindex">
# Result: stays indexed

# CORRECTION:
# Step 1: temporarily allow crawl
robots.txt: # Disallow: /wp-admin/ (commented out)
# Step 2: noindex meta still in place
# Step 3: wait for deindexing
# Step 4: re-add Disallow once deindexed
```
```nginx
# Best: HTTP basic auth on staging (no crawl, no index, no deindex problem)
# Plus: noindex header for belt-and-braces
location / {
    auth_basic "Staging";
    auth_basic_user_file /etc/nginx/.htpasswd;
    add_header X-Robots-Tag "noindex, nofollow" always;
}
```
```
# If PDFs already indexed:
# Step 1: add X-Robots-Tag: noindex header
# Step 2: Search Console URL Inspection to expedite
# Step 3: wait for deindexing
# Step 4: move files behind auth or delete
```
Deindexing is complete when site:example.com inurl:/to-deindex/ returns zero results and the page no longer surfaces in normal searches either.