How to Fix Conflicting noindex vs Disallow

The two most common deindexing tools act on different parts of the crawl-index pipeline. Disallow in robots.txt blocks crawling: Google never reads the page. noindex in a meta tag (or an X-Robots-Tag header) drops the page from the index, but Google has to crawl the page first to see it. Combine them on an already-indexed URL and you get the worst outcome: Google can't crawl the page to see the noindex, so the URL stays indexed indefinitely with an "Indexed, though blocked by robots.txt" warning. This guide covers the right sequence.

1. Understand the difference

Method | What it does | When to use
Disallow | Prevents crawling | Stop crawl budget waste on URLs not yet indexed
noindex meta | Removes from index after crawl | Deindex pages already in Google's index
X-Robots-Tag header | Same as noindex meta but for non-HTML | PDFs, images, files
404 / 410 | Signals page gone | Permanent removal of deleted content
URL Removal Tool | Temporary index suppression (~6 months) | Urgent removal while waiting for noindex/410

2. The conflict scenario

The bad pattern

# robots.txt
User-agent: *
Disallow: /private/

<!-- /private/page meta -->
<meta name="robots" content="noindex">

What happens:

  1. /private/page was indexed before you added these
  2. You add Disallow + noindex meta at the same time
  3. Google can't crawl (Disallow), so can't see the noindex meta
  4. Page stays indexed indefinitely
  5. Search Console shows "Indexed, though blocked by robots.txt"
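
The conflict above can be spotted mechanically. The following is a minimal sketch, not a full robots.txt parser: it assumes only "User-agent: *" groups and plain path-prefix Disallow rules matter, and it takes a hypothetical list of pages where you have already recorded whether each one carries noindex. The function names (disallowedPrefixes, isDisallowed, findConflicts) are illustrative, not part of any library.

```javascript
// Simplified robots.txt reader: honours only "User-agent: *" groups and
// plain path-prefix Disallow rules. Allow rules, wildcards (*) and "$"
// anchors are ignored, so treat the result as a rough check.
function disallowedPrefixes(robotsTxt) {
  const prefixes = [];
  let inStarGroup = false;
  let prevWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      // Consecutive User-agent lines share one group.
      inStarGroup = (prevWasAgent && inStarGroup) || value === '*';
      prevWasAgent = true;
    } else {
      prevWasAgent = false;
      if (field === 'disallow' && inStarGroup && value !== '') {
        prefixes.push(value);
      }
    }
  }
  return prefixes;
}

function isDisallowed(robotsTxt, path) {
  return disallowedPrefixes(robotsTxt).some((prefix) => path.startsWith(prefix));
}

// pages: [{ path: '/private/page', noindex: true }, ...] (hypothetical shape)
// Returns the paths where Disallow hides a noindex from the crawler.
function findConflicts(robotsTxt, pages) {
  return pages
    .filter((page) => page.noindex && isDisallowed(robotsTxt, page.path))
    .map((page) => page.path);
}
```

Any path this returns is a candidate for the "indexed but uncrawlable" state, provided the URL was already in the index when the rules were added.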

3. The correct sequence

For not-yet-indexed URLs

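Use Disallow alone. It keeps Google from crawling these URLs in the first place. (A disallowed URL can still be indexed without its content if other pages link to it, so rely on this only for URLs that aren't linked publicly.)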

User-agent: *
Disallow: /never-public/

For already-indexed URLs you want to deindex

Step 1
Remove Disallow rule
# If currently:
User-agent: *
Disallow: /to-deindex/

# Remove that line so Google can crawl
Step 2
Add noindex to the pages
<!-- HTML meta -->
<meta name="robots" content="noindex, follow">

<!-- "follow" tells Google to follow outbound links from the page -->
<!-- Use "noindex, nofollow" only if you also don't want link signals to propagate -->
Step 3
Force re-crawl in Search Console
Search Console → URL Inspection → enter URL → Request Indexing. Speeds up Google's re-crawl. Repeat for each URL or wait for natural crawl over 2-8 weeks.
Step 4
Monitor deindexing
Search Console → Pages → "Excluded by noindex tag" count rises. site:example.com inurl:/to-deindex/ in Google search shows decreasing results.
Step 5
After deindexing, optionally re-add Disallow
# Once URLs are confirmed deindexed (zero results in site: query):
User-agent: *
Disallow: /to-deindex/

# Now prevents future crawl budget waste
Pages generally stay deindexed, and crawl budget isn't wasted on re-checking them. Leave the noindex meta in place so it still applies if the Disallow rule is ever lifted and the pages are crawled again.
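
The five steps can be read as a small decision procedure. The sketch below is one way to express it, assuming you track three facts per URL yourself (whether it is indexed, whether robots.txt disallows it, and whether it serves noindex). The nextStep function and its return strings are hypothetical, chosen only to mirror the step names in this guide.

```javascript
// Given the current state of a URL, return the next action from the
// sequence in this guide. All three inputs are booleans you supply.
function nextStep({ indexed, disallowed, noindex }) {
  if (indexed) {
    // Already indexed: Google must be able to crawl and see noindex.
    if (disallowed) return 'Step 1: remove Disallow rule';
    if (!noindex) return 'Step 2: add noindex to the page';
    return 'Steps 3-4: request re-crawl and monitor deindexing';
  }
  // Not in the index (never was, or deindexing has completed).
  if (disallowed) return 'Done: Disallow is in place';
  if (noindex) return 'Step 5: optionally re-add Disallow';
  return 'Not yet indexed: use Disallow alone';
}

// The conflict scenario from section 2 resolves to Step 1.
console.log(nextStep({ indexed: true, disallowed: true, noindex: true }));
```

Note that the conflicting state (indexed, disallowed, noindex) always resolves to removing Disallow first, which is the core of the sequence.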

4. X-Robots-Tag for non-HTML

PDFs, images, and videos can't carry meta tags. Use an HTTP header instead:

nginx

location ~* \.(pdf|doc|docx)$ {
  if ($request_uri ~* "/internal/") {
    add_header X-Robots-Tag "noindex, nofollow" always;
  }
}

Apache

<FilesMatch "\.(pdf|doc|docx)$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

Application middleware

// Express
app.use('/api/', (req, res, next) => {
  res.setHeader('X-Robots-Tag', 'noindex');
  next();
});

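// Next.js (middleware.ts)
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  // Only staging paths get the header; other requests pass through untouched
  if (request.nextUrl.pathname.startsWith('/staging/')) {
    const response = NextResponse.next();
    response.headers.set('X-Robots-Tag', 'noindex, nofollow');
    return response;
  }
}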
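
Once a header is configured, it helps to check what a crawler would make of the value it receives. The sketch below interprets X-Robots-Tag values, including the crawler-scoped form (for example "googlebot: noindex"). It is a simplified reading of the header syntax, and noindexApplies is a hypothetical helper name, not an API from any library.

```javascript
// Directives that take their own "name: value" form and must not be
// mistaken for a crawler name prefix.
const VALUED_DIRECTIVES = [
  'unavailable_after',
  'max-snippet',
  'max-image-preview',
  'max-video-preview',
];

// headerValues: array of raw X-Robots-Tag values from one response.
// Returns true if any of them tells `bot` not to index the resource.
function noindexApplies(headerValues, bot = 'googlebot') {
  return headerValues.some((raw) => {
    let value = raw.trim().toLowerCase();
    const scoped = /^([a-z0-9_-]+)\s*:\s*(.*)$/.exec(value);
    if (scoped && !VALUED_DIRECTIVES.includes(scoped[1])) {
      // Value is aimed at a named crawler; skip it if it is not ours.
      if (scoped[1] !== bot.toLowerCase()) return false;
      value = scoped[2];
    }
    const directives = value.split(',').map((d) => d.trim());
    // "none" is shorthand for "noindex, nofollow".
    return directives.includes('noindex') || directives.includes('none');
  });
}
```

You could feed it the header from a fetched PDF, for instance `noindexApplies([response.headers.get('x-robots-tag') ?? ''])` when using fetch in Node 18 or later.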

5. URL Removal Tool (urgent cases)

For pages with sensitive content (accidentally exposed PII, leaked drafts), use Search Console's URL Removal Tool for fast suppression:

  1. Search Console → Removals → New Request
  2. Enter URL → Submit
  3. Google suppresses the URL from search results, usually within about 24 hours
  4. Suppression lasts ~6 months
  5. For permanent removal, combine with noindex meta or 410 status
⚠️ URL Removal is temporary. After about 6 months, the URL can return to search results unless you've made the underlying change (noindex, 410, password protection, content removal).

6. 410 Gone for permanent removal

When a page is permanently deleted (not moved), serve 410 instead of 404. It signals "this is gone forever" more clearly.

// Express
app.get('/old-discontinued-page', (req, res) => {
  res.status(410).send('Gone');
});

# nginx
location = /old-discontinued-page {
  return 410;
}

Google tends to deindex 410s somewhat faster than 404s (roughly 2-4 weeks vs 4-8 weeks), though actual timing varies with how often the site is crawled.
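
When many pages are retired at once, one route per URL gets unwieldy. A minimal sketch of an Express-style middleware that serves 410 for a list of retired paths follows; goneMiddleware and the example paths are hypothetical, and the code relies only on req.path, res.status() and res.send() as Express provides them.

```javascript
// Returns middleware that answers 410 Gone for any exact path in
// `gonePaths` and passes every other request along.
function goneMiddleware(gonePaths) {
  const gone = new Set(gonePaths);
  return function handleGone(req, res, next) {
    if (gone.has(req.path)) {
      res.status(410).send('Gone');
      return;
    }
    next();
  };
}

// Usage in an Express app (assumes `app` already exists):
// app.use(goneMiddleware(['/old-discontinued-page', '/retired-promo']));
```

Matching is exact, so a trailing-slash variant such as /old-discontinued-page/ would need its own entry.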

7. Common scenarios

Scenario 1: Admin pages already indexed

# BEFORE (wrong):
robots.txt:
  Disallow: /wp-admin/
HTML:
  <meta name="robots" content="noindex">
# Result: stays indexed

# CORRECTION:
# Step 1 — temporarily allow crawl
robots.txt:
  # Disallow: /wp-admin/  (commented out)
# Step 2 — noindex meta still in place
# Step 3 — wait for deindexing
# Step 4 — re-add Disallow once deindexed

Scenario 2: Staging environment indexed

# Best: HTTP basic auth on staging (no crawl, no index, no deindex problem)
# Plus: noindex header for belt-and-braces
location / {
  auth_basic "Staging";
  auth_basic_user_file /etc/nginx/.htpasswd;
  add_header X-Robots-Tag "noindex, nofollow" always;
}

Scenario 3: PDFs with private content

# If PDFs already indexed:
# Step 1 — add X-Robots-Tag: noindex header
# Step 2 — Search Console URL Inspection to expedite
# Step 3 — wait for deindexing
# Step 4 — move files behind auth or delete

8. Verify resolution

Step 1
Re-run Robots Tester
Conflict findings clear. No URLs flagged with both Disallow and noindex active.
Step 2
Search Console index coverage
"Indexed, though blocked by robots.txt" count should drop to zero. Deindexed pages move to "Excluded by noindex tag" or "Crawled - currently not indexed".
Step 3
site: query confirmation
site:example.com inurl:/to-deindex/ returns zero results. Page no longer surfaces in normal search either.
💡 The single rule: never combine Disallow and noindex on the same URL when you want it deindexed. Use one or the other, in the correct sequence. noindex first to drop indexed pages, then Disallow after deindexing completes to save crawl budget. Doing both simultaneously locks the URL into "indexed but uncrawlable" purgatory.
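
Before waiting weeks on Search Console, you can check each URL for the two things that block deindexing: a robots.txt block, or a missing noindex signal. The sketch below works on values you have already collected for a URL. The deindexBlockers name and its input shape are assumptions for illustration, and the meta tag detection is a rough regular-expression check rather than a real HTML parser.

```javascript
// Input (all collected by you beforehand):
//   disallowed: whether robots.txt blocks the URL
//   status:     HTTP status code the URL returns
//   xRobotsTag: value of the X-Robots-Tag response header, if any
//   html:       response body for HTML pages, if any
// Returns a list of problems; an empty list means nothing blocks deindexing.
function deindexBlockers({ disallowed, status, xRobotsTag = '', html = '' }) {
  const problems = [];
  if (disallowed) {
    problems.push('blocked by robots.txt: Google cannot crawl the URL to see noindex');
  }
  const gone = status === 404 || status === 410;
  const headerNoindex = /\bnoindex\b/i.test(xRobotsTag);
  const metaNoindex = (html.match(/<meta\b[^>]*>/gi) || []).some(
    (tag) =>
      /\bname\s*=\s*["']?(robots|googlebot)\b/i.test(tag) &&
      /\bnoindex\b/i.test(tag)
  );
  if (!gone && !headerNoindex && !metaNoindex) {
    problems.push('no noindex signal: add a robots meta tag or X-Robots-Tag header');
  }
  return problems;
}
```

With fetch in Node 18 or later, the inputs could come from response.status, response.headers.get('x-robots-tag') and await response.text(), with the disallowed flag taken from your own robots.txt check.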
