
How to Fix robots.txt Syntax Errors

robots.txt parsing is forgiving. Too forgiving. Malformed directives don't throw errors; they're silently ignored. A typo such as Disalow: can mean the rule is skipped and the URL stays crawlable when you thought it was blocked. Wildcards in the wrong position match too much or too little. This guide covers the directive grammar, the common typos, the position-sensitive wildcards, and the validators that catch what visual inspection misses.

1. The directive grammar

Every directive is a single line of the form Field: value. For example:

User-agent: *
Disallow: /admin/
Allow: /admin/help/
Sitemap: https://example.com/sitemap.xml

# Comment line
User-agent: Googlebot
Disallow: /private/

Field rules
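Google's robots.txt documentation states that field names are case-insensitive while values such as paths may be case-sensitive, that whitespace around the colon is optional, and that # starts a comment wherever it appears. A minimal sketch of splitting a single line under those rules, using a hypothetical helper `parse_line` (not part of any library):

```python
from typing import Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Split one robots.txt line into (field, value).

    Returns None for blank lines, comment-only lines, and lines
    without a colon.
    """
    # Everything after '#' is a comment, wherever it appears.
    content = line.split("#", 1)[0].strip()
    if not content or ":" not in content:
        return None
    # Split on the first colon only, so URLs in Sitemap values survive.
    field, value = content.split(":", 1)
    # Field names are case-insensitive: normalize to lowercase.
    # Values keep their case because paths are case-sensitive.
    return field.strip().lower(), value.strip()


print(parse_line("DISALLOW : /Admin/   # temporary"))  # ('disallow', '/Admin/')
print(parse_line("# just a comment"))                  # None
```

Returning None for a line without a colon is a simplification; parsers differ on how they treat such lines.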

2. Groups and User-agent

A User-agent: line opens a group. All following Allow/Disallow lines apply to that user-agent until a User-agent: line appears after a rule; consecutive User-agent: lines share one group.

# Group 1: all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/

# Group 2: Googlebot specifically
User-agent: Googlebot
Disallow: /tmp/

# Group 3: Bingbot specifically
User-agent: Bingbot
Disallow: /experimental/

# Sitemap is global, applies to all groups
Sitemap: https://example.com/sitemap.xml
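The grouping rule can be sketched as follows. `parse_groups` is a hypothetical helper: it treats consecutive User-agent: lines as one group, starts a new group when a User-agent: line follows a rule, and collects Sitemap: lines separately because they belong to no group. It does not merge separate groups that name the same crawler, which Google's parser does.

```python
from typing import Dict, List, Tuple


def parse_groups(text: str) -> Tuple[List[Dict], List[str]]:
    """Group Allow/Disallow rules by user-agent; collect Sitemaps globally."""
    groups: List[Dict] = []
    sitemaps: List[str] = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        content = raw.split("#", 1)[0].strip()
        if ":" not in content:
            continue
        field, value = (part.strip() for part in content.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line right after another one joins its group;
            # a User-agent line after a rule starts a new group.
            if current is None or not last_was_agent:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif field in ("allow", "disallow"):
            if current is not None:  # rules before any User-agent are dropped
                current["rules"].append((field, value))
            last_was_agent = False
        elif field == "sitemap":
            sitemaps.append(value)  # not tied to any group
    return groups, sitemaps


sample = """User-agent: *
Disallow: /admin/
User-agent: Googlebot
User-agent: Bingbot
Disallow: /tmp/
Sitemap: https://example.com/sitemap.xml
"""
groups, sitemaps = parse_groups(sample)
print(len(groups))           # 2
print(groups[1]["agents"])   # ['googlebot', 'bingbot']
```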

Important: Googlebot ignores the * group when it has its own group

# Bad assumption
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/

# Googlebot only obeys its own group: Disallow: /tmp/
# /admin/ and /private/ are NOT blocked for Googlebot
# To block these for Googlebot too, repeat them in its group:

User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
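Group selection can be modeled as a lookup that falls back to * only when no specific group exists. A simplified sketch, assuming groups have already been parsed into a dict keyed by lowercase user-agent token (`rules_for` and the dict layout are hypothetical):

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (field, path), e.g. ("disallow", "/tmp/")


def rules_for(agent: str, groups: Dict[str, List[Rule]]) -> List[Rule]:
    """Return the rules a crawler obeys: its own group if one exists,
    otherwise the '*' group. The two are never combined."""
    token = agent.lower()
    if token in groups:
        return groups[token]
    return groups.get("*", [])


groups = {
    "*": [("disallow", "/admin/"), ("disallow", "/private/")],
    "googlebot": [("disallow", "/tmp/")],
}
print(rules_for("Googlebot", groups))  # [('disallow', '/tmp/')]
print(rules_for("Bingbot", groups))    # [('disallow', '/admin/'), ('disallow', '/private/')]
```

Because Googlebot has its own entry, /admin/ and /private/ never reach it, which is exactly the trap shown above.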

3. Wildcards: * and $

Two wildcard characters with different meanings:

Wildcard | Meaning | Example
* | Matches any sequence of characters | /*.pdf matches /file.pdf, /docs/file.pdf
$ | Anchors to end of URL | /*.pdf$ matches /file.pdf but NOT /file.pdf?id=1

Common wildcard patterns

# Block all PDFs
Disallow: /*.pdf$

# Block all URLs with query strings
Disallow: /*?

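# Block a query parameter when it comes first (?sort=);
# add Disallow: /*&sort= to also catch it later in the query string
Disallow: /*?sort=

# Block any path containing admin/ (also matches /superadmin/)
Disallow: /*admin/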

# Block CSV downloads in a specific directory
Disallow: /reports/*.csv$

Implicit wildcard at end

# These two are equivalent
Disallow: /admin/
Disallow: /admin/*

# Without trailing slash, prefix match
Disallow: /admin
# Matches /admin, /admin/, /admin/page, /administrator
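To see how *, $, and the implicit trailing wildcard interact, here is a minimal sketch that translates a path pattern into a Python regular expression. `pattern_to_regex` and `matches` are hypothetical helpers; they skip percent-encoding normalization and Allow/Disallow precedence, so treat them as an illustration rather than a validator.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex (sketch).

    '*' matches any run of characters, including none; a '$' at the very
    end anchors the match to the end of the URL; anything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal text between '*' wildcards, then join with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # re.match anchors only at the start, which gives unanchored
    # patterns their implicit trailing wildcard (prefix match).
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf", "/docs/file.pdf"))   # True
print(matches("/*.pdf$", "/file.pdf?id=1"))  # False
print(matches("/admin", "/administrator"))   # True
print(matches("/admin/", "/Admin/"))         # False (paths are case-sensitive)
```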

4. Common syntax errors

Typo: Disalow vs Disallow

# BAD: strict parsers skip this line (Google's parser tolerates this particular misspelling; other crawlers may not)
Disalow: /admin/

# RIGHT
Disallow: /admin/
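A linter can catch near-miss field names before deployment. This sketch (the `find_typos` helper and its `KNOWN_FIELDS` list are assumptions, not a standard tool) uses the standard library's difflib to suggest the closest known field:

```python
import difflib

# Fields this sketch accepts: Google's four plus common non-standard ones.
KNOWN_FIELDS = ["user-agent", "allow", "disallow", "sitemap",
                "crawl-delay", "host", "clean-param"]


def find_typos(text: str):
    """Yield (line_number, field, suggestion) for unrecognized field names."""
    for num, raw in enumerate(text.splitlines(), start=1):
        content = raw.split("#", 1)[0].strip()
        if ":" not in content:
            continue
        field = content.split(":", 1)[0].strip().lower()
        if field in KNOWN_FIELDS:
            continue
        close = difflib.get_close_matches(field, KNOWN_FIELDS, n=1, cutoff=0.8)
        yield num, field, (close[0] if close else None)


for num, field, hint in find_typos("User-agent: *\nDisalow: /admin/\n"):
    print(f"line {num}: unknown field {field!r}, did you mean {hint!r}?")
# line 2: unknown field 'disalow', did you mean 'disallow'?
```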

Missing colon

# BAD: many parsers skip these lines (Google's parser accepts whitespace in place of the colon, but don't rely on it)
User-agent *
Disallow /admin/

# RIGHT
User-agent: *
Disallow: /admin/
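Missing colons are easy to flag mechanically: look for lines that begin with a known field name but contain no colon at all. A minimal sketch with a hypothetical `missing_colons` helper:

```python
KNOWN = ("user-agent", "allow", "disallow", "sitemap")


def missing_colons(text: str):
    """Return line numbers that start with a known field but have no colon."""
    flagged = []
    for num, raw in enumerate(text.splitlines(), start=1):
        content = raw.split("#", 1)[0].strip()
        if not content or ":" in content:
            continue
        first_word = content.split()[0].lower()
        if first_word in KNOWN:
            flagged.append(num)
    return flagged


print(missing_colons("User-agent *\nDisallow /admin/\n"))  # [1, 2]
```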

Comments without #

# BAD: tries to disallow "/admin/ temporary block"
Disallow: /admin/ temporary block

# RIGHT
Disallow: /admin/   # temporary block
# or
# temporary block
Disallow: /admin/
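Since a path value should not contain unescaped whitespace, an Allow or Disallow value with a space in it usually means a comment is missing its #. A hypothetical `values_with_spaces` check, stripping real # comments first:

```python
def values_with_spaces(text: str):
    """Return (line_number, value) for Allow/Disallow values containing whitespace."""
    hits = []
    for num, raw in enumerate(text.splitlines(), start=1):
        content = raw.split("#", 1)[0].strip()  # drop real comments
        if ":" not in content:
            continue
        field, value = content.split(":", 1)
        value = value.strip()
        if field.strip().lower() in ("allow", "disallow") and any(c.isspace() for c in value):
            hits.append((num, value))
    return hits


print(values_with_spaces("Disallow: /admin/ temporary block\nDisallow: /tmp/   # note\n"))
# [(1, '/admin/ temporary block')]
```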

BOM and encoding issues

# File saved with UTF-8 BOM (byte-order mark) confuses some parsers
# Save as UTF-8 without BOM, Unix line endings (LF, not CRLF)

# Check with hexdump:
hexdump -C robots.txt | head -1
# Should NOT start with ef bb bf

Wildcards in User-agent

# BAD: wildcards do NOT work in User-agent values
User-agent: Googlebot*

# RIGHT: list each user-agent explicitly
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-News
Disallow: /
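A quick lint for this mistake flags any User-agent: value that contains * but is not exactly *. Sketch with a hypothetical `wildcard_user_agents` helper:

```python
def wildcard_user_agents(text: str):
    """Return (line_number, value) for User-agent values that try to use '*' as a wildcard."""
    hits = []
    for num, raw in enumerate(text.splitlines(), start=1):
        content = raw.split("#", 1)[0].strip()
        if ":" not in content:
            continue
        field, value = (p.strip() for p in content.split(":", 1))
        # A lone '*' is the valid catch-all; '*' anywhere else is not a wildcard.
        if field.lower() == "user-agent" and "*" in value and value != "*":
            hits.append((num, value))
    return hits


print(wildcard_user_agents("User-agent: Googlebot*\nUser-agent: *\n"))  # [(1, 'Googlebot*')]
```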

5. Non-standard directives

Google supports only four fields: User-agent, Allow, Disallow, and Sitemap. Other crawlers honor some non-standard directives, which Google ignores:

# Crawl-delay: ignored by Google, used by Bing/Yandex
User-agent: Bingbot
Crawl-delay: 10

# Host: Yandex-specific
Host: example.com

# Clean-param: Yandex-specific
Clean-param: ref /forum/showthread.php

# Comments — always supported
# This is a comment
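To see which lines Google will skip, compare each field name against the four fields Google supports. A minimal sketch; `google_ignored` is a hypothetical helper, and lines without a colon are not reported:

```python
GOOGLE_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def google_ignored(text: str):
    """Return (line_number, field) for fields Google does not support."""
    ignored = []
    for num, raw in enumerate(text.splitlines(), start=1):
        content = raw.split("#", 1)[0].strip()
        if ":" not in content:
            continue
        field = content.split(":", 1)[0].strip().lower()
        if field not in GOOGLE_FIELDS:
            ignored.append((num, field))
    return ignored


sample = "User-agent: Bingbot\nCrawl-delay: 10\nHost: example.com\n"
print(google_ignored(sample))  # [(2, 'crawl-delay'), (3, 'host')]
```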

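Google ignores Crawl-delay. Search Console's legacy crawl rate limiter was retired in early 2024; Google now adjusts crawl rate automatically, and for an urgent reduction its documentation recommends temporarily returning 500, 503, or 429 responses.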

6. Validators

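Google's official tools

Search Console → Settings → robots.txt (the robots.txt report, which replaced the legacy robots.txt Tester in late 2023)
# Lists the robots.txt files Google fetched for your site and when
# Flags parsing warnings and errors
# To check whether a specific URL is blocked, use the URL Inspection tool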

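Python: Protego

pip install protego

# Pure-Python parser (used by Scrapy) with * and $ wildcard support.
# For Googlebot's exact behavior, the reference is Google's open-source
# C++ parser: https://github.com/google/robotstxt
python3 -c "
from protego import Protego
with open('robots.txt') as f:
    rp = Protego.parse(f.read())
print(rp.can_fetch('https://example.com/some/path/', 'Googlebot'))
"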

Node: robots-parser

npm install robots-parser

const robotsParser = require('robots-parser');
const fs = require('fs');
const content = fs.readFileSync('robots.txt', 'utf-8');
const robots = robotsParser('https://example.com/robots.txt', content);

console.log(robots.isAllowed('https://example.com/admin/', 'Googlebot'));
// false if /admin/ is disallowed for Googlebot

curl with manual inspection

# Fetch headers only; match case-insensitively (HTTP/2 sends lowercase header names)
curl -s -o /dev/null -v https://example.com/robots.txt 2>&1 | grep -i "content-type"
# Should be: text/plain; charset=utf-8

# Check first bytes — no BOM
curl -s https://example.com/robots.txt | head -1 | hexdump -C | head -1

7. CI/CD validation

# GitHub Actions example (assumes the repo is checked out and robots.txt lives in public/)
- name: Validate robots.txt
  run: |
    pip install protego
    python3 -c "
    from protego import Protego

    with open('public/robots.txt') as f:
        rp = Protego.parse(f.read())

    # Protego checks full URLs; the host is a placeholder
    base = 'https://example.com'

    # Critical paths must be allowed
    critical = ['/', '/products/', '/about/']
    for path in critical:
        assert rp.can_fetch(base + path, 'Googlebot'), f'CRITICAL: {path} blocked'

    # Sensitive paths must be blocked
    blocked = ['/admin/', '/api/internal/']
    for path in blocked:
        assert not rp.can_fetch(base + path, 'Googlebot'), f'SENSITIVE: {path} allowed'

    print('robots.txt validation passed')
    "

8. Verify after fixes

Step 1
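Search Console checks
Open the robots.txt report (Settings → robots.txt) and, if needed, request a recrawl. It should show no warnings or errors. Run 5-10 representative URLs through URL Inspection; each should be reported as allowed or blocked as intended.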
Step 2
Re-run the Robots Tester audit
Zero syntax findings. Intended blocks confirmed in test URLs.
💡 The single most dangerous robots.txt typo is "Disalow:" (missing an L): depending on the crawler, it can silently do nothing. After any edit, run the file through a parser and check representative URLs. Don't trust visual inspection.
