How to Fix AI Agent User-Agent Blocks

WAFs and bot-protection products often block AI agents alongside scrapers because both look like programmatic traffic. Default configurations of Cloudflare Bot Fight Mode, AWS WAF managed rules, and Imperva routinely block GPTBot, ClaudeBot, and PerplexityBot. The result is zero AI visibility: AI answer engines cannot fetch your pages, so they never cite you. This guide covers identifying legitimate agent traffic, verifying that it isn't spoofed, and allowlisting it at each major WAF.

1. Known AI agent user agents (2026)

| Agent | UA string contains | Purpose |
|---|---|---|
| GPTBot | GPTBot | OpenAI training data |
| ChatGPT-User | ChatGPT-User | ChatGPT user browsing |
| OAI-SearchBot | OAI-SearchBot | ChatGPT search results |
| ClaudeBot | ClaudeBot | Anthropic crawler |
| Claude-Web | Claude-Web | Claude live web access |
| Anthropic-AI | anthropic-ai | Anthropic training |
| PerplexityBot | PerplexityBot | Perplexity index |
| Perplexity-User | Perplexity-User | Perplexity user queries |
| Google-Extended | (Googlebot UA + token) | Gemini training opt-out |
| CCBot | CCBot/2.0 | Common Crawl (many AI providers) |
| Meta-ExternalAgent | meta-externalagent | Meta AI |
| Bytespider | Bytespider | ByteDance AI |

2. Audit current traffic

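Step 1: Grep access logs
# -i makes the match case-insensitive, so "Anthropic" also catches "anthropic-ai".
# Google-Extended is a robots.txt token, not a UA string, so it never shows up in logs.
grep -iE "GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|PerplexityBot|Anthropic|CCBot" \
  /var/log/nginx/access.log | tail -50

# Check status codes: 403 or 429 means blocked
# ($9 is the status field in nginx's default "combined" log format)
grep "GPTBot" /var/log/nginx/access.log | awk '{print $9}' | sort | uniq -c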
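The single-agent check above can be extended to every agent in the table in one pass. The following is a minimal sketch, not a definitive tool: the function name `agent_status_summary` is hypothetical, and it assumes the default nginx "combined" log format, where the status code is the ninth whitespace-separated field.

```shell
# Print "<count> <agent> <status>" for each AI agent found in an access log.
# Assumption: status code is field 9 (nginx "combined" log format).
agent_status_summary() {
  awk '
    BEGIN {
      n = split("GPTBot ChatGPT-User OAI-SearchBot ClaudeBot Claude-Web anthropic-ai PerplexityBot Perplexity-User CCBot meta-externalagent Bytespider", agents, " ")
    }
    {
      # Match case-insensitively against the whole log line
      line = tolower($0)
      for (i = 1; i <= n; i++) {
        if (index(line, tolower(agents[i])) > 0) {
          count[agents[i] " " $9]++
          break
        }
      }
    }
    END {
      for (key in count) print count[key], key
    }
  ' "$1" | sort -k2,2 -k3,3n
}

# Usage: agent_status_summary /var/log/nginx/access.log
```

Rows with 403 or 429 point at the agents that need allowlisting; rows with 200 are already getting through.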
Step 2: Cloudflare bot analytics
In the Cloudflare dashboard, go to Security → Bots. The page lists detected bots and the action taken on each. Agents shown as "blocked" or "challenged" need allowlisting.

3. Cloudflare allowlist

Method 1: Verified Bots (easiest)

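Cloudflare detects known bots automatically and validates them by IP address and reverse DNS:

Security → Bots → Bot Fight Mode
- "Allow verified bots" → enabled (on paid plans this setting appears under Super Bot Fight Mode as "Verified bots"; menu labels vary by plan)
- No manual UA rules are needed for agents on Cloudflare's verified bots list, such as GPTBot, ClaudeBot, PerplexityBot, Googlebot, and Bingbot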

Method 2: Custom WAF rule (specific control)

# Cloudflare → Security → WAF → Custom rules → Create
# Field: User Agent
# Operator: contains
# Value: GPTBot
# Then: Skip → all remaining custom rules + managed challenge

# Or in expression syntax ("contains" is case-sensitive, so lower() is
# needed for "Anthropic" to match the "anthropic-ai" UA):
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "PerplexityBot") or
(lower(http.user_agent) contains "anthropic") or
(http.user_agent contains "OAI-SearchBot")
# Action: Skip → managed challenge
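Maintaining the expression by hand gets error-prone as the agent list grows. As a sketch, the hypothetical helper below (`build_cf_expression` is not a Cloudflare tool) generates a fully case-insensitive variant of the expression from a list of UA substrings, using the rules language's `lower()` function. The output is meant to be pasted into the expression editor.

```shell
# Build a Cloudflare rule expression that matches any of the given UA substrings.
# Each substring is lowercased and compared against lower(http.user_agent).
build_cf_expression() {
  expr=""
  for agent in "$@"; do
    needle=$(printf '%s' "$agent" | tr '[:upper:]' '[:lower:]')
    if [ -n "$expr" ]; then
      expr="$expr or "
    fi
    expr="${expr}(lower(http.user_agent) contains \"${needle}\")"
  done
  printf '%s\n' "$expr"
}

# Usage:
#   build_cf_expression GPTBot ClaudeBot PerplexityBot anthropic-ai OAI-SearchBot
```

Lowercasing both sides avoids silently missing an agent whose documented name and actual UA string differ only in case.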

4. AWS WAF allowlist

# Web ACL → Rules → Add rule → Custom rule
# Statement: Inspect → Single header → User-Agent
# Match type: Contains string
# String: GPTBot
# Action: Allow

# Add additional rules for ClaudeBot, PerplexityBot, etc.
# Give these rules a lower priority number than the blocking rules: AWS WAF
# evaluates rules in ascending priority order, and an Allow match ends evaluation.
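The console steps above correspond to a rule in JSON form. The sketch below prints such a rule for one agent; `aws_waf_allow_rule` is a hypothetical helper, and the rule and metric names are made up for illustration. The field names follow the WAFv2 rule schema as used in the console's rule JSON editor. If you pass this JSON to the AWS CLI instead, check how your CLI version expects `SearchString` to be encoded, since it is a binary field there.

```shell
# Print a WAFv2 rule (JSON) that allows requests whose User-Agent contains $1.
# $2 is the rule priority; lower numbers are evaluated first.
aws_waf_allow_rule() {
  needle=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  cat <<EOF
{
  "Name": "allow-${needle}",
  "Priority": $2,
  "Statement": {
    "ByteMatchStatement": {
      "SearchString": "${needle}",
      "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } },
      "TextTransformations": [ { "Priority": 0, "Type": "LOWERCASE" } ],
      "PositionalConstraint": "CONTAINS"
    }
  },
  "Action": { "Allow": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "allow-${needle}"
  }
}
EOF
}

# Usage: aws_waf_allow_rule GPTBot 0
```

The `LOWERCASE` text transformation makes the match case-insensitive, so a single rule covers both `GPTBot` and `gptbot`.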

5. Nginx allowlist (no WAF)

# /etc/nginx/conf.d/ai-agents.conf
map $http_user_agent $is_ai_agent {
  default 0;
  "~*GPTBot" 1;
  "~*ChatGPT-User" 1;
  "~*ClaudeBot" 1;
  "~*Claude-Web" 1;
  "~*anthropic-ai" 1;
  "~*PerplexityBot" 1;
  "~*Perplexity-User" 1;
  "~*OAI-SearchBot" 1;
  # Google-Extended is omitted: it is a robots.txt token and never appears in the User-Agent header
}

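# Skip rate-limiting for AI agents: give them an empty rate-limit key.
# nginx does not count requests whose limit_req key is empty.
map $is_ai_agent $limit_key {
  1 "";
  default $binary_remote_addr;
}
# Zone name, size, and rate are illustrative; substitute your existing values.
limit_req_zone $limit_key zone=per_ip:10m rate=10r/s;

server {
  location / {
    limit_req zone=per_ip burst=20;
  }
}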

6. Verify legitimacy via IP

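# OpenAI publishes GPTBot IP ranges as JSON (URL as given here; confirm the
# current location in OpenAI's crawler documentation before relying on it)
curl -s https://openai.com/gptbot-ranges.json | jq .

# ClaudeBot ranges (likewise, confirm in Anthropic's documentation that this
# endpoint exists and is current)
curl -s https://anthropic.com/claudebot-ranges.json | jq .

# Refresh WAF allowlists from these endpoints every 24h via cron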
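Once the ranges are downloaded, a request's source IP can be checked against them. This is a minimal IPv4-only sketch; the function names are hypothetical, and the `jq` path in the usage comment assumes the JSON has a `prefixes` array with `ipv4Prefix` entries, which should be checked against the actual file.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IPv4 address $1 falls inside CIDR block $2 (e.g. 192.0.2.0/24).
ip_in_cidr() {
  local net="${2%/*}" bits="${2#*/}" mask
  if [ "$bits" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  fi
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Succeed if $1 matches any CIDR block read from stdin (one per line).
ip_in_any_range() {
  local cidr
  while read -r cidr; do
    [ -n "$cidr" ] || continue
    ip_in_cidr "$1" "$cidr" && return 0
  done
  return 1
}

# Usage (JSON shape is an assumption):
#   curl -s https://openai.com/gptbot-ranges.json \
#     | jq -r '.prefixes[].ipv4Prefix // empty' \
#     | ip_in_any_range 203.0.113.10 && echo "verified" || echo "not in published ranges"
```

A request that claims an AI agent UA but fails this check is a candidate for blocking rather than allowlisting.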

7. Test the allowlist

Step 1: curl with each agent UA
for ua in "GPTBot/1.0" "ClaudeBot/1.0" "PerplexityBot/1.0" "anthropic-ai/1.0"; do
  status=$(curl -s -o /dev/null -w "%{http_code}" -A "$ua" https://example.com/)
  echo "$status $ua"
done

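# With UA-based rules, all should return 200, not 403/429/503.
# With IP-verified allowlisting, a spoofed UA sent from your own machine should still be blocked.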
💡 Don't trust user-agent strings alone for allowlisting in production — they're trivially spoofed and let scrapers in. Use Cloudflare Verified Bots or pull IP ranges from official endpoints. UA-only allowlisting is fine for diagnosis but not for security policy.
