Without consistent measurement, you can't tell whether AEO (answer engine optimization) efforts work. Sporadic spot-checks (asking ChatGPT once a month) produce noise, not signal. The fix is a measurement system: same prompts, weekly cadence, per-engine tracking, change-attribution discipline. This guide covers the methodology that turns AI citations from anecdote into measurable trend.
30-50 prompts, locked after initial selection, run every week:
Selection criteria:
- Covers your major customer intents (educational, comparison, recommendation, troubleshooting)
- Phrased as real users would phrase them (see how-to-fix-ai-query-match)
- Mix of broad ("what is X") and specific ("X for [context]")
- Includes brand prompts ("is [brand] good for X")
- Includes competitor prompts ("X vs [competitor]")
Locked once selected:
- Don't swap prompts during the measurement period
- Add new prompts as separate cohort if you want to track new intents
- The original 30-50 are your measurement instrument
Document each prompt:
- exact text
- target page on your site
- expected citation pattern (cited / mentioned / neither)
- business priority (high / medium / low)
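One way to keep the documented fields and the "locked" rule together is a small immutable record per prompt. This is a minimal sketch, not a prescribed schema: the names `TrackedPrompt`, `ExpectedPattern`, `Priority`, `cohort`, and `validate_prompt_set` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class ExpectedPattern(Enum):
    CITED = "cited"
    MENTIONED = "mentioned"
    NEITHER = "neither"


@dataclass(frozen=True)  # frozen: a locked prompt cannot be edited in place
class TrackedPrompt:
    text: str                  # exact prompt text, verbatim
    target_page: str           # page on your site that should earn the citation
    expected: ExpectedPattern  # expected citation pattern
    priority: Priority         # business priority
    cohort: str = "original"   # new intents go into a separately named cohort


def validate_prompt_set(prompts, minimum=30, maximum=50):
    """Reject duplicate prompt texts and check the size of the original cohort."""
    texts = [p.text for p in prompts]
    if len(set(texts)) != len(texts):
        raise ValueError("duplicate prompt text in set")
    original = [p for p in prompts if p.cohort == "original"]
    if not minimum <= len(original) <= maximum:
        raise ValueError(
            f"original cohort has {len(original)} prompts, "
            f"expected {minimum}-{maximum}"
        )
    return True
```

Prompts added later would carry a different `cohort` value, so the original 30-50 stay untouched as the measurement instrument.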
| Engine | Why track separately |
|---|---|
| ChatGPT (GPT-4) | Largest user base, training-data citations + live browsing |
| Claude | Growing fast in B2B/SaaS, different training corpus |
| Perplexity | Citation-friendly, fastest to surface new sources |
| Gemini | Heavy Google Search dependence, different from others |
| Microsoft Copilot | Bing-powered, enterprise heavy |
| Meta AI | Different data source (Meta platforms) |
Citation patterns differ markedly per engine. Perplexity might cite you heavily while ChatGPT ignores you, or vice versa. Aggregate numbers hide these gaps. Track per-engine to find which engine's gap is the easiest opportunity.
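To illustrate how an aggregate number can hide a per-engine gap, here is a small sketch that computes citation rate per engine from raw observations. The record shape, `(engine, was_cited)`, and the function name are assumptions made for this example, and the sample data is invented.

```python
from collections import defaultdict


def citation_rate_by_engine(records):
    """records: iterable of (engine, was_cited) pairs, one per prompt run."""
    totals = defaultdict(int)
    cited = defaultdict(int)
    for engine, was_cited in records:
        totals[engine] += 1
        cited[engine] += int(was_cited)
    return {engine: cited[engine] / totals[engine] for engine in totals}


# Invented sample: four prompts run on two engines.
sample = [
    ("Perplexity", True), ("Perplexity", True),
    ("Perplexity", True), ("Perplexity", False),
    ("ChatGPT", False), ("ChatGPT", False),
    ("ChatGPT", False), ("ChatGPT", False),
]
rates = citation_rate_by_engine(sample)
aggregate = sum(was_cited for _, was_cited in sample) / len(sample)
```

In this sample the aggregate rate is 0.375, which says nothing about the fact that one engine cites you in three of four runs and the other never does.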
For each prompt × engine combination, record:
- CITED: your domain explicitly cited in the response
- MENTIONED: brand name appears without citation
- RECOMMENDED: your brand recommended over alternatives
- COMPETITOR-CITED: direct competitor cited, you absent
- ALTERNATIVE-CITED: adjacent option cited, you absent
- NO-CITATION: answer with no specific source attribution
- INVISIBLE: answer doesn't mention category players
These categories drive different responses. "Competitor-cited" is your most pressing problem (you're losing share to a specific player). "Invisible" is foundational work (category-level AEO building).
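The categories can be encoded as an enum so every prompt × engine run gets exactly one label. The sketch below is an assumption-heavy illustration: the `classify` function, its parameters, and especially the precedence order of the checks are not part of the methodology above, and RECOMMENDED is passed in as a manual flag because simple string matching cannot detect it reliably.

```python
from enum import Enum


class Outcome(Enum):
    CITED = "cited"
    MENTIONED = "mentioned"
    RECOMMENDED = "recommended"
    COMPETITOR_CITED = "competitor-cited"
    ALTERNATIVE_CITED = "alternative-cited"
    NO_CITATION = "no-citation"
    INVISIBLE = "invisible"


def classify(cited_domains, answer_text, *, own_domain, brand,
             competitor_domains, adjacent_domains, category_names,
             recommended=False):
    """Assign one Outcome to a single answer (precedence order is an assumption)."""
    cited = {d.lower() for d in cited_domains}
    text = answer_text.lower()
    if recommended:  # judged by a human reviewer, not detected here
        return Outcome.RECOMMENDED
    if own_domain.lower() in cited:
        return Outcome.CITED
    if brand.lower() in text:
        return Outcome.MENTIONED
    if cited & {d.lower() for d in competitor_domains}:
        return Outcome.COMPETITOR_CITED
    if cited & {d.lower() for d in adjacent_domains}:
        return Outcome.ALTERNATIVE_CITED
    if any(name.lower() in text for name in category_names):
        return Outcome.NO_CITATION  # category players named, nothing attributed
    return Outcome.INVISIBLE
```

Whatever precedence you choose, fix it once and keep it stable, since changing the labelling rules mid-period has the same effect as swapping prompts.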
Variation comes from model behaviour, your own changes, and external events (competitor news, algorithm tweaks). Reduce noise:
Citation lifts that appear 2-8 weeks after a change are usually caused by it. Mark every change:
- Week 1: Baseline = 5% citation share
- Week 4: Added Person schema across 50 articles [MARKED]
- Week 8: Brand mention campaign on Reddit started [MARKED]
- Week 11: Citation share 9%, lift 4pp from schema?
- Week 14: Citation share 13%, lift from mentions?
- Week 17: New definitive guide published [MARKED]
- Week 22: Citation share 18%, guide effect?

Pattern: post-change lift starting 2-8 weeks later

Confidence: stronger when multiple change-cycles show consistent lag
Single observations are weak evidence. After 3-4 cycles of marked-change → lift, attribution becomes credible enough to plan investment around.
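The lag-window reasoning can be made mechanical: for each observed lift, list the marked changes that fall 2-8 weeks before it. The following is a rough sketch under assumptions, using the timeline figures from this section; `attribute_lifts` and its parameters are hypothetical names, and lift is measured against the previous observation.

```python
def attribute_lifts(shares, changes, min_lag=2, max_lag=8):
    """shares: {week: citation share in %}; changes: {week: label of marked change}.

    For each observation after the first, report the lift in percentage
    points versus the previous observation and the marked changes whose
    lag (observation week minus change week) falls inside the window.
    """
    weeks = sorted(shares)
    results = []
    for prev, cur in zip(weeks, weeks[1:]):
        candidates = [
            label for week, label in sorted(changes.items())
            if min_lag <= cur - week <= max_lag
        ]
        results.append({
            "week": cur,
            "lift_pp": shares[cur] - shares[prev],
            "candidates": candidates,
        })
    return results


shares = {1: 5, 11: 9, 14: 13, 22: 18}
changes = {4: "Person schema", 8: "Reddit mentions", 17: "Definitive guide"}
report = attribute_lifts(shares, changes)
```

With these numbers the week 11 lift has two candidate causes (schema at a 7-week lag, Reddit mentions at a 3-week lag), which is one reason a single observation is weak evidence: overlapping windows cannot be separated without further cycles.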
Your citation share moves in a context. Competitor citation share is the reference:
Per prompt × engine, track who's cited:
- You
- Direct competitor A
- Direct competitor B
- Adjacent player C
- Wikipedia / generic source

Build a stacked-bar view over time:
- Total category citations per prompt
- Your share %
- Competitors' shares
- Drift over months: are you gaining or losing share?

This is more informative than your absolute citation count. Citation count can rise while share falls (category growing faster than you). Share is the real metric.
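A short worked example shows how count and share can move in opposite directions. This is an illustrative sketch with invented numbers; the function name and the input shape (one entity name per observed citation) are assumptions.

```python
from collections import Counter


def citation_shares(citations):
    """citations: list of entity names, one entry per citation observed.

    Returns each entity's share of all citations, in percent.
    """
    counts = Counter(citations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}


# Invented data: your count rises from 10 to 12 while the category grows faster.
month_1 = ["you"] * 10 + ["competitor A"] * 20 + ["wikipedia"] * 10
month_2 = ["you"] * 12 + ["competitor A"] * 36 + ["wikipedia"] * 12

share_1 = citation_shares(month_1)["you"]
share_2 = citation_shares(month_2)["you"]
```

Here your citation count goes up by two, yet your share drops from 25% to 20%, because total category citations grew from 40 to 60.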
Weekly: data collection (automated)
- per prompt × engine raw outcomes
- delta from prior week
Monthly: trend review (30 minutes)
- prompts moved from invisible → cited
- prompts moved from cited → invisible (investigate!)
- per-engine consistency check
- flag anomalies (sudden drops, sudden gains)
Quarterly: investment review (2 hours)
- which changes correlated with lifts
- which efforts produced no measurable lift
- what to do more of, what to drop
- whether to expand prompt set with new intents
- competitive share analysis
Annually: methodology review
- prompt set still representative?
- engines covered still relevant?
- measurement instrument still calibrated?
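The weekly delta and the monthly transition checks can be sketched as a comparison of two snapshots. This is a minimal illustration, assuming outcomes are stored as strings keyed by `(prompt, engine)`; the function names and the storage shape are hypothetical.

```python
def weekly_transitions(previous, current):
    """Compare two snapshots of {(prompt, engine): outcome}.

    Returns (key, old_outcome, new_outcome) for every pair present in
    both snapshots whose outcome changed.
    """
    shared = sorted(previous.keys() & current.keys())
    return [
        (key, previous[key], current[key])
        for key in shared
        if previous[key] != current[key]
    ]


def needs_investigation(transitions):
    """Keep only the cited -> invisible moves, which warrant a closer look."""
    return [t for t in transitions if t[1] == "CITED" and t[2] == "INVISIBLE"]


last_week = {
    ("what is X", "Perplexity"): "CITED",
    ("what is X", "ChatGPT"): "INVISIBLE",
    ("X vs rival", "Claude"): "COMPETITOR-CITED",
}
this_week = {
    ("what is X", "Perplexity"): "INVISIBLE",
    ("what is X", "ChatGPT"): "CITED",
    ("X vs rival", "Claude"): "COMPETITOR-CITED",
}
moved = weekly_transitions(last_week, this_week)
flagged = needs_investigation(moved)
```

Only pairs present in both snapshots are compared, so prompts from a newly added cohort do not show up as spurious transitions in their first week.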