GEO playbook
How to get cited by ChatGPT, Claude, and Perplexity
Engine-by-engine GEO playbook. How each AI search engine picks citations, the schema and content structure that wins, the Reddit and Wikipedia amplification play, and the tools to track your citations.
How do you get cited by ChatGPT, Claude, and Perplexity? Not the way you ranked on Google. AI engines pick sources by different signals than Google: structural clarity (Claude), real-time freshness and community validation (Perplexity), and consensus across the open web (ChatGPT). The work is a different discipline called Generative Engine Optimization, and the sites that win it look surprisingly different from the sites that win classic SEO.
Key facts
- Engine fragmentation
- Only 11% of sites cited by ChatGPT are also cited by Perplexity. The two engines pick from largely different source pools, which means GEO is not one strategy but several.
- Reddit dominance
- Reddit is the #1 cited source across every major AI engine. Perplexity cites Reddit in 46.7% of its top citations; ChatGPT cites it in about 12%.
- Structure premium
- Content with clear bullet points and definitional sentences is 30% more likely to be cited by Claude than equivalent paragraph-only content. Claude in particular favors structural clarity.
- Speed correlation
- ChatGPT-cited pages with First Contentful Paint under 0.4 seconds average 6.7 citations versus 2.1 for slower pages. Page speed matters more for AI engines than commonly assumed.
- AI Overviews mix
- 48% of Google AI Overview citations come from URLs outside Google's top 100 organic results. Classic SEO ranking is necessary but not sufficient for AI Overview inclusion.
- llms.txt reality
- Only 10.13% of domains have published an llms.txt file as of 2026. Google explicitly doesn't use it, but Microsoft and OpenAI crawlers actively fetch llms-full.txt when present.
Sources: 5W AI Platform Citation Source Index 2026, Discovered Labs engine-by-engine citation analysis (January 2026), Profound crawler activity data, OtterlyAI llms.txt experiment, Semrush 80-million-query LLM citation analysis, BrightEdge Generative Parser dataset.
What GEO is, and why it's a different game than SEO
Generative Engine Optimization (GEO) is the practice of getting your content cited by AI search engines: ChatGPT, Claude, Perplexity, Google AI Mode and AI Overviews, and Microsoft Copilot. It overlaps with SEO in fundamentals (quality, structure, authority) but diverges in execution: GEO weights structural clarity, source consensus across the open web, and entity recognition more heavily than backlinks or domain authority. About 80 percent of LLM-cited pages do not rank in Google's top 10 for the same query.
The big shift between SEO and GEO is the role of structure and consensus. Google ranks pages partly by how many other respected pages link to them; AI engines weight how clearly a page answers a specific question and how often the same answer appears across multiple respected sources. The result: pages that are useful, scannable, and referenced by other people's content can get cited by ChatGPT or Perplexity without ever ranking on page 1 of Google for the same query.
The most useful frame for an SEO practitioner in 2026 is: GEO is a second discipline built on top of SEO, not a replacement for it. The pages best-positioned for GEO citations are usually also well-structured for Google, but the reverse isn't always true. A high-domain-authority page that's a wall of paragraphs can rank #1 on Google and never get cited by Claude.
Here are the terms you'll see throughout this guide, in plain English:
- GEO (Generative Engine Optimization)
- Optimizing content to be discovered, understood, and cited by AI search engines (ChatGPT, Claude, Perplexity, Google AI Mode, Microsoft Copilot). Distinct from SEO because the signals are different: structural clarity, source consensus, and entity recognition outweigh classic ranking signals like backlinks.
- AEO (Answer Engine Optimization)
- A near-synonym for GEO, sometimes used when the focus is on direct-answer surfaces (Google AI Overviews, voice assistants, featured snippets) rather than LLM chat citations. The tactics overlap heavily.
- Citation
- When an AI engine names your URL in its answer, typically as a clickable source link. The GEO equivalent of a #1 ranking, except citations can come from pages that don't rank in classic Google search at all.
- Crawler / user agent
- The bot AI engines use to fetch web content. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google's AI training crawler). Blocking these in robots.txt removes you from citation eligibility.
- llms.txt
- A proposed markdown file at /llms.txt that lists a site's key resources for LLM consumption. Adoption is roughly 10% of domains in 2026. Microsoft and OpenAI crawlers fetch it; Google doesn't.
- Brand mention
- An unlinked reference to your brand or product elsewhere on the web. Counts as a consensus signal for LLMs even without a backlink, which makes brand-mention strategies a distinct part of GEO from traditional link-building.
- Entity
- A canonical thing (a product, a company, a concept) recognized by LLMs across sources. Strong entity recognition (clear Wikipedia entry, consistent description across the web, structured data) makes a brand much more likely to be cited.
For the broader SEO picture (where AI helps with classic SEO, where it hurts, what Google's Helpful Content classifier does), see our AI SEO for small business pillar guide. This guide focuses specifically on the LLM-citation half of the discipline.
Engine-by-engine deep dives: how each AI search engine actually picks citations
ChatGPT, Claude, Perplexity, and Google AI Overviews look similar on the surface and behave very differently underneath. Optimizing for one of them is not the same as optimizing for the others. Only 11 percent of sites cited by ChatGPT are also cited by Perplexity. The engine-by-engine differences below are the basis of any real GEO strategy.
ChatGPT (OpenAI)
Citation behavior: consensus-driven. Cites when web browsing is active; doesn't cite on training-data answers. Top sources: Wikipedia (~7.8%), Reddit (~12%), and competitor sites that appear consistently across the web. Speed correlation: pages with First Contentful Paint under 0.4 seconds average 6.7 citations versus 2.1 for slower pages. Crawler: GPTBot. Tactical priority: be a consistent presence across multiple respected sources on your topic, not a single perfect page.
Claude (Anthropic)
Citation behavior: doesn't cite by default; will when given source material or when its web search tool fires. Favors structural clarity. Bullet-pointed structured content is 30% more likely to be cited than equivalent paragraph form. Top formats: technical documentation, PDFs, whitepapers, clear definitional content. Crawler: ClaudeBot. Tactical priority: write in clearly structured sections with named concepts, FAQ blocks, and bulleted answer lists.
Perplexity
Citation behavior: real-time retrieval over a proprietary 200+ billion URL index; cites by default. Reddit dominates Perplexity citations at 46.7% of top citations, followed by official documentation. Content can appear in citations within hours or days. Crawler: PerplexityBot. Tactical priority: answer-first paragraphs, structured headers, frequent re-publishing, and active engagement in Reddit threads in your space.
Google AI Overviews / AI Mode
Citation behavior: hybrid. 54% of citations overlap with Google's top 20 organic results; 48% come from outside the top 100 organic. Multi-modal content (text plus images plus video plus structured data) shows the strongest correlation with inclusion (r=0.92 in 2026 research). Trigger rate ~13% of US desktop searches, much higher for informational queries. Crawler: Google-Extended (AI training) plus standard Googlebot. Tactical priority: rank pages on Google first, then layer custom images, video, and schema on top.
| Engine | Cites by default | Top source bias | Tactical priority |
|---|---|---|---|
| ChatGPT | Only with web browsing | Wikipedia, Reddit, consensus sites | Be present across multiple respected sources |
| Claude | No, unless asked or web tool fires | Structured docs, PDFs, bullets | Write in clear definitional structure |
| Perplexity | Yes, always | Reddit (~47%), official docs | Answer-first paragraphs, Reddit presence |
| Google AI Overviews | Yes, in the overview | Hybrid: top-20 organic + outside top-100 | Multi-modal content layered on ranking pages |
11% of sites cited by ChatGPT are also cited by Perplexity. Engine fragmentation is real.
6.7 average ChatGPT citations for pages with First Contentful Paint under 0.4 seconds, vs 2.1 for slower.
The five traits LLM-cited content shares across every engine
While each AI engine picks sources differently, the content that gets cited by multiple engines shares five common traits. If you only have time for the cross-engine work, do these five things first; the engine-specific optimization layers on top.
Clear answer in the first 50 words of each section
LLMs lift the opening sentence or two of each section as their cited extract. Putting the direct answer in the first 50 words massively raises citation odds across all four engines.
Complete schema markup
Article, FAQPage, HowTo, Organization, BreadcrumbList, DefinedTermSet, Speakable. Schema is rules-based extraction food for LLMs. More schema = more structured signals = more citations.
Named author with verifiable credentials
Real author byline tied to a Person schema entity with sameAs links to LinkedIn, X, or a public profile. LLMs cite content tied to real people more readily than anonymous content.
Recent publish or modified date
Visible on the page and in Article schema. Perplexity and ChatGPT live search both weight freshness. Stale dates suppress citation odds even when content is otherwise current.
Two to five outbound links to authoritative third-party sources
Linking out to primary sources (government, academic, established publishers) signals that the page is part of a verifiable source network, which LLM citation algorithms reward.
The cross-engine traits are not subtle, but they're also not what most SEO advice has emphasized historically. Notice what's NOT on the list: domain authority, backlink count, keyword density, content length. None of those correlate strongly with AI citation. What does correlate: the page being structurally machine-readable, written by an identifiable human, and referenced from other respected sources.
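The "answer in the first 50 words" trait is easy to spot-check before publishing. The sketch below is only a rough heuristic: it assumes you can name one or two key terms the direct answer should contain, and the function names are hypothetical, not part of any tool mentioned in this guide.

```python
def opening_words(section_text: str, limit: int = 50) -> str:
    """Return the first `limit` whitespace-separated words of a section."""
    return " ".join(section_text.split()[:limit])


def answers_early(section_text: str, key_terms: list[str], limit: int = 50) -> bool:
    """True if every key term appears (case-insensitively) in the opening words.

    This is a crude proxy for "the direct answer is up front"; it cannot
    judge whether the opening actually answers the question well.
    """
    opening = opening_words(section_text, limit).lower()
    return all(term.lower() in opening for term in key_terms)


if __name__ == "__main__":
    section = (
        "llms.txt is a proposed markdown file at /llms.txt that lists "
        "a site's key resources for LLM consumption."
    )
    print(answers_early(section, ["llms.txt", "markdown file"]))
```

A check like this fits naturally into a pre-publish review: run it per section and rewrite any opener where the key terms only show up later.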
The six schema markups that move the needle for AI citation
Schema markup is one of the most underused GEO levers because most SEO teams treat it as a Google rich-results play. For AI engines, schema is extraction food: structured data is exactly what LLMs need to parse and lift a page. The six schema types below give you the biggest GEO return per hour of implementation.
Article (with Person author)
The base layer. Article schema tells AI engines the page is editorial content with a real author. Adding a Person node for the author (with sameAs links to verifiable profiles) strengthens entity recognition.
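As a minimal sketch of that base layer, the Python below builds Article JSON-LD with a nested Person author and wraps it in a script tag for the page head. The property names are standard schema.org vocabulary; the helper names, the author, and the example.com URLs are placeholders for illustration.

```python
import json


def article_jsonld(headline, url, author_name, author_profiles, published, modified):
    """Build Article JSON-LD with a Person author and sameAs profile links."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published,
        "dateModified": modified,  # keep in sync with the date shown on the page
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": author_profiles,
        },
    }


def as_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding; escape "</" so it cannot close the tag."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    doc = article_jsonld(
        "How to get cited by ChatGPT, Claude, and Perplexity",
        "https://example.com/geo-playbook",          # placeholder URL
        "Jane Doe",                                   # placeholder author
        ["https://www.linkedin.com/in/janedoe"],      # placeholder profile
        "2026-01-10",
        "2026-02-01",
    )
    print(as_script_tag(doc))
```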
FAQPage
Q&A pairs in structured form. The single most-cited schema across all four AI engines because FAQ answers map directly to user questions. Use it for every guide that includes a real FAQ section.
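A FAQPage block can be generated straight from the question-and-answer pairs already on the page. This is a sketch with a hypothetical helper name; the Question, acceptedAnswer, and Answer structure is the standard schema.org shape.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


if __name__ == "__main__":
    faq = faq_jsonld([
        ("Should I publish an llms.txt file?",
         "Probably yes, but understand what it does and doesn't buy you."),
    ])
    print(json.dumps(faq, indent=2))
```

The answer text should match the visible answer on the page rather than a separate marketing version.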
HowTo
Numbered step-by-step instructions in structured form. Cited heavily by ChatGPT and Claude for tactical queries. Use for any page describing a multi-step process.
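For a multi-step page, a HowTo block might be assembled like this. The helper name is hypothetical and the step text is borrowed from this guide purely as sample data; HowToStep with position, name, and text is standard schema.org vocabulary.

```python
import json


def howto_jsonld(name, steps):
    """Build HowTo JSON-LD from an ordered list of (title, text) steps."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": title, "text": text}
            for i, (title, text) in enumerate(steps, start=1)
        ],
    }


if __name__ == "__main__":
    howto = howto_jsonld(
        "Run a 30-day GEO sprint",
        [
            ("Baseline measurement", "Pick 20 queries and run each through every engine."),
            ("Schema and structure upgrade", "Audit your top 10 pages by traffic."),
        ],
    )
    print(json.dumps(howto, indent=2))
```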
DefinedTermSet
Glossary entries in structured form. Excellent for getting cited on definitional queries ("what is X"). Each term gets a separate citation opportunity.
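A glossary maps cleanly onto DefinedTermSet. In this sketch the helper name and the example.com glossary URL are assumptions; hasDefinedTerm and inDefinedTermSet are the schema.org properties that tie terms to their set.

```python
import json


def glossary_jsonld(set_name, set_url, terms):
    """Build DefinedTermSet JSON-LD from a {term: definition} mapping."""
    return {
        "@context": "https://schema.org",
        "@type": "DefinedTermSet",
        "name": set_name,
        "url": set_url,
        "hasDefinedTerm": [
            {
                "@type": "DefinedTerm",
                "name": term,
                "description": definition,
                "inDefinedTermSet": set_url,
            }
            for term, definition in terms.items()
        ],
    }


if __name__ == "__main__":
    glossary = glossary_jsonld(
        "GEO glossary",
        "https://example.com/geo-playbook#glossary",  # placeholder URL
        {"Citation": "When an AI engine names your URL in its answer."},
    )
    print(json.dumps(glossary, indent=2))
```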
Speakable
CSS-selector markup telling voice/answer engines which sentences to lift. Underused and high-leverage. Target your direct-answer sentences (TL;DR, section openers, FAQ answers).
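Speakable markup is a short addition to a WebPage (or Article) node. The sketch below assumes your template already wraps direct-answer sentences in classes such as .tldr or .faq-answer; those class names and the helper name are hypothetical and should be swapped for whatever your own markup uses.

```python
import json


def speakable_jsonld(page_name, url, css_selectors):
    """Build WebPage JSON-LD whose speakable property points at CSS selectors."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": page_name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": css_selectors,
        },
    }


if __name__ == "__main__":
    page = speakable_jsonld(
        "GEO playbook",
        "https://example.com/geo-playbook",            # placeholder URL
        [".tldr", ".section-opener", ".faq-answer"],   # hypothetical class names
    )
    print(json.dumps(page, indent=2))
```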
Organization (with sameAs)
Your entity profile, linked to Wikipedia, Crunchbase, LinkedIn, or other authoritative entity sources via sameAs. Strengthens brand entity recognition across all AI engines.
A few implementation notes:
- Use multiple schema types per page. A good guide page can carry Article + FAQPage + HowTo + Speakable + Breadcrumb simultaneously. Each one is a different extraction surface for different AI engines.
- Link your Person and Organization entities. Author schema references the Organization via worksFor; Organization links to Wikipedia, Crunchbase, or other authoritative entity sources via sameAs. The entity graph you build internally informs how AI engines model your brand.
- Speakable is underused. Most sites don't implement Speakable. It tells voice and answer engines exactly which sentences to lift, which boosts citation odds for direct-answer content (TL;DR, section openers, FAQ answers). Trivial to add, meaningful upside.
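The Person-to-Organization link described in these notes can be expressed as a small JSON-LD graph in which the author's worksFor points at the Organization node by @id. This is a sketch only: the helper name, the "#organization" identifier convention, and every name and URL in the usage example are placeholders.

```python
import json


def entity_graph(org_name, org_url, org_profiles, author_name, author_profiles):
    """Build a JSON-LD @graph linking a Person to an Organization via worksFor."""
    org_id = org_url.rstrip("/") + "/#organization"  # assumed @id convention
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": org_name,
                "url": org_url,
                "sameAs": org_profiles,
            },
            {
                "@type": "Person",
                "name": author_name,
                "sameAs": author_profiles,
                "worksFor": {"@id": org_id},  # reference, not a duplicate node
            },
        ],
    }


if __name__ == "__main__":
    graph = entity_graph(
        "Example Co",
        "https://example.com/",
        ["https://www.linkedin.com/company/example-co"],
        "Jane Doe",
        ["https://www.linkedin.com/in/janedoe"],
    )
    print(json.dumps(graph, indent=2))
```

Referencing the Organization by @id keeps one canonical entity node per page instead of repeating the full organization description inside every author block.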
The Reddit, Wikipedia, and YouTube amplification play
The most surprising finding in GEO research is how heavily AI engines lean on user-generated and community-validated sources. Reddit is the #1 cited source across every major AI engine. Wikipedia is in the top three for most. YouTube is rising fast. The implication: GEO isn't just about optimizing your own pages; it's about being present in the source ecosystem AI engines already trust.
The Reddit play
Reddit is the single most-cited source across ChatGPT, Claude, and Perplexity. The reason is structural: Reddit threads are answer-rich, frequently updated, community-ranked (so the best answers float to the top), and unlikely to be marketing fluff. AI engines treat upvoted Reddit comments as a strong consensus signal.
The Reddit play for GEO is not what most SEO teams do (drop links and run). It's the opposite. Identify 3 to 5 active subreddits in your domain. Contribute thoughtful answers to real questions over weeks and months. Establish a reputation as a knowledgeable voice without spamming your own site. The citation upside compounds because your username starts appearing alongside trusted answers, and your domain becomes referenced in those threads organically over time.
The Wikipedia play
Wikipedia accounts for roughly 7.8 percent of ChatGPT citations and similar shares across the other engines. A Wikipedia presence does two things for GEO: it makes your brand a recognized entity (which LLMs cite more readily), and it provides an authoritative anchor that other sources can reference. If your business is notable enough to qualify for a Wikipedia article (real news coverage, real third-party writeups), create one through Wikipedia's normal process. Don't self-edit; Wikipedia's editors will revert promotional edits and damage your entity signals.
The YouTube play
YouTube transcripts are increasingly used as source material by AI engines, particularly for how-to and explanatory queries. A YouTube channel where you actually talk about your domain provides a structured, attributed, transcribed source that LLMs can cite. The barrier to entry is lower than most B2B teams realize: even rough single-take explainer videos work if the transcript is clear.
The principle behind all three
AI engines scan for consensus before they confidently cite a brand. The more places your brand appears consistently across the web (Reddit threads, Wikipedia, YouTube, industry directories, your own site, partner blogs), the more confidently engines will surface you as an answer. The single most-overlooked GEO investment is the hours spent being present in the source ecosystem outside your own site.
llms.txt: what it is, who reads it, and whether to bother
llms.txt is a proposed markdown file at /llms.txt that lists a site's key resources for LLM consumption. Adoption is at about 10 percent of domains in 2026. Google has explicitly said it doesn't use llms.txt. Microsoft and OpenAI crawlers actively fetch llms-full.txt when present. The honest framing is 'low-cost high-asymmetric-upside experiment,' not 'guaranteed ranking lift.'
The llms.txt proposal borrows its placement from robots.txt: a small file at a predictable URL that tells AI crawlers what to read and in what order. Unlike robots.txt, it is written in markdown rather than a directive syntax. The companion llms-full.txt provides full text content in machine-readable markdown for ingestion.
The current state of play in 2026:
- Adoption. About 10.13 percent of domains have published an llms.txt file. That is up from near-zero a year ago but still niche.
- Google. Has publicly stated it does not use llms.txt for AI features. Google's AI surfaces rely on standard crawling.
- OpenAI and Microsoft. Profound's crawler activity data shows OpenAI and Microsoft bots actively fetching both llms.txt and llms-full.txt files. Models from both companies appear to use the markdown content for live retrieval.
- Other engines. Adoption signals are mixed; the safest assumption is that more engines will start reading llms.txt over time as the standard becomes more established.
The pragmatic recommendation: publish both llms.txt and llms-full.txt. Cost to implement is trivial. Downside is essentially none. Upside is better fidelity in AI engine ingestion when the engines do read it. Treat it as a low-cost infrastructure bet, not a guaranteed visibility lift.
Our own /llms-txt and /llms-full-txt routes auto-generate from the site's guide registry, so adding a guide auto-publishes it for AI consumption. The mechanism is not magic; it's 30 lines of code that maps URLs to descriptions.
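To give a sense of scale, a generator along those lines could look like the sketch below. It is not the implementation behind the routes described above: the registry shape (a list of dicts with title, url, and description) and the function name are assumptions, and the output follows the layout in the llms.txt proposal (an H1 title, a blockquote summary, then a section of markdown links).

```python
def build_llms_txt(site_name: str, summary: str, guides: list[dict]) -> str:
    """Render an llms.txt body from a registry of guides.

    Each guide dict is assumed to carry "title", "url", and "description".
    """
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Guides", ""]
    for guide in guides:
        lines.append(f"- [{guide['title']}]({guide['url']}): {guide['description']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    registry = [  # hypothetical registry entries
        {
            "title": "GEO playbook",
            "url": "https://example.com/geo-playbook",
            "description": "How each AI search engine picks citations.",
        },
    ]
    print(build_llms_txt("Example Co", "Guides on AI search visibility.", registry))
```

Serving the returned string as plain text at /llms.txt means a new registry entry shows up in the file on the next request, with no separate publishing step.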
Crawler access: making sure AI engines can actually read your site
The first and easiest GEO mistake to fix: blocking AI crawlers in robots.txt. Many sites added blanket Disallow rules for GPTBot, ClaudeBot, PerplexityBot, and Google-Extended during the 2023-2024 anti-AI moment. Two years later, those same sites are invisible in AI citation results. Check your robots.txt before doing anything else.
The major AI crawlers and the user agents to whitelist:
- GPTBot (OpenAI): crawls the web for ChatGPT training and live retrieval.
- ClaudeBot (Anthropic): crawls for Claude's web search tool and training.
- PerplexityBot (Perplexity): crawls for Perplexity's real-time retrieval index.
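- Google-Extended (Google): Google's opt-out signal for AI training. It is a robots.txt control token rather than a separate crawler: Google's existing crawlers do the fetching, so it will not show up as its own user agent in server logs. Allowing it permits Google to use your content for AI features.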
- cohere-ai, anthropic-ai, ccbot: other AI crawlers worth allowing depending on which engines you want to be visible in.
The minimum-viable robots.txt for GEO is: don't block any of these user agents. If you have a legitimate reason to block one (privacy, copyright, competitive), it's a real trade-off: you opt out of citation in that engine. Most small businesses should allow all major AI crawlers as the default.
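One way to audit this is to run your robots.txt through Python's standard-library parser and list which AI user agents it blocks. This is a minimal sketch: the function name is hypothetical, the agent list is taken from this guide, and the check is against a single URL (the homepage by default), so path-specific rules need additional calls.

```python
from urllib.robotparser import RobotFileParser

AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_agents(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_USER_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_ai_agents(sample))
```

An empty result means none of the listed agents is blocked for that URL. Any name in the result is an engine you have opted out of, deliberately or not.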
How to verify they're actually crawling you
Check your server logs for the user agents above. If you're using Cloudflare, the AI bot management dashboard shows which AI crawlers have requested which URLs. If you're not seeing any AI crawler hits, your site may be too small to be in the crawl queue yet, your robots.txt may be blocking them, or your hosting may be throwing 503 errors on bot requests. All three are fixable.
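If you have raw access logs, a short script can count AI crawler hits and flag server errors served to them. The sketch assumes the common "combined" access-log format, where the status code follows the quoted request line; the function name is hypothetical and the token list is illustrative rather than exhaustive.

```python
import re
from collections import Counter

# Substrings that identify AI crawlers in the User-Agent field (illustrative list).
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

# In combined log format the status code follows the closing quote of the request.
STATUS_RE = re.compile(r'" (\d{3}) ')


def summarize_ai_bot_hits(log_lines):
    """Return (hits, server_errors) Counters keyed by bot token."""
    hits, server_errors = Counter(), Counter()
    for line in log_lines:
        lowered = line.lower()
        token = next((t for t in AI_BOT_TOKENS if t.lower() in lowered), None)
        if token is None:
            continue  # not an AI crawler request
        hits[token] += 1
        match = STATUS_RE.search(line)
        if match and match.group(1).startswith("5"):
            server_errors[token] += 1  # e.g. 503s thrown at bot traffic
    return hits, server_errors


if __name__ == "__main__":
    with open("access.log") as handle:  # path is an assumption
        hits, errors = summarize_ai_bot_hits(handle)
    print(hits, errors)
```

Zero hits for a bot over several weeks points to one of the three causes above; a non-empty error counter points specifically at the hosting problem.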
How to actually track whether your site gets cited
GEO without tracking is guessing. Three options at three price points: manual checks (free, 30 minutes a month), OtterlyAI ($29 per month, automated tracking across all major engines), and Profound (enterprise-tier with prompt-volume data). Most small businesses should start manual and graduate to OtterlyAI when AI search becomes a meaningful traffic driver.
Manual tracking (free, 30 minutes a month)
Pick 20 queries that matter to your business. Once a month, run each through ChatGPT (with web search on), Claude (with web tool on), Perplexity (default), and Google AI Mode (default). For each query, note: which pages get cited, which competitors get cited, which Reddit threads or Wikipedia entries get cited. Track the list in a simple spreadsheet over time.
Manual tracking is sufficient for businesses early in GEO. The patterns you spot (which engines cite you, which competitors keep showing up, which content formats win) are the same patterns automated tools surface, just with fewer queries checked.
OtterlyAI ($29/month and up)
Automated tracking of brand mentions and URL citations across ChatGPT, Perplexity, Google AI Mode, Google AI Overviews, Gemini, and Copilot. Includes audit reports and data exports. The lowest-friction starting point when manual tracking is no longer enough. Used by small and midsize teams to monitor their AI search visibility weekly without spending a person's time on it.
Profound (enterprise-tier)
Used by Fortune 500 firms for AI search intelligence with SOC 2 compliance. Its distinctive feature is Prompt Volumes, which show actual AI search demand by topic with demographic breakdowns. Think of it as keyword research for AI search: instead of guessing which queries matter in ChatGPT, you see the real volumes. Overpriced for small businesses; right tool for teams treating GEO as a strategic channel.
What to track over time
- Citation count per engine. How many of your tracked queries cite your pages each month.
- Citation share by competitor. Which competitors keep showing up, and on which queries.
- Source pattern. When you're NOT cited, what sources get cited instead. Reddit threads you should join? Wikipedia entries to update? Industry sites worth contributing to?
- Brand mention frequency. Are you being mentioned in answers without being cited (unlinked mentions)? Those count as consensus signals even without the link.
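If the tracking spreadsheet is exported as rows, the first and third metrics above reduce to a few lines of counting. This sketch assumes each row records the engine, the query, and the list of cited domains; those field names, the function names, and the sample domains are hypothetical.

```python
from collections import Counter


def citations_per_engine(rows, own_domain):
    """Count, per engine, the tracked queries whose answer cited `own_domain`."""
    counts = Counter()
    for row in rows:
        if own_domain in row["cited_domains"]:
            counts[row["engine"]] += 1
    return counts


def sources_when_absent(rows, own_domain):
    """Tally which domains were cited on queries where `own_domain` was not."""
    sources = Counter()
    for row in rows:
        if own_domain not in row["cited_domains"]:
            sources.update(row["cited_domains"])
    return sources


if __name__ == "__main__":
    rows = [  # sample data, not real measurements
        {"engine": "Perplexity", "query": "what is geo",
         "cited_domains": ["reddit.com", "example.com"]},
        {"engine": "ChatGPT", "query": "what is geo",
         "cited_domains": ["wikipedia.org", "competitor.example"]},
    ]
    print(citations_per_engine(rows, "example.com"))
    print(sources_when_absent(rows, "example.com").most_common(5))
```

The second function answers the "source pattern" question directly: the most common domains in its output are the places worth contributing to next.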
The 30-day GEO playbook for a small business
A focused 30-day sprint can establish a meaningful GEO baseline and start producing visible citations. The playbook below assumes one part-time GEO owner and a small content team. Compounding starts in cycle two or three.
Days 1 to 5: Baseline measurement
Pick 20 queries that matter to your business. Run each through ChatGPT (with web search), Claude, Perplexity, and Google AI Mode. Note which engines cite your pages, which cite competitors, and which cite Reddit / Wikipedia / industry sites. That's your starting map.
Days 6 to 12: Schema and structure upgrade
Audit your top 10 pages by traffic. Add Article + Person author + FAQPage + HowTo schema where applicable. Restructure section openers to lead with direct answers in the first 50 words. Add 2-5 outbound links to primary sources per page. Add an llms.txt file at /llms.txt.
Days 13 to 20: Source ecosystem work
Identify 3-5 active subreddits in your space. Contribute thoughtful answers (not links to your site) that establish you as a knowledgeable voice. Update your Wikipedia presence if relevant. Get listed on 2-3 industry directories that LLMs cite.
Days 21 to 27: Publish new content for the gaps
Write 2-3 new pages targeting queries where you noticed competitors getting cited but you weren't. Apply the cross-engine traits: 50-word direct answer per section, complete schema, real byline, recent date, outbound links. Submit to Search Console for fast indexing.
Days 28 to 30: Re-measure and decide
Re-run the same 20 queries through all four engines. Note new citations, ranking shifts, and any engines you're still invisible on. Decide what worked, what didn't, and which of the next 30-day cycle's queries to prioritize. The compounding starts here.
What this 30-day cycle produces: a real baseline of which queries cite which pages, schema and structural improvements on your top 10 pages, a meaningful presence in 3-5 active source-ecosystem threads, 2-3 new pages targeting competitive citation gaps, and the data to decide what cycle 2 should prioritize. Expect first visible citation lifts in cycle 2 (days 30-60) and compounding through cycle 4 (days 90-120).
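The re-measure step comes down to comparing two snapshots of the same 20 queries. As a sketch, assuming each snapshot is stored as a set of (engine, query) pairs where your site was cited (a representation chosen here for illustration, with a hypothetical function name):

```python
def citation_changes(baseline: set, remeasure: set) -> dict:
    """Compare two snapshots of (engine, query) pairs where the site was cited."""
    return {
        "gained": sorted(remeasure - baseline),  # new citations since day 1
        "lost": sorted(baseline - remeasure),    # citations that disappeared
        "kept": sorted(baseline & remeasure),    # stable citations
    }


if __name__ == "__main__":
    day_1 = {("Perplexity", "what is geo")}
    day_30 = {("Perplexity", "what is geo"), ("ChatGPT", "what is geo")}
    print(citation_changes(day_1, day_30))
```

The "gained" list shows which changes from the sprint paid off, and the "lost" list is the first place to look when planning the next cycle.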
What to do with this
Three paths depending on where you are. Audit your robots.txt and schema today, then start the 30-day playbook above. Read the broader SEO pillar to see how GEO fits with classic SEO. Or get an outside read on your specific GEO opportunities.
If you want the broader picture of how GEO fits inside SEO (Google AI Overviews, E-E-A-T, Helpful Content classifier, the whole stack), our AI SEO for small business pillar guide covers the full landscape.
If you're considering using AI to actually write the content the GEO playbook requires, our Will AI content hurt your SEO? guide covers what works, what gets penalized, and the recovery playbook for sites already hit.
If you'd rather have someone else audit your specific GEO posture and tell you which engines you're invisible on and how to fix it, our free 48-hour assessment sends a written read on your current AI citation footprint, the schema and structural gaps holding you back, and what performance terms we can offer to build and run the engine for you. No sales call.
Frequently asked questions
What's the difference between GEO and SEO?
SEO optimizes content to rank in Google's traditional search results. GEO optimizes content to be cited by AI engines (ChatGPT, Claude, Perplexity, Google AI Mode) in their generated answers. The work overlaps in fundamentals (quality, structure, authority) but diverges in execution: GEO weights structural clarity, source consensus across the open web, and entity recognition more heavily than backlinks or domain authority. About 80 percent of pages cited by major LLMs do not rank in Google's top 10 for the same query, which is the clearest evidence that the two surfaces evaluate differently.
Which AI engine should I optimize for first?
Optimize for the one your customers actually use. For most B2B small businesses, that's ChatGPT (largest user base) and Perplexity (highest-intent research traffic). For consumer-facing businesses, Google AI Overviews matter most because they appear directly in the Google SERP your customers already use. For developer or technical audiences, Claude usage is concentrated and worth optimizing for. The good news: the underlying work (structural clarity, schema markup, real authorship, original signals) compounds across all four. The engine-specific tactics layer on top.
Does my page need to rank in Google to be cited by ChatGPT?
No. The most counterintuitive finding in the GEO research is that about 80 percent of pages cited by ChatGPT, Claude, and Perplexity do not rank in Google's top 10 for the same query. The signals AI engines weight (structural clarity, source consensus, entity recognition, freshness) overlap with Google's signals but aren't identical to them. The practical implication: you can get cited by ChatGPT for queries you don't rank for, and you can rank #1 on Google for queries that AI engines never cite you on. Treat them as related but separate channels.
How does Perplexity decide which sources to cite?
Perplexity uses real-time retrieval over a proprietary index of more than 200 billion URLs. It prioritizes recent, community-validated, structured content with clear answer-first paragraphs. Its single biggest source bias is Reddit, which appears in 46.7 percent of Perplexity's top citations, largely because Reddit threads tend to be answer-rich and frequently updated. Official documentation and structured how-to content come second. To get cited by Perplexity: write answer-first paragraphs, keep content fresh (re-publish dates matter), use structured headers, and earn references in active Reddit threads in your space.
How does ChatGPT decide which sources to cite?
ChatGPT operates differently depending on whether you're asking a question that triggers live web search (then it cites sources) or a question answered purely from training data (then it doesn't cite at all). When citing, it favors consensus sources: Wikipedia (about 7.8 percent of citations), Reddit (about 12 percent), and competitor sites that appear consistently across the web on a topic. It also values page speed: pages with First Contentful Paint under 0.4 seconds average 6.7 citations versus 2.1 for slower pages. The implication is that ChatGPT visibility comes from being a consistent presence across multiple sources, not from a single perfectly-optimized page.
How does Claude decide which sources to cite?
Claude doesn't cite sources by default. It will when you ask it to and provide source material, or when it uses its web search tool. When citing, Claude favors content with structural clarity: bullet-pointed lists, clear definitions, technical documentation, PDFs, and whitepapers. Bullet-pointed structured content is about 30 percent more likely to be cited by Claude than the same information in paragraph form. The implication: if you want Claude citations, write in clearly structured sections with named concepts, FAQ blocks, and bulleted answer lists.
How does Google AI Overviews pick its sources?
Google AI Overviews use a hybrid signal: 54 percent of citations overlap with Google's top 20 organic results, but 48 percent come from URLs outside the top 100 organic results. Multi-modal content (text plus images, video, structured data) shows the strongest correlation with AI Overview inclusion (r=0.92 in 2026 research). The trigger rate is roughly 13 percent of US desktop searches, much higher for informational queries. The practical implication: classic Google ranking helps but isn't sufficient. To increase AI Overview inclusion, layer multi-modal content (custom images, video, schema) on top of pages already ranking on page 1 or 2.
Should I publish an llms.txt file?
Probably yes, but understand what it does and doesn't buy you. About 10 percent of domains have an llms.txt file in 2026. Google has publicly stated it doesn't use the file for AI features. OpenAI and Microsoft crawlers actively fetch llms-full.txt when present. The file is essentially a curated index pointing AI crawlers at your most important content in machine-readable form. Cost to implement: trivial. Upside: better fidelity in OpenAI and Microsoft training data and live retrieval. Downside: none, since adding it doesn't harm anything. The honest framing is 'low-cost high-asymmetric-upside experiment,' not 'guaranteed ranking lift.'
How do I track whether my site gets cited by AI engines?
Three options at different price points. Manual tracking: open ChatGPT, Claude, Perplexity, and Google AI Mode in incognito; run your top 20 queries; note which pages get cited. It takes 30 minutes a month, is free, and is sufficient for small businesses. Mid-tier tracking: OtterlyAI (starting around $29 per month) automatically tracks brand mentions and citations across all major AI engines, with audits and exports. Enterprise tracking: Profound (used by Fortune 500 firms, with SOC 2 compliance) offers prompt volume data showing actual AI search demand by topic with demographic breakdowns. Most small businesses should start manual and graduate to OtterlyAI if AI search becomes a significant traffic driver.
How long does it take for a new page to get cited by AI engines?
Perplexity is the fastest: new content can appear in citations within hours or days because the engine uses real-time retrieval. ChatGPT live search picks up new pages on a similar timeline. ChatGPT's training-data answers don't include new content until the next model training cycle (months to a year). Claude is similar: live web tool finds new content quickly; training data lags. Google AI Overviews follow Google's normal indexing timeline plus AI evaluation, which means 2 to 8 weeks for most new pages. The fastest path to early LLM citations is publishing on Perplexity-friendly surfaces (your site plus Reddit) at the same time.
Sources
- 5W Releases AI Platform Citation Source Index 2026: The 50 Websites That Now Decide What Brands Are Visible Inside ChatGPT, Claude, Perplexity, Gemini, and Google AI Overviews. PR Newswire / 5W, 2026.
- ChatGPT, Claude, Perplexity, and Google AI Overviews: How Each Platform Cites Sources Differently. Discovered Labs, January 29, 2026.
- AI Citation Patterns: How ChatGPT, Claude, and Perplexity Choose Sources. Discovered Labs, 2026.
- llms.txt and AI Visibility: Results from OtterlyAI's GEO Study. OtterlyAI, 2026.
- LLMs.txt: The Complete Guide for SEO and AI Search (2026). Derivatex, 2026.
- AI Citation Tracking: How to Measure Citation Frequency Across ChatGPT, Perplexity, and Claude. Averi.ai, 2026.
- 10 Best AI Search Monitoring Tools in 2026. OtterlyAI, 2026.
- How AI Engines Decide What to Cite: Claude, ChatGPT, and Perplexity Explained. AIVO, 2026.
- Generative Parser dataset: where LLM citations come from. BrightEdge, 2025-2026.
- Top 12 LLM Tracking Tools for AI Visibility. Writesonic, 2026.