How ChatGPT, Perplexity, Gemini, Claude & Grok decide what to cite (2026 data)

Each AI engine searches the live web through a different backend and cites a different mix of sources: only about 11% of cited domains overlap between engines. Here is what the 2026 evidence says actually drives citations, per engine, and which popular generative engine optimization (GEO) tactics the data refutes.

Published June 17, 2026 · Last updated June 17, 2026 · Data current as of June 2026

AI search changes fast — what these engines cite can shift in hours, not months. This page reflects research current as of June 2026 and is reviewed quarterly. Every statistic below is dated to its source so you can judge how current it is.

Key takeaways

Each engine uses a different search backend: ChatGPT→Bing, Gemini→Google, Claude→Brave, Perplexity & Grok→their own.
The strongest measured correlates are off-site: YouTube mentions (r≈0.737, #1) and branded web mentions (r≈0.66–0.71, #2).
Gemini is the exception — it cites brand-owned domains far more than any other engine.
Schema markup does not drive citations (it's hygiene); llms.txt has ~zero measured effect.
Adding statistics, quotations and cited sources to content lifted citation likelihood 30–40% in GPT-3.5-era tests (directional).

Every engine uses a different search backend

The single biggest mistake in GEO is treating "AI search" as one thing. ChatGPT runs on Bing, Gemini on Google, Claude on Brave; Perplexity and Grok run their own indexes. That is why a source cited by one engine is usually invisible to the others — and why per-engine work, not a single trick, is the job.

Engine	Search backend	What to prioritize
ChatGPT	Bing index	Bing ranking, listicles, directories (G2/Capterra), Wikipedia, YouTube
Gemini	Google Search	Brand-owned content depth, entity clarity, Google ranking, YouTube
Perplexity	Own index (Bing-supplemented)	Reddit, content freshness, niche specialist sources
Claude	Brave Search	Research/data-dense content, statistics, awards
Grok	Own crawler + X	X/Twitter presence, engagement velocity, social proof

Claude and Grok have a far thinner public evidence base than ChatGPT, Gemini and Perplexity — treat their guidance as lower-confidence.
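
To check the roughly 11% cross-engine overlap against your own citation logs, one option is a pairwise overlap score per engine pair. The sketch below is illustrative only: the source does not say how its overlap figure was computed, so the Jaccard measure here is an assumption, and the domain sets are hypothetical placeholders rather than measured data.

```python
from itertools import combinations


def domain_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared domains divided by all distinct domains."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(cited_by_engine: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Overlap score for every pair of engines, keyed by (engine_a, engine_b)."""
    return {
        (x, y): domain_overlap(cited_by_engine[x], cited_by_engine[y])
        for x, y in combinations(sorted(cited_by_engine), 2)
    }


# Hypothetical cited-domain sets per engine (placeholders, not measured data).
cited = {
    "chatgpt": {"wikipedia.org", "g2.com", "youtube.com", "blog.example"},
    "perplexity": {"reddit.com", "youtube.com", "forum.example"},
    "gemini": {"brand.example", "youtube.com", "wikipedia.org"},
}

overlaps = pairwise_overlap(cited)
for pair, score in overlaps.items():
    print(pair, f"{score:.0%}")
```

A low score for a pair suggests that winning citations on one engine says little about the other, which is the practical argument for per-engine work.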

What actually drives citations

The strongest measured signals sit off your own site. A brand's own website typically accounts for only ~5–15% of its AI citations; the rest comes from earned media and claimable third-party listings. In the one controlled test, moving identical content onto third-party outlets lifted citations by a median of ~239%. The exception is Gemini, which leans on brand-owned domains more than any other engine, making it the one place where on-site depth pays off most.
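
To see where a brand falls against the ~5–15% own-site range, the share can be computed from any list of cited URLs. This is a minimal sketch under stated assumptions: `own_site_share`, the `brand.example` domain and the URL list are all hypothetical, and subdomains are counted as brand-owned, which may not match how the cited studies classified them.

```python
from urllib.parse import urlparse


def own_site_share(cited_urls: list[str], brand_domain: str) -> float:
    """Fraction of cited URLs hosted on the brand domain or one of its subdomains."""
    if not cited_urls:
        return 0.0
    brand = brand_domain.lower()
    own = 0
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        # Match the exact domain or a true subdomain (not a lookalike suffix).
        if host == brand or host.endswith("." + brand):
            own += 1
    return own / len(cited_urls)


# Hypothetical citation log: one brand-owned URL and nine third-party URLs.
urls = ["https://www.brand.example/pricing"] + [
    f"https://site{i}.example/review" for i in range(9)
]

share = own_site_share(urls, "brand.example")
print(f"Own-site share of citations: {share:.0%}")
```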

r = 0.737

YouTube mentions are the strongest measured off-site correlate of ChatGPT visibility.

Ahrefs Brand Radar correlation study · May 2026 (re-confirmed; orig. Dec 2025)

r ≈ 0.66–0.71

Unlinked branded mentions across many third-party sites — the #2 correlate, just behind YouTube.

Ahrefs Brand Radar correlation study · May 2026 (re-confirmed; orig. Dec 2025)

+30–40%

Adding statistics, quotations and cited sources to a page lifts its citation likelihood (GPT-3.5-era; directional).

Princeton GEO (KDD 2024) · 2024

87%

Share of ChatGPT's cited URLs that also appear in Bing's top 10. Being in the Bing index is the entry ticket.

Seer Interactive SearchGPT↔Bing analysis · Feb 2025

−4.6%

Adding schema markup moved Google AI Overviews citations by this much (significant negative). Schema is hygiene, not a citation lever.

Ahrefs schema controlled study · Aug 2025 – Mar 2026

70–95%

Sites that block AI bots are still cited ~70% (retrieval-blocked) to ~95% (training-blocked) of the time — blocking has surprisingly little measured effect on citations.

BuzzStream / PPC.land citation study · Mar 2026
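
Before reading anything into the blocking numbers, it helps to know which bots a site actually blocks. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt body against two user agents. The robots.txt content and the `brand.example` URL are hypothetical, and treating `GPTBot` as a training crawler and `OAI-SearchBot` as a retrieval crawler is an assumption based on how OpenAI describes them; the cited study may define its two groups differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the training crawler, allows the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to True if robots.txt disallows it from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network request
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


status = blocked_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot"],
    "https://brand.example/guide",
)
for agent, blocked in status.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Note that robots.txt only states a preference; whether a blocked site still gets cited is exactly what the study above measured.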

The GEO tactics the data refutes

A lot of GEO advice is folklore. These are the popular tactics that the current evidence does not support — we list them because being honest about what doesn't work is part of getting the rest right.

“Schema markup gets you cited.”
A controlled study of 1,885 pages found schema had no positive effect on AI citations — slightly negative on Google AI Overviews. Schema still helps entity clarity and traditional search, so keep it as hygiene; just don't expect citations from it.
“An llms.txt file improves AI visibility.”
An analysis of 300,000 domains found no correlation with citations, and a 2026 follow-up found 97% of llms.txt files get zero bot requests. Google explicitly declines to use it; Anthropic and Perplexity reportedly read it for agent/retrieval workflows, but with no measured citation lift. Harmless to add — just not a citation lever.
“GEO delivers a ~40% visibility boost.”
The widely-quoted 40% figure has not held up to replication. Treat blanket boost claims with skepticism.
“Domain authority, more pages, and press-release wires drive citations.”
All weak: domain authority correlates ~0.18–0.33, page count ~0.19, and wire-distributed press releases account for ~0.04% of citations (PR overall is rising but still around 1% of citations). Original, cited, fresh content beats volume.

Sources & dates

[1] Ahrefs Brand Radar correlation study — N=75,000 brands (DR>40, ≥800 monthly volume); Spearman correlation, uncontrolled — re-confirmed in Ahrefs' follow-up report · May 2026 (re-confirmed; orig. Dec 2025)
[2] Seer Interactive SearchGPT↔Bing analysis — ~100 queries / 500+ citations · Feb 2025
[3] Princeton GEO (KDD 2024) — N=10,000 queries; tested on GPT-3.5-era + Google search — directional · 2024
[4] Ahrefs schema controlled study — 1,885 pages + JSON-LD vs 4,000 controls · Aug 2025 – Mar 2026
[5] BuzzStream / PPC.land citation study — 4M AI citations, 3,600 prompts, 10 industries · Mar 2026
[6] SE Ranking llms.txt analysis — 300,000 domains; no measurable correlation · 2025 (reported)
[7] McKinsey / Muck Rack / Stacker×Scrunch (citation source mix) — A brand's own website is ~5–15% of its AI citations; the rest is earned media + claimable third-party listings (corroborated by 6+ vendors). +239% median citation lift moving identical content to third-party outlets — the one controlled study (Stacker×Scrunch) · 2025–26 (multi-vendor)

Correlational figures (e.g. Ahrefs r-values) describe association, not causation, and come from single-vendor datasets — treat them as directional. We refresh this page quarterly as the engines and the evidence base evolve.
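
For readers who want to see what a figure like r = 0.737 measures, here is a minimal pure-Python sketch of a Spearman rank correlation, the method the Ahrefs study [1] reports using. The function names and the toy data are hypothetical and chosen only for illustration; this is not the study's code or data.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-brand counts (toy data, not from any study).
youtube_mentions = [3, 10, 25, 40, 120]
ai_citations = [1, 4, 9, 30, 80]

rho = spearman(youtube_mentions, ai_citations)
print(f"Spearman correlation: {rho:.3f}")
```

Because the measure only compares rank order, a high value says that brands with more mentions tend to have more citations. It does not show that adding mentions causes citations, since a third factor such as overall brand size could drive both.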

See how AI engines currently describe your brand

S6S measures whether ChatGPT, Perplexity, Gemini, Claude and Grok mention, cite and recommend you — and shows the source gaps behind it. Run a free check, no signup.
