AirOps + Kevin Indig "Fan-Out Effect" ChatGPT Citation Study (2026-04-13)

Source: airops-fan-out-effect-citation-study-2026-04-13.md — AirOps Team in partnership with Kevin Indig (Growth Memo). Published 2026-04-13 on airops.com/report.

AirOps and Kevin Indig ran an observational study of ChatGPT’s citation behavior at unusual scale: 16,851 unique queries, 50,553 ChatGPT responses (3 runs each), 353,799 pages scraped, and 815,484 scoring rows. The headline finding is that retrieval rank dominates everything else: rank-1 pages are cited 58.4% of the time and rank-10 pages 14.2% (a 4.1× gap). On-page signals (schema markup, heading match, word count, Flesch-Kincaid (FK) readability) all matter, but at magnitudes of +5 to +11 percentage points. That is meaningful but secondary to “did the retrieval system put your page in the candidate set?” Domain authority (DA) shows no positive correlation in ChatGPT-only data. This is the deepest public dataset on ChatGPT citation mechanics to date.

Key Takeaways

Retrieval rank is the dominant signal. Rank-1 pages are cited 58.4% of the time and rank-10 pages 14.2%, a 4.1× gap. Winning a place in the retrieval candidate set matters more than any individual on-page signal.
Query fan-out is real but bounded. 88.6% of ChatGPT queries trigger exactly 2 fan-out sub-queries. 8.8% trigger zero. Only 2.5% trigger 4+. The popular “1 query → 6-10 fan-out queries” narrative is overstated for ChatGPT’s actual behavior in this dataset.
Heading-to-query similarity matters (+10.8pp). Pages where H1-H4 headings match the user’s query at cosine similarity ≥0.90 are cited 41.0% vs 30.2% for pages at <0.50.
JSON-LD schema shows a +6.5pp citation advantage in AirOps’s stratified analysis (38.5% vs 32.0%, independent of word count / heading count / DA / query-match score). Top types: MedicalWebPage (47.0%), BreadcrumbList (46.2%), FAQPage (45.6%).
The schema finding is correlational and conflicts with Ahrefs’s causal study. AirOps’s stratification controls don’t isolate schema the way Ahrefs’s matched difference-in-differences does. The two findings reconcile if schema is a marker of editorial/technical maturity, not a causal lever. See the contradiction callout below.
Focused beats exhaustive coverage (+4.2pp). Pages covering 26-50% of fan-out sub-queries beat pages covering 100%, controlling for primary similarity ≥0.8. Over-covering the topic is a drag on citation, not a help.
Word count sweet spot 500-2,000. Pages over 5,000 words underperform pages under 500.
FK readability 16-17 (college) wins (+6.3pp over FK <8). College-level writing beats both simple and overly academic text.
Domain authority shows no positive correlation in ChatGPT. There is a slight inverse trend in the highest DA quartile. This runs opposite to Digital Applied’s finding of a DA Pearson correlation of +0.61 for Google AI Overviews (AIO). The likely explanation: ChatGPT and AIO weight authority differently. ChatGPT leans heavily on retrieval-system signals, while AIO gives more weight to traditional Google ranking signals, which include DA-correlated factors.
Page age: fresh content wins (+5.2pp). 30-89-day-old pages cited 32.8%, 5+-year-old pages cited 27.6%.
ChatGPT-only scope. Authors explicitly flag that findings may not transfer to Google AI Mode, Perplexity, or Gemini. The retrieval-rank dominance is system-specific.
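
The “focused beats exhaustive” takeaway depends on how coverage is measured, which this summary does not spell out. The following is a minimal sketch, assuming a page “covers” a fan-out sub-query when a page-to-sub-query similarity score clears a cutoff. The function names, the 0.8 cutoff, and the sample scores are illustrative assumptions, not AirOps’s published method; only the 26-50% band comes from the study.

```python
def coverage_fraction(similarities, cutoff=0.8):
    """Share of fan-out sub-queries whose similarity to the page meets the cutoff.

    `similarities` holds one score per sub-query (hypothetical inputs).
    """
    if not similarities:
        return 0.0
    covered = sum(1 for score in similarities if score >= cutoff)
    return covered / len(similarities)


def in_focused_band(fraction, low=0.26, high=0.50):
    """True when coverage falls in the 26-50% band the study reports as strongest."""
    return low <= fraction <= high


# Made-up scores for one page against five fan-out sub-queries.
scores = [0.91, 0.84, 0.42, 0.35, 0.55]
fraction = coverage_fraction(scores)
print(f"coverage={fraction:.0%}, focused={in_focused_band(fraction)}")
```

Because most queries in the dataset fan out to exactly 2 sub-queries, coverage computed this way would usually be 0%, 50%, or 100% per query, so any real audit would need to aggregate across many queries.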

Schema effect: AirOps says +6.5pp, Ahrefs says no causal lift

AirOps says (this article, correlational, ChatGPT, stratified controls): JSON-LD schema independently predicts a +6.5pp citation rate advantage (38.5% vs 32.0%).
Ahrefs says (ChatGPT): adding schema mid-period produces no statistically meaningful citation lift on any AI surface.
Reconciliation: Correlational evidence shows schema-using pages are cited more; causal evidence shows adding schema doesn’t cause the lift. The most parsimonious interpretation is that schema is a marker of editorial / technical / publication-infrastructure maturity that correlates with citation, not the lever itself. AirOps’s stratification controls for word count / heading count / DA / query-match but cannot match on unobserved publisher quality dimensions. Both can be simultaneously true.
Status: resolved (2026-05-19) as a methodological difference, not a factual one.
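
The methodological gap can be stated compactly. As a sketch in generic textbook notation (the symbols are assumptions introduced here, not taken from either study), a stratified comparison averages within-stratum differences between pages with and without schema, while a difference-in-differences estimate compares the change over time in pages that added schema against the change in matched pages that did not:

```latex
\hat{\Delta}_{\text{strat}} = \sum_{s} w_s \left( \bar{Y}_{s,\,\text{schema}} - \bar{Y}_{s,\,\text{no schema}} \right), \qquad \sum_{s} w_s = 1

\hat{\delta}_{\text{DiD}} = \left( \bar{Y}^{\text{post}}_{\text{added}} - \bar{Y}^{\text{pre}}_{\text{added}} \right) - \left( \bar{Y}^{\text{post}}_{\text{control}} - \bar{Y}^{\text{pre}}_{\text{control}} \right)
```

Here \bar{Y} is a mean citation rate, s indexes strata, and w_s are stratum weights. The first quantity can be positive purely because of unobserved publisher differences within a stratum. The second subtracts out any such differences that are stable over time, which is how a +6.5pp stratified gap and a null causal lift can coexist.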

Methodology

Scale: 16,851 queries × 3 ChatGPT runs = 50,553 responses, 353,799 pages scraped, 815,484 scoring rows.
Capture: UI scraping (not API).
Embedding model: BAAI/bge-base-en-v1.5, 768 dimensions, page H1-H4 vs query embeddings.
Design: Observational with stratification controls (e.g., holding primary similarity constant when isolating other signals). NOT a matched difference-in-differences or randomized assignment design.
Engine: ChatGPT only.
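
To make “observational with stratification controls” concrete, here is a minimal sketch of a stratified comparison. It assumes rows of (stratum, has_schema, cited) and weights each stratum by its row count. The function name, the weighting scheme, and the toy rows are all illustrative assumptions; the source does not describe AirOps’s exact procedure.

```python
from collections import defaultdict


def stratified_lift(rows):
    """Size-weighted average of within-stratum citation-rate differences.

    rows: iterable of (stratum, has_schema, cited) tuples.
    Returns None when no stratum contains both groups.
    """
    # Per stratum, per group: [cited_count, total_count]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for stratum, has_schema, cited in rows:
        cell = counts[stratum][bool(has_schema)]
        cell[0] += int(bool(cited))
        cell[1] += 1

    weighted_diff, total_weight = 0.0, 0
    for groups in counts.values():
        with_cited, with_n = groups[True]
        without_cited, without_n = groups[False]
        if with_n == 0 or without_n == 0:
            continue  # no comparison possible inside this stratum
        n = with_n + without_n
        weighted_diff += n * (with_cited / with_n - without_cited / without_n)
        total_weight += n
    return weighted_diff / total_weight if total_weight else None


# Synthetic rows for illustration only; these are not study data.
toy_rows = [
    ("short", True, 1), ("short", True, 0), ("short", False, 0), ("short", False, 0),
    ("long", True, 1), ("long", True, 1), ("long", False, 1), ("long", False, 0),
]
print(stratified_lift(toy_rows))
```

Comparing only within strata removes the influence of the stratifying variable (here a made-up length bucket), but anything not used to form strata can still drive the difference. That is the limitation the Design line flags.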

Practical Implications

The retrieval-rank finding (4.1× gap between rank-1 and rank-10) is the dominant practical takeaway: the page-level signals AirOps measures are “table stakes for AI visibility, not differentiators.” None of them can overcome poor retrieval rank or weak query-page relevance. The practitioner playbook AirOps proposes:

Optimize the page’s retrieval candidacy first — ensure indexability, quality content, semantic relevance. Don’t ship schema before fixing retrieval.
Match headings to query intent — write H1-H4 that literally answers the query at high cosine similarity.
Use focused content, not exhaustive coverage — covering 26-50% of fan-out sub-queries deeply beats covering 100% shallowly.
Word count 500-2,000, FK grade 14-17, 4-10 H2-H4 headings — the sweet spots for articles.
Ship JSON-LD schema — at minimum FAQPage / BreadcrumbList / Article for editorial pages, MedicalWebPage for healthcare. Ship for the parseability benefit AirOps measures, but don’t treat schema as a citation lever in isolation — Ahrefs’s causal study shows adding it to a page doesn’t cause the lift.
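
For the schema step, a minimal sketch of generating FAQPage JSON-LD follows. The property names (`@context`, `@type`, `mainEntity`, `acceptedAnswer`) come from the schema.org vocabulary; the helper names and the sample question and answer are placeholders invented for illustration.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(obj):
    """Serialize to a JSON-LD script tag, escaping '</' so the tag cannot be closed early."""
    payload = json.dumps(obj, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Placeholder content; replace with questions the page actually answers.
sample = faq_jsonld([("What is query fan-out?", "Placeholder answer text.")])
print(as_script_tag(sample))
```

Generating the markup from the same data that renders the visible FAQ keeps the two in sync, which matters more for editorial validity than the markup itself.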

Open Questions

Why does ChatGPT diverge from AIO on DA? ChatGPT shows no positive DA correlation (slight inverse). Digital Applied’s AIO study shows DA Pearson +0.61. Two engines, opposite findings on the same signal. Hypotheses: ChatGPT relies primarily on the underlying retrieval system (publicly Bing) which weights DA-proxies less; AIO inherits Google ranking signals which encode DA proxies heavily.
Schema causation vs correlation. The two methodologies (AirOps stratification vs Ahrefs matched DiD) disagree on whether schema is causal. A definitive answer likely requires a randomized intervention study, which is impractical at scale.
2-step fan-out. Why does ChatGPT fan out to exactly 2 sub-queries 88.6% of the time? The authors don’t speculate. Inferred (not from the source): this is likely a configured parameter rather than emergent behavior.

Related

Ahrefs Schema → AI Citations Causal Study: Matched DiD on 1,885 pages adding schema. Causal counterpoint to AirOps’s correlational +6.5pp finding.
Zyppy AI Citation Ranking Factors Meta-Analysis: Cyrus Shepard’s 54-study aggregation. Lists Search Rank #2 (9.7/10) and Fan-out Rank #3 (9.3/10). AirOps is the deepest single empirical backing for those #2 and #3 weights.
Digital Applied 1,000 AIO Citation Pattern Study: Companion correlational study on AIO. Schema lift 2.3× there vs +6.5pp here; the magnitude difference is the AIO-vs-ChatGPT divergence.
GEO-16 Framework (arXiv 2509.10762v1): Academic correlational study on Brave/AIO/Perplexity. Structured Data r=0.63 corroborates schema-correlated-with-citation across engines.
Google’s Generative AI Search Optimization Guide: Google’s official position aligns. AI Overviews + AI Mode use the same Search index, so winning retrieval rank IS winning AI citation candidacy.
FLUQs Framework: Citation Labs’ content-strategy framework. AirOps’s heading-match-and-focused-coverage findings give the empirical backing for FLUQs’s “structure facts to survive LLM compression” thesis.
GSC Autonomous SEO Engine (internal): Operationalizes the retrieval-rank-first playbook AirOps validates.

Try It

Pull ChatGPT’s retrieval rank for your top 20 queries. If you’re not in the top 10 candidate set, no on-page tactic will rescue you. Fix retrieval candidacy first.
Audit H1-H4 cosine similarity against your target queries. Use any sentence-transformer model (BAAI/bge-base-en-v1.5 is what AirOps used) to score heading-to-query similarity. Rewrite headings under 0.50.
Resist over-coverage. Cover 26-50% of the fan-out sub-queries deeply; don’t try to cover all of them. Use AI Mode’s expanded query view to see what the fan-out looks like.
Word-count audit. Trim pages over 5,000 words to 2,000-3,000. Length is a drag past the sweet spot.
FK readability check. Most enterprise SEO content sits at FK 10-13. Push toward 16-17, the band with the highest citation rate in this study (+6.3pp over FK <8).
Schema as table stakes, not lever. Ship FAQPage / Article / BreadcrumbList / HowTo where editorially valid. Don’t expect schema alone to move citations — the causal evidence (Ahrefs) is against it.
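
The heading audit above can be sketched as follows. This assumes each heading is scored separately against the query (the source says only “page H1-H4 vs query embeddings”, so per-heading scoring is an assumption) and that embeddings have already been computed, for example with a sentence-transformer model such as BAAI/bge-base-en-v1.5. The function names and the 3-dimensional toy vectors are illustrative stand-ins for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def headings_to_rewrite(query_vec, heading_vecs, threshold=0.50):
    """Return (heading, score) pairs scoring below the threshold, lowest first.

    heading_vecs: dict mapping heading text to its embedding vector.
    """
    scored = [
        (text, cosine_similarity(query_vec, vec))
        for text, vec in heading_vecs.items()
    ]
    return sorted((pair for pair in scored if pair[1] < threshold), key=lambda p: p[1])


# Toy vectors only; real inputs would be model embeddings.
query = [1.0, 0.0, 0.0]
headings = {
    "Heading A": [1.0, 0.0, 0.0],
    "Heading B": [0.0, 1.0, 0.0],
    "Heading C": [1.0, 1.0, 0.0],
}
for text, score in headings_to_rewrite(query, headings):
    print(f"{text}: {score:.2f}")
```

The 0.50 default mirrors the rewrite threshold in the audit step; the study’s top band starts at 0.90, so a stricter pass could raise the threshold.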

Refresh — AirOps “From Retrieved to Cited” commercial-content companion (added 2026-05-19)

AirOps published a commercial-content companion to this study — “From Retrieved to Cited: How Commercial Content Earns Citations in AI Search” (airops-from-retrieved-to-cited-2026-05-19.md). Where the April 13 study mapped retrieval→citation mechanics, this one holds retrieval constant and asks which page structures earn citations at each buyer-journey stage (awareness → consideration → comparison → validation). The two reconcile cleanly: retrieval rank gates candidacy; structure decides selection.

Comparison pages with 3 tables earn +25.7% more citations — the single strongest lift in the study. Versus / competitor-comparison pages relying on prose underperform; structured tables (pricing, features, limitations, tradeoffs) give AI search a cleaner side-by-side format. Audit these first.
Validation pages with 8 list sections earn up to +26.9% more citations. More broadly, commercial pages with 7-26 list sections were +6% to +15.2% more likely to be cited — lists are the strongest shared signal across every journey stage.
Early-discovery / awareness pages with 5-7 statistics earn +20% higher citation likelihood. Grounding category-introduction claims in data gives AI search more confidence to cite.
Shortlist pages averaging ≤10 words/sentence earn +18.8% more citations; pages averaging 11-14 words/sentence earn ~+7%. Reinforces the FK-readability and “easy to read, parse, extract” findings from the main study.
Through-line: content that is easier to read, parse, and extract performs better at every stage. The lever is structure (tables, lists, short sentences, inline stats), not length — the commercial-page-specific corroboration of this study’s “focused beats exhaustive” finding.
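
A rough way to audit a commercial page against these structural signals is sketched below using only the Python standard library. It assumes “tables” means `<table>` elements and “list sections” means `<ul>`/`<ol>` elements, and it approximates sentence length by splitting visible text on terminal punctuation. The source does not define how AirOps counted any of these, so the class name, function name, and counting rules are assumptions.

```python
import re
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Counts tables and lists and collects visible text (skipping script/style)."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.lists = 0
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag in ("ul", "ol"):
            self.lists += 1  # nested lists are counted individually
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def audit_page(html):
    """Return table count, list count, and a crude average words per sentence."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    # Crude split: decimals and abbreviations will be treated as sentence breaks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = sum(len(s.split()) for s in sentences)
    average = words / len(sentences) if sentences else 0.0
    return {"tables": parser.tables, "lists": parser.lists, "avg_words_per_sentence": average}


sample_html = (
    "<p>Plan A costs less. Plan B has more seats.</p>"
    "<table><tr><td>x</td></tr></table>"
    "<ul><li>One item</li></ul>"
)
print(audit_page(sample_html))
```

Table cells and list items are included in the sentence-length estimate here, which skews it on heavily structured pages; restricting the text to paragraph content would be a reasonable refinement.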
