Source: ai-research/similarweb-most-cited-domains-llms-2026-04-14.md — Maayan Zohar Basteker (Senior SEO Specialist, Similarweb), “The Most Cited Domains by LLMs and the Signals That Get You Cited,” published April 14, 2026, reviewed by Limor Barenholtz.

Author: Maayan Zohar Basteker (Similarweb)
URL: https://www.similarweb.com/blog/marketing/geo/most-cited-domains-llms/
Published: 2026-04-14

A Similarweb study analyzing ~600,000 citation events from ChatGPT (web browsing) and Google AI Mode in the US over January–February 2026, ranking the top 20 domains each engine cites and extracting the structural patterns that earn citations. This is one of the largest publicly disclosed LLM-citation datasets to date and confirms that AEO/GEO is now a measurable channel — not a hypothesis.

Key Takeaways

  • Wikipedia + Reddit ≈ 25% of all ChatGPT citations. Wikipedia leads at 13.15% (118,285 events) and Reddit follows at 11.97% (107,680). The two open-knowledge surfaces dominate.
  • Fandom outranks Wikipedia in Google AI Mode. Fandom.com is #1 at 7.16% vs Wikipedia at 5.21% — Fandom’s “thousands of words covering a single specific subject” structure beats Wikipedia’s encyclopedic breadth on Google’s surface.
  • Both engines self-cite aggressively. OpenAI is #3 in ChatGPT (6.21%) ahead of Reuters and NIH; google.com is #5 in Google AI Mode (2.85%). Engine self-citation is the second-strongest pattern after open-knowledge sites.
  • AI rarely cites homepages. “Most ChatGPT citations come from pages that are several folders deep within a site” — blog posts, FAQs, how-to guides, product detail pages, research articles.
  • Depth on a specific topic beats breadth every time. Fandom outranks Wikipedia for entertainment queries because it goes deeper on individual subjects. The same logic suggests a well-structured product page can outrank Amazon for niche product questions.
  • Both engines lean on community and trust. Reddit, Facebook, Instagram, Quora collectively account for nearly 9% of Google AI Mode citations. Government/institutional sources (NIH, Cleveland Clinic, IRS) earn consistent citations as trust signals.
  • The two surfaces have different personalities. ChatGPT favors commerce + news + business services. Google AI Mode favors entertainment + UGC + social. Same domain can rank wildly differently across engines (LinkedIn #6 in ChatGPT, #15 in AI Mode).
  • Citation overlap is low and citation sources change frequently. Optimization is not “rank once and forget” — it requires ongoing monitoring across both engines.
  • ChatGPT is the larger citation surface. Wikipedia in ChatGPT generated 118,285 events vs Fandom’s 42,332 in AI Mode — ChatGPT carries materially more citation volume for top-tier domains.
  • News + reviews + business services + social are the four largest citation categories, in that order, on both engines.

Top 20 Most-Cited Domains in ChatGPT

| # | Domain | Citations | % |
|---|--------|-----------|---|
| 1 | wikipedia.org | 118,285 | 13.15% |
| 2 | reddit.com | 107,680 | 11.97% |
| 3 | openai.com | 55,876 | 6.21% |
| 4 | walmart.com | 26,118 | 2.90% |
| 5 | youtube.com | 23,976 | 2.67% |
| 6 | linkedin.com | 21,736 | 2.42% |
| 7 | reuters.com | 20,451 | 2.27% |
| 8 | nih.gov | 19,962 | 2.22% |
| 9 | google.com | 19,478 | 2.17% |
| 10 | media-amazon.com | 17,477 | 1.94% |
| 11 | wikimedia.org | 17,329 | 1.93% |
| 12 | facebook.com | 15,813 | 1.76% |
| 13 | ebay.com | 15,730 | 1.75% |
| 14 | amazon.com | 15,343 | 1.71% |
| 15 | github.com | 14,569 | 1.62% |
| 16 | apple.com | 13,342 | 1.48% |
| 17 | yahoo.com | 12,942 | 1.44% |
| 18 | forbes.com | 12,454 | 1.38% |
| 19 | fandom.com | 11,630 | 1.29% |
| 20 | squarespace-cdn.com | 11,575 | 1.29% |

Top 20 Most-Cited Domains in Google AI Mode

| # | Domain | Citations | % |
|---|--------|-----------|---|
| 1 | fandom.com | 42,332 | 7.16% |
| 2 | wikipedia.org | 30,792 | 5.21% |
| 3 | youtube.com | 29,032 | 4.91% |
| 4 | reddit.com | 24,764 | 4.19% |
| 5 | google.com | 16,878 | 2.85% |
| 6 | facebook.com | 14,410 | 2.44% |
| 7 | amazon.com | 10,886 | 1.84% |
| 8 | nih.gov | 8,821 | 1.49% |
| 9 | github.com | 8,728 | 1.48% |
| 10 | apple.com | 8,127 | 1.37% |
| 11 | instagram.com | 7,007 | 1.19% |
| 12 | microsoft.com | 6,950 | 1.18% |
| 13 | quora.com | 6,130 | 1.04% |
| 14 | ebay.com | 5,641 | 0.95% |
| 15 | linkedin.com | 5,479 | 0.93% |
| 16 | imdb.com | 4,784 | 0.81% |
| 17 | clevelandclinic.org | 4,529 | 0.77% |
| 18 | irs.gov | 4,485 | 0.76% |
| 19 | walmart.com | 4,068 | 0.69% |
| 20 | medium.com | 3,970 | 0.67% |
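The "same domain, wildly different rank" pattern can be checked directly against the two top-20 tables. A minimal sketch; the domain lists are transcribed from the tables above (Jan–Feb 2026 snapshot), not pulled live:

```python
# Compare the two top-20 lists: how many domains appear on both,
# and which shared domains swing hardest in rank between engines.

chatgpt = [
    "wikipedia.org", "reddit.com", "openai.com", "walmart.com", "youtube.com",
    "linkedin.com", "reuters.com", "nih.gov", "google.com", "media-amazon.com",
    "wikimedia.org", "facebook.com", "ebay.com", "amazon.com", "github.com",
    "apple.com", "yahoo.com", "forbes.com", "fandom.com", "squarespace-cdn.com",
]
ai_mode = [
    "fandom.com", "wikipedia.org", "youtube.com", "reddit.com", "google.com",
    "facebook.com", "amazon.com", "nih.gov", "github.com", "apple.com",
    "instagram.com", "microsoft.com", "quora.com", "ebay.com", "linkedin.com",
    "imdb.com", "clevelandclinic.org", "irs.gov", "walmart.com", "medium.com",
]

shared = [d for d in chatgpt if d in ai_mode]  # preserves ChatGPT rank order
rank = {engine: {d: i + 1 for i, d in enumerate(lst)}
        for engine, lst in (("chatgpt", chatgpt), ("ai_mode", ai_mode))}

# Shared domains sorted by the size of their rank swing between engines
swings = sorted(shared,
                key=lambda d: abs(rank["chatgpt"][d] - rank["ai_mode"][d]),
                reverse=True)

print(len(shared))  # 13 of 20 domains appear on both lists
for d in swings[:3]:
    print(d, rank["chatgpt"][d], "->", rank["ai_mode"][d])
```

Running this shows fandom.com (#19 → #1) and walmart.com (#4 → #19) swinging even harder than the LinkedIn example cited above, which supports the "don't optimize for both engines identically" advice later in this note.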

Methodology

  • Window: January–February 2026 (two months).
  • Volume: ~600,000 citation events.
  • Engines: ChatGPT (web browsing mode) + Google AI Mode.
  • Geography: United States.
  • Definition: A “citation event” = an AI engine references a domain in a generated response.
  • Tool: Similarweb’s AI Citation Analysis tracking monitored prompts.

The dataset is large and engine-pair-comparable, but it’s prompt-set dependent — Similarweb monitored a fixed prompt set. The absolute numbers reflect Similarweb’s prompt distribution, not all human queries to those engines.
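One way to probe that prompt-set dependence is to back out each engine's implied total citation volume from any table row (count ÷ share). These implied totals are derived here, not stated in the article; if they disagree with the ~600K headline, the likeliest explanation is a different denominator (per-engine vs. combined, or a prompt subset), not a transcription error:

```python
# Back-of-envelope consistency check on the methodology numbers:
# any (citations, percent) row implies the engine's total event count.

def implied_total(citations: int, share_pct: float) -> int:
    """Total citation events implied by one (count, percent) table row."""
    return round(citations / (share_pct / 100))

chatgpt_total = implied_total(118_285, 13.15)  # from the wikipedia.org row
ai_mode_total = implied_total(42_332, 7.16)    # from the fandom.com row

print(chatgpt_total, ai_mode_total)
```

Cross-checking the implied total against a second row from the same table (e.g. reddit.com at 107,680 / 11.97%) is a quick way to confirm the percentages share one denominator.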

Pattern Analysis

Why Fandom beats Wikipedia on Google AI Mode

Per the article: “Fandom pages are structurally optimized for exactly what AI engines prefer: individual pages run to thousands of words covering a single specific subject.”

Wikipedia pages tend to be broader summaries; Fandom pages are deep, single-subject, fan-maintained. For entertainment queries (where Google AI Mode is heavily used), depth on the specific subject wins over encyclopedic coverage. The same shape — one URL, one subject, thousands of words — should generalize beyond entertainment.

Why community platforms cluster high

Reddit (#2 ChatGPT, #4 AI Mode), Quora (#13 AI Mode), Fandom (#1 AI Mode) — UGC platforms are top-cited because they aggregate “real opinions, comparisons, and validation.” AI uses them heavily for subjective or experience-based prompts where a brand-published page would be biased.

This validates the claim in Devesh Paliwal's Reddit playbook that ~11% of AI citations come from Reddit — Similarweb's ChatGPT figure for Reddit is 11.97%, almost exactly the playbook's stat.

Why depth beats homepages

“AI rarely cites homepages.” Citations come from pages 3+ folders deep — blog posts, FAQs, how-to guides, product detail pages. The optimization implication is that brand SEO investment should shift away from polished hero pages and toward the long tail of single-subject answer pages.

This is structurally identical to the FLUQs thesis: resolve a specific friction-inducing question in a citable EchoBlock format, and you become the source the LLM extracts. The Similarweb data is the empirical evidence behind FLUQs’ qualitative argument.
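The deep-page preference is straightforward to audit on your own site. A minimal sketch that scores URLs by folder depth; the example URLs are hypothetical, and in practice you would feed in your sitemap:

```python
# Score URLs by folder depth to see how much of your site lives at the
# depth AI engines actually cite (per the article, several folders deep).
from urllib.parse import urlparse

def folder_depth(url: str) -> int:
    """Number of non-empty path segments, e.g. /blog/topic/post -> 3."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])

# Hypothetical sample; replace with the URL list from your sitemap.
urls = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/blog/aeo/citation-audit-checklist",
]
for u in urls:
    print(folder_depth(u), u)
```

Sorting a full sitemap by this score makes the gap visible: if most of your content sits at depth 0–1, you have few of the single-subject deep pages these engines prefer to cite.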

Citation category mix

Both engines, in rough rank order: News → Reviews → Business services → Social platforms.

  • ChatGPT skews toward business services + ecommerce.
  • Google AI Mode skews toward reviews + UGC + social.

Implication: a B2B SaaS audit-tool brand should prioritize ChatGPT-friendly surfaces (deep how-to guides, comparison pages, product-detail pages); a consumer entertainment brand should prioritize Google-AI-Mode-friendly surfaces (community-style content, fan wikis, deep niche pages).

Engine self-citation

OpenAI #3 in ChatGPT, google.com #5 in Google AI Mode. Both engines preferentially cite their own properties when those properties contain relevant content. This is not actionable for non-engine brands, but it does eat roughly 6-8% of citation share off the top of each engine.

What Gets Cited (extracted formula)

Patterns from the four leaders:

| Domain | What it does well |
|--------|-------------------|
| Wikipedia | Direct, factual question answers |
| Reddit | Real opinions and comparisons |
| YouTube | Visual explanation |
| Fandom | Goes deep on specific niches |

Core formula: Depth on a specific topic beats breadth. The implication for brand sites is that a well-structured, niche-deep product or topic page can compete with general retailers and encyclopedias on the questions that page is built to answer.

Try It

  • Audit your top citation surface today. Open ChatGPT (browsing on) and Google AI Mode. Ask 10 prompts a buyer would ask — does your domain show up at all? Whose does?
  • Pick one buyer question your category isn’t well-covered on yet. Write the deepest single-subject page on the internet for that question. 1,500+ words, structured H2/H3, FAQ section, real numbers, opinion-bearing voice. Publish 2-3 folders deep on your site (/blog/specific-subject/ not /).
  • Cite real sources. AI engines weigh trust. A page citing NIH, IRS, Cleveland Clinic, peer-reviewed papers, or government data inherits some of that trust transitively.
  • Show up on Reddit deliberately. If 11.97% of ChatGPT citations are Reddit, brand-adjacent comments in the right subreddits are higher-leverage than a brand blog post. See the Reddit + AI-citation playbook for the full operator workflow.
  • Don’t optimize for both engines identically. Map your queries to engine. Commerce + B2B + news → optimize for ChatGPT. Entertainment + UGC + niche fandom → optimize for Google AI Mode.
  • Monitor monthly. “Citation sources change frequently, and overlap between AI platforms is low.” This is not a one-time audit — re-run the same 10 prompts every month and track changes.
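The monthly re-run in the last bullet reduces to a simple diff. A sketch assuming you record, by hand or via your AEO tool, which domains each prompt's answer cited; the prompts and domains below are hypothetical:

```python
# Diff two monthly audit runs: for each monitored prompt, which cited
# domains were gained and which were lost since last month.

def citation_changes(prev: dict, curr: dict) -> dict:
    """Per-prompt domains gained/lost between two monthly audit runs."""
    changes = {}
    for prompt, after in curr.items():
        before = prev.get(prompt, set())
        gained, lost = after - before, before - after
        if gained or lost:
            changes[prompt] = {"gained": gained, "lost": lost}
    return changes

# Hypothetical data: prompt -> set of domains cited in the AI answer.
jan = {"best b2b audit tool": {"reddit.com", "g2.com"}}
feb = {"best b2b audit tool": {"reddit.com", "yourbrand.com"}}

print(citation_changes(jan, feb))
```

Keeping one such dict per month in version control gives you the "citation sources change frequently" signal for free: any non-empty diff is a change worth investigating.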

Cross-Reference: How This Article’s Numbers Validate Existing Wiki Theses

  • FLUQs argues that LLM citations are the new ranking signal because content survives compression into AI answer boxes. Similarweb's data is the empirical evidence behind that argument — 600K citation events, top-20 tables, deep-page preference, all consistent with the FLUQs thesis that "reuse, not position" is the metric.
  • Reddit playbook (Devesh Paliwal) claims “~11% of AI citations come from Reddit.” Similarweb’s number for Reddit on ChatGPT is 11.97% — the playbook’s stat is essentially correct.
  • seo-patterns-learned (internal) captures internal pattern observations from the WEO Marketly content stack. The Similarweb depth-beats-breadth finding is a candidate for incorporation into the next iteration of that pattern set.

Implementation

  • Tool/Service: Similarweb AI Citation Analysis (vendor-paid; this article is the marketing artifact for it).
  • Setup: Similarweb account → AI Citation Analysis module → define competitor set → ingest monitored prompts → review citation source/URL reports.
  • Cost: Not disclosed in the article. Similarweb is enterprise-priced; expect $1k+/month minimum for this module.
  • Integration notes: The five-step process (define goals → identify missing prompts → analyze sources → analyze URLs → integrate traffic data) is the SOP Similarweb wraps around the data. The data itself is what's reusable — the SOP is generic enough to recreate inside any AEO platform that exposes domain + URL-level citation tracking (Profound, Otterly, Ahrefs Brand Radar, etc).

Open Questions

  • Prompt-set bias. The 600K events came from Similarweb’s “monitored prompts” — what mix of categories did those prompts represent? A heavy entertainment skew would inflate Fandom; a heavy commerce skew would inflate Walmart. The article doesn’t disclose the mix.
  • Geographic scope. US-only. Non-English LLM citations and non-US engine surfaces (e.g., Baidu, or Perplexity in non-US markets) likely have completely different domain distributions.
  • Engine version. ChatGPT and Google AI Mode shipped multiple model upgrades during Jan-Feb 2026. The article doesn’t break down citation source by underlying model version.
  • Causality vs correlation. Domains in the top 20 are cited heavily; that doesn’t necessarily mean their structural choices cause the citations vs the engines having a learned bias toward those domains independent of structure.
  • Update frequency. The article was published April 14 with Jan-Feb data — a 6-8 week lag. Similarweb publishes new editions on what cadence? (Article doesn’t say.)
  • Squarespace CDN at #20 ChatGPT. squarespace-cdn.com showing up suggests image/asset citation — does Similarweb count an inline image embed as a citation? Methodology silent.