Source: sistrix-ai-citation-drift-2026-05-19.md — Johannes Beus. Published 2026-05-01 on SISTRIX Blog.
SISTRIX ran the most methodologically rigorous longitudinal AI citation study extant: 82,619 qualified prompts producing 1,548,213 snapshots across 6 countries (DE, US, UK, IT, ES, FR), 3 platforms (Google AI Overviews, Google AI Mode, ChatGPT Search), and 17 consecutive weeks (Dec 17 2025 – Apr 8 2026). The headline framework — “fixed core + carousel” — reframes the entire GEO question from “am I cited?” to “am I in the core or in the carousel?” Google AI Mode rotates 56% of cited domains every week; ChatGPT rotates 74%. AI Overviews are bimodal — 53% of prompts have zero source changes across all 17 weeks, while 19% drift as much as AI Mode. The three platforms are operationally three different citation architectures, and aggregating them obscures more than it reveals.
Key Takeaways
- Citation drift is the dominant temporal signal. Google AI Mode rotates 56% of cited domains every week. ChatGPT Search rotates 74%. AI Overviews replace no sources at all for 53% of prompts and roughly half (46% churn-in) for the volatile 19%. Single-snapshot “am I cited?” measurements are not reproducible: drift is a structural feature, not a transient effect.
- Most AI Mode responses have a fixed core + carousel. 86.5% of prompts have a stable core of 1-5 domains that persist across the 17-week window. The remaining ~10 domains rotate at 89% per week. GEO measurement should treat core membership and carousel rotation as separate phenomena.
- Three platforms, three citation architectures. AI Overviews = closed circle (11 domains, 8 stable, 53% of prompts have ZERO changes). AI Mode = fixed core + carousel (14-16 domains, 56% weekly rotation). ChatGPT Search = total fluctuation (3-4 domains, 74% weekly rotation, median prompt has zero domains stable across all 17 weeks). Lumping them together as “AI citations” is methodologically unsound.
- Brand domains are anchored, co-citations rotate. The brand’s own domain appears in 43% of brand queries across all 17 weeks; in 66% it’s present in ≥80% of weeks. The 12-15 co-citing domains rotate at 70% per week. Brand presence is durable; the space alongside the brand is reallocated weekly.
- News is a one-way ticket. News/Media domains have a core rate of 1.4% — practically unsuitable as citation targets. Evergreen product/shop pages and YouTube (24% core rate) dominate the durable-citation tier.
- Citation drift is global and persistent. AI Mode drift rates stay within 54-59% across all 6 countries (ChatGPT varies more by country, from 42% to 74%), with no stabilization trend across the 17-week window. This is not an early-platform transient; it is a structural feature of how these systems select sources.
- Engine divergence is cross-engine and temporal. For the same prompt, AI Overviews and AI Mode (both Google products) have a Jaccard index of 0.17: only 17% of their combined domain set is cited by both, so 83% is cited by just one of them. AI Mode vs ChatGPT Jaccard = 0.125. Within a single engine, the same prompt shares only 26-44% of its cited domains from one week to the next. Platform-specific citation strategies are now a requirement, not an option.
- Conservative measurement framing. The study reports domain-level drift (74% for ChatGPT); URL-level drift is 85%. Even when a domain persists, the specific cited URL frequently swaps. Domain presence is the only reliably measurable and controllable metric; chasing individual URL ranks in AI surfaces is a misuse of effort.
- E-E-A-T profiles differ by engine. AI Mode core = 80% German-language, 85% evergreen, product/shop-dominated. ChatGPT core = 68% English-language (even for German queries), documentation/institutional-dominated, E-E-A-T 16/20 vs peripheral 14/20. The two engines reward fundamentally different content profiles.
- Methodologically the strongest longitudinal study in the cluster. 17 weeks × 6 countries × 3 platforms × 82,619 prompts × 1.5M snapshots × 2,556 URLs E-E-A-T-classified. No prior public study has tracked AI citations week-over-week at this duration or breadth.
Methodology
The methodological rigor here exceeds every other study in the AI SEO cluster on the temporal dimension:
- Scale: 82,619 qualified prompts producing 1,548,213 weekly snapshots.
- Duration: 17 consecutive weeks (Dec 17 2025 – Apr 8 2026), with weekly reference dates.
- Geographic breadth: 6 countries (Germany, USA, UK, Italy, Spain, France) — testing whether citation drift is a US/English-specific phenomenon or structural across languages.
- Platform breadth: 3 platforms (Google AI Overviews, Google AI Mode, ChatGPT Search) — the first study to longitudinally compare all three head-to-head.
- Core metrics:
  - Churn-in: proportion of cited domains in week N that did not appear in week N-1 (new arrivals).
  - Retention: proportion of week N-1 domains carried into week N (survivors).
  - Core: domains with churn-in ≈ 0% across the full window (permanent residents).
  - Peripheral/Carousel: domains that appear, disappear, and rotate weekly.
- URL classification: 2,556 cited URLs classified by Gemini-based NLP across 5 dimensions: language, content type, monetization model, evergreen status, E-E-A-T signals — providing the qualitative profile of “what survives in the core.”
- Verification: For prompts classified as “fully stable” in AI Overviews, the SISTRIX team verified that the generated response text changes week-to-week even when the cited domain set does not. This rules out measurement artefact — AI Overviews genuinely “writes new sentences from the same 8 sources.”
- Conservatism flagged: Domain-level drift figures are reported throughout. URL-level drift is higher: for ChatGPT it reaches 85% versus 74% at the domain level, a gap of about 11 percentage points. The team made the deliberate methodological choice to report the more controllable metric.
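The churn-in and retention definitions, and the gap between domain-level and URL-level drift, can be illustrated with a minimal Python sketch. It assumes each weekly snapshot is a list of cited URLs and that a URL's domain is its host name with a leading `www.` stripped; that normalization, the function names, and the example URLs are illustrative assumptions, not SISTRIX's pipeline.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Reduce a cited URL to its host name, dropping a leading 'www.' (assumed normalization)."""
    host = urlsplit(url).hostname or ""
    return host[len("www."):] if host.startswith("www.") else host


def churn_in(prev: set[str], curr: set[str]) -> float:
    """Share of this week's cited items that were not cited last week (new arrivals)."""
    return len(curr - prev) / len(curr) if curr else 0.0


def retention(prev: set[str], curr: set[str]) -> float:
    """Share of last week's cited items that are cited again this week (survivors)."""
    return len(prev & curr) / len(prev) if prev else 0.0


# Hypothetical citations for one prompt in two consecutive weeks.
week_1 = ["https://www.example.com/guide", "https://docs.example.org/setup",
          "https://shop.example.net/p/123"]
week_2 = ["https://www.example.com/faq", "https://docs.example.org/setup",
          "https://news.example.io/story"]

url_prev, url_curr = set(week_1), set(week_2)
dom_prev = {domain_of(u) for u in week_1}
dom_curr = {domain_of(u) for u in week_2}

print(f"URL-level churn-in:     {churn_in(url_prev, url_curr):.2f}")   # 0.67
print(f"Domain-level churn-in:  {churn_in(dom_prev, dom_curr):.2f}")   # 0.33
print(f"Domain-level retention: {retention(dom_prev, dom_curr):.2f}")  # 0.67
```

In this toy pair of weeks, `example.com` survives at the domain level even though the cited page changed from `/guide` to `/faq`, so URL-level churn-in is double the domain-level figure. The same effect, at scale, is what separates ChatGPT's 74% domain-level drift from its 85% URL-level drift.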
This is the only public study to date that establishes whether citation drift is a structural feature (it is — persistent across 17 weeks and 6 countries) or a platform-rollout transient (it isn’t — no stabilization trend observed across the window).
The Fixed-Core + Carousel Framework
The single most useful conceptual contribution of the SISTRIX study is the fixed-core + carousel framework. For 86.5% of all AI Mode prompts, the citation set decomposes into two operationally distinct layers:
- Core (1-5 domains): Permanent residents. Cited week after week, regardless of which other sources come and go. Their weekly churn-in is approximately 0%. These domains have effectively “won” the topic — they are baseline expectations for the LLM’s source-selection process.
- Carousel (~10 domains): Rotators. 89% of these are different week-over-week. They appear briefly, are quoted once, and are replaced. The carousel is reshuffled almost completely on a weekly cadence.
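A minimal sketch of the core/carousel split for one prompt, assuming the weekly citation sets are already reduced to domains. It treats "core" strictly as domains present in every week (the study's "churn-in ≈ 0%" allows some slack) and measures carousel rotation as the mean share of each week's non-core domains that were absent from the previous week's non-core set. Function names and data are hypothetical.

```python
def split_core_carousel(weeks: list[set[str]]) -> tuple[set[str], set[str]]:
    """Core = domains cited in every weekly snapshot; carousel = all other domains seen."""
    core = set.intersection(*weeks)
    carousel = set.union(*weeks) - core
    return core, carousel


def carousel_rotation(weeks: list[set[str]], core: set[str]) -> float:
    """Mean share of each week's carousel domains that were absent from last week's carousel."""
    rates = []
    for prev, curr in zip(weeks, weeks[1:]):
        prev_c, curr_c = prev - core, curr - core
        if curr_c:
            rates.append(len(curr_c - prev_c) / len(curr_c))
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical 4-week history: two anchored domains plus a reshuffled remainder.
weeks = [
    {"core-a.com", "core-b.com", "x1.com", "x2.com", "x3.com"},
    {"core-a.com", "core-b.com", "x4.com", "x5.com", "x1.com"},
    {"core-a.com", "core-b.com", "x6.com", "x7.com", "x8.com"},
    {"core-a.com", "core-b.com", "x9.com", "x6.com", "x10.com"},
]

core, carousel = split_core_carousel(weeks)
print(sorted(core))                                   # ['core-a.com', 'core-b.com']
print(len(carousel))                                  # 10
print(f"{carousel_rotation(weeks, core):.2f}")        # 0.78
```

Here the sketch reports a carousel rotation of about 0.78, a quantity analogous to the 89% weekly carousel rotation reported for AI Mode.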
This reframes the GEO measurement question completely. The old “am I in the citation set?” framing collapses two fundamentally different success states:
- Core membership — “I am persistently part of the topic” (durable, reproducible, strategically valuable).
- Carousel appearance — “I happened to be sampled this week” (transient, snapshot-only, not reproducible by repeating the same prompt next week).
Treating these as a single binary metric (“cited / not cited”) is the central measurement error in single-snapshot GEO reporting tools. The carousel layer creates the illusion of optimization wins that vanish on re-measurement. The core layer is where actual durable visibility lives.
Strategically: optimizing into the core requires a fundamentally different bar than optimizing into the carousel. Carousel appearance is achievable with on-page tactical work (heading-to-query match, schema, evergreen framing — see AirOps Fan-Out Effect Study). Core membership requires the kind of topical authority that competing sources cannot displace in a given window — domain-level reputation, breadth of coverage, evergreen reliability, and likely some retrieval-system-internal preference signals that SISTRIX did not measure directly. ^[inferred]
Three Platforms, Three Architectures
Google AI Overviews — the closed circle. Average 11 cited domains; 8 remain consistently present. But stability is bimodal, not uniform:
| Prompt segment | Weekly source changes | Share of prompts |
|---|---|---|
| Fully stable | 0 sources change across 17 weeks | 53% |
| Light drift | Isolated changes | 28% |
| Heavy drift | 46% churn-in (close to AI Mode's 56%) | 19% |
The verification check — that 87% of “fully stable” prompts have changing response text but identical cited domains — confirms this is not a measurement artefact. AI Overviews writes new sentences each week from the same 8 sources. “Once you’re in, you stay in. If you’re not in, you don’t get in either” is a defensible operational summary.
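The three-way segmentation in the table can be sketched as follows, assuming one prompt's history is a list of weekly domain sets. "Fully stable" follows the table's definition (zero source changes across the window); the boundary between light and heavy drift is not stated in the study, so the 0.30 mean churn-in threshold below is a hypothetical choice, as are the names and sample data. The sketch does not reproduce the response-text verification check.

```python
from collections import Counter

# Hypothetical cut-off: the study reports 46% churn-in for the heavy-drift segment
# but does not state where it draws the light/heavy boundary.
HEAVY_DRIFT_THRESHOLD = 0.30


def mean_churn_in(weeks: list[set[str]]) -> float:
    """Average week-over-week churn-in for one prompt's citation sets."""
    rates = [len(curr - prev) / len(curr) for prev, curr in zip(weeks, weeks[1:]) if curr]
    return sum(rates) / len(rates) if rates else 0.0


def segment(weeks: list[set[str]]) -> str:
    """Assign a prompt to one of the three AI Overviews stability segments."""
    if all(curr == prev for prev, curr in zip(weeks, weeks[1:])):
        return "fully stable"  # zero source changes across the whole window
    return "heavy drift" if mean_churn_in(weeks) >= HEAVY_DRIFT_THRESHOLD else "light drift"


prompts = {
    "q1": [{"a", "b", "c"}] * 4,
    "q2": [{"a", "b", "c", "d"}] * 2 + [{"a", "b", "c", "e"}] * 2,
    "q3": [{"a", "b"}, {"c", "d"}, {"a", "e"}, {"f", "g"}],
    "q4": [{"a", "b", "c"}] * 4,
}

counts = Counter(segment(weeks) for weeks in prompts.values())
for name, n in counts.items():
    print(f"{name}: {n / len(prompts):.0%}")
# fully stable: 50%
# light drift: 25%
# heavy drift: 25%
```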
Google AI Mode — fixed core + carousel. 14-16 cited domains per response, 56% churn-in/week, 86.5% of prompts have a 1-5 domain stable core. The core/peripheral architecture described above applies. AI Mode is the canonical case for the SISTRIX framework — it is where the core/carousel pattern is most cleanly visible.
ChatGPT Search — total fluctuation. 74% weekly churn-in. 3-4 cited domains per response (vs 14-16 in AI Mode). The median prompt has zero domains stable across all 17 weeks — by contrast, the median AI Mode prompt has ~2 stable core domains. Stable core structures barely exist on ChatGPT in this study. And language preference inverts: even for German-language queries, 68% of ChatGPT Core sources are English-language.
AI Mode churn by country:
| Country | Churn-in/week | Retention/week | Domains/answer |
|---|---|---|---|
| DE | 56% | 46% | 13.5 |
| US | 54% | 49% | 16.1 |
| UK | 54% | 49% | 15.7 |
| IT | 59% | 43% | 13.7 |
| ES | 56% | 44% | 11.8 |
| FR | 57% | 43% | 12.3 |
Country variation is narrow (54-59% range). ChatGPT, by contrast, varies dramatically: Germany at 74% (most volatile), UK at 60%, France at 42% (most stable). SISTRIX attributes this to differences in crawl coverage for different languages: ChatGPT has more candidate sources to rotate through in heavily-crawled language markets, fewer in less-crawled ones. ^[inferred] If that explanation holds, ChatGPT’s apparent “stability” in lower-resource languages is a side effect of small candidate pools, not a sign of better source selection.
Domain-Type Hierarchy
Among AI Mode citations, the durable-vs-disposable hierarchy is sharp and counterintuitive:
| Domain type | Median appearance (share of weekly snapshots) | Core rate |
|---|---|---|
| Video (YouTube) | 53% | 24% |
| Big Tech (Google, Apple, Microsoft) | 41% | 16% |
| Wikipedia | 24% | 12% |
| Marketplace (Amazon, Otto, Zalando) | 18% | 10% |
| News / Media | 12% | 1.4% |
| Forums / UGC (Reddit, Gutefrage) | 12% | 3% |
YouTube is the clear winner. Cited on more than half of all weekly snapshots and present in the stable core at 24% — the highest rate of any content type. The implication for content strategy is stark: video content, particularly YouTube video, is structurally favored by AI Mode in a way that text-only content is not. This is consistent with Google’s broader 2024-2026 push toward video-as-citation across both classic Search and AI Search. ^[inferred]
News is a one-way ticket. Core rate of 1.4%. News articles are quoted once and gone the following week; they cannot anchor citation strategy. Editorial publishing that depends on AI citation traffic is structurally fighting the platform’s preference for evergreen content. The investment economics of “ship news → get cited → drive traffic” do not work when 98.6% of cited news domains never reach the stable core.
ChatGPT has no clear hierarchy. SISTRIX explicitly notes the equivalent table for ChatGPT shows minimal differences between domain types — even intuitively-favored sources (like documentation, big-tech) don’t systematically appear in stable core more than others. ChatGPT’s core composition is shaped by language and E-E-A-T more than by domain type.
The 2,556-URL classification produced two clear engine-specific profiles:
- AI Mode core profile: 80% German-language (peripheral: 62%), 85% evergreen (peripheral: 77%), product/shop pages dominate, guides/news rotate through.
- ChatGPT core profile: 68% English-language (even for German queries), E-E-A-T 16/20 (peripheral 14/20), documentation/institutional sources dominate, editorial guides rotate.
Two engines, two fundamentally different content-quality definitions. A page optimized for one is meaningfully different from a page optimized for the other.
Brand Anchoring vs Co-Citation Drift
The brand-query analysis isolates the strongest pre-existing trust signal — your own domain on a query about you — and measures how much even that signal drifts:
- Brand domain present in all 17 weeks: 43% of brand queries.
- Brand domain present in ≥80% of weeks: 66% of brand queries.
- Co-citing domains (12-15 alongside brand) weekly rotation: 70%.
- Brand-query churn vs average: 45% vs 56%, about 20% lower in relative terms (11 percentage points); more stable, but not dramatically so.
Two findings here. First, even for brand queries, the brand’s own domain is not guaranteed a place in any given week: it drops out at least once during the 17 weeks for 57% of brand queries, and for about a third (34%) it is missing in more than 20% of weeks. The intuition that “my brand will obviously be cited on my brand query” holds in aggregate over time, not in every weekly measurement.
Second, the space alongside the brand (who gets cited as a co-source) is reshuffled at 70% per week. Competing brands, comparison sites, review aggregators, news commentary, and editorial how-tos all rotate through that co-citation slot. This is the strategically interesting layer: brands cannot control their own anchoring (the LLM either does it or doesn’t), but they can compete for the co-citation layer on competitors’ brand queries, and that layer turns over every week.
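The two brand-anchoring shares and the relative-churn figure can be reproduced with a short sketch, assuming you have, for each brand query, one presence flag per week for the brand's own domain. The query names and flags are hypothetical; the 80% bar is checked with integer arithmetic (14 of 17 weeks passes, 13 does not) to avoid floating-point edge cases.

```python
def brand_anchoring(weekly_presence: dict[str, list[bool]]) -> tuple[float, float]:
    """Return (share of brand queries citing the brand domain every week,
    share citing it in at least 80% of weeks)."""
    every_week = at_least_80 = 0
    for flags in weekly_presence.values():
        hits = sum(flags)
        if hits == len(flags):
            every_week += 1
        if hits * 5 >= len(flags) * 4:  # integer form of hits / weeks >= 0.8
            at_least_80 += 1
    n = len(weekly_presence)
    return every_week / n, at_least_80 / n


# Hypothetical 17-week presence flags for four brand queries.
presence = {
    "brand a": [True] * 17,
    "brand a reviews": [True] * 14 + [False] * 3,  # 14/17 = 82%: passes the 80% bar
    "brand a login": [True] * 10 + [False] * 7,    # 10/17 = 59%: fails it
    "brand a pricing": [True] * 17,
}
all_share, mostly_share = brand_anchoring(presence)
print(f"every week: {all_share:.0%}, >=80% of weeks: {mostly_share:.0%}")  # 50%, 75%

# Relative churn reduction behind the "about 20% lower" figure (45% vs 56%).
print(f"{(0.56 - 0.45) / 0.56:.0%}")  # 20%
```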
What This Means for GEO Strategy
SISTRIX offers three direct strategic implications:
1. Evergreen content before news, product pages before how-to guides. The core-rate hierarchy is unambiguous: video (24%) > big-tech (16%) > Wikipedia (12%) > marketplace (10%) > forums (3%) > news (1.4%). Evergreen product/shop pages systematically out-survive news in AI Mode citation sets. This inverts the conventional SEO content-velocity model where fresh news drives ranking momentum — in AI citation, news is the worst possible investment for durable presence. The closest historical parallel is Wikipedia’s resistance to recentism on encyclopedia entries: evergreen formats win in any system that values stability over novelty.
2. Set your platform focus based on your starting point. AI Overviews, AI Mode, and ChatGPT cite largely different domains for the same query: the reported pairwise Jaccard indices of 0.125-0.17 mean that only 12.5-17% of a pair’s combined domain set is cited by both engines. And they favor structurally different content profiles (German shop pages in AI Mode, English documentation in ChatGPT). Most companies are already structurally better positioned for one platform than the others; the question is which one, not “how do I rank on all three.” A B2B SaaS with English documentation is naturally ChatGPT-positioned; a German e-commerce site is naturally AI-Mode-positioned. Optimizing for the platform your content profile already fits beats trying to be optimal across all three.
3. Calibrate GEO expectations away from snapshots. Traditional SEO promises (“Rank 1 for keyword X”) do not transfer. With 56% weekly drift in AI Mode and 74% in ChatGPT, a single citation placement is not a reproducible result: it’s a snapshot. Success metrics need to be reframed around presence-over-time (e.g., “% of weeks in which my domain is cited on this query over a 12-week window”) rather than single-shot placements. GEO is a continuous process, not a one-off optimization. Reporting tools that surface a single-week citation snapshot are surfacing noise.
How This Connects to the Existing Cluster
SISTRIX is the temporal companion piece to every other study in the AI SEO thesis cluster. It is the first study that adds the week-over-week persistence dimension to a literature otherwise dominated by single-snapshot or short-window observations.
- AirOps Fan-Out Effect ChatGPT Study measured retrieval rank dominance (4.1× gap rank-1 vs rank-10) on ChatGPT at one point in time (16,851 queries, single capture). SISTRIX shows that even within ChatGPT’s tight 3-4-domain citation set, 74% of those domains rotate weekly. The AirOps finding “retrieval rank dominates” combines with the SISTRIX finding “even rank-1 wins rotate” to suggest a two-layer system: retrieval rank determines candidacy for this week’s citation; weekly rotation determines whether last week’s winners are re-sampled or replaced. ^[inferred]
- Ahrefs Schema → AI Citations Causal Study ran a 2-day Aug→Mar matched difference-in-differences (DiD) analysis on 1,885 schema-added pages and found no causal lift. SISTRIX’s 74-85% weekly drift on ChatGPT provides the noise-floor context that makes the Ahrefs null result more interpretable: at week-over-week drift this high, a 2-day window is a snapshot of a fundamentally non-stationary system. A real causal effect of schema would need to demonstrate either (a) a sustained increase in core-membership probability (not weekly carousel appearance) or (b) a measurable shift in the core-rate distribution. Neither is detectable in a 2-day window when the carousel turns over weekly. The studies do not contradict each other; they operate at incompatible temporal resolutions.
- Digital Applied 1,000 AIO Citation Pattern Study found AI Overviews citations highly concentrated (top 1% of domains capture 47% of citations). SISTRIX confirms and explains the mechanism: AIO is the most stable platform (53% of prompts have zero source changes across 17 weeks). The concentration is a consequence of the closed-circle architecture: once a domain is in the citation set, it persists; once out, it stays out. Digital Applied measured the cross-sectional distribution; SISTRIX measured the temporal mechanism that produces it.
- Zyppy AI Citation Ranking Factors Meta-Analysis lists 23 factors weighted by inferred influence. SISTRIX adds a meta-factor that none of the underlying 54 studies in Zyppy’s meta-analysis captured: “is this signal correlated with core membership or just with carousel appearance?” Most factors in Zyppy were measured against single-snapshot citation outcomes. SISTRIX implies those measurements conflate two fundamentally different success states, and the actual ranking weight on each factor likely differs substantially when computed against core vs carousel as the outcome variable.
- Engine divergence: every prior study in the cluster has noted that ChatGPT, AIO, and AI Mode behave differently. SISTRIX quantifies it with Jaccard indices (AIO ↔ AI Mode = 0.17; AI Mode ↔ ChatGPT = 0.125) and shows that the divergence is also temporal: the same prompt run on the same engine in two consecutive weeks shares only 26-44% of cited domains. The divergence is not merely cross-engine; it is also week-over-week within the same engine. This materially expands the “engine-specific strategy” requirement: the strategy must also be re-tested over time, not just across engines.
The SISTRIX study does not contradict any prior finding. It re-frames the existing literature around a missing dimension (temporal persistence) and shows that several findings previously considered to be in tension — schema correlation positive, schema causal null; high concentration on AIO, high churn on AI Mode — are reconcilable once the core/carousel decomposition is applied.
Open Questions
- What predicts core membership vs carousel rotation? The SISTRIX classification (language, evergreen, content type, E-E-A-T) gives correlated profiles but not a causal mechanism. The same publisher could have one page in the core and another in the carousel for related queries — what differentiates them at the page level? The study does not isolate this.
- Does the 17-week window understate eventual stabilization? SISTRIX explicitly notes “no clear trend over the 17-week period.” But 17 weeks is still a small window in the multi-year lifecycle of these platforms. A 52-week or 104-week follow-up would resolve whether drift is genuinely structural or merely “slow-stabilizing.” ^[inferred]
- Why does ChatGPT prefer English sources even for German queries (68%)? Crawl coverage explains volume, but not core preference. Hypothesis: ChatGPT’s retrieval system has stronger English-language ranking signals (training data is English-dominated, the underlying Bing index has stronger English signal quality) — so even when German sources are available, they don’t survive the LLM’s confidence ranking. ^[inferred] Not tested in the study.
- Is the YouTube 24% core rate finding causal or attribution? YouTube being heavily cited could be (a) genuinely the highest-quality answer source for many queries, (b) a platform-level preference baked into AI Mode that favors Google-owned video, or (c) an attribution artefact (YouTube transcripts being heavily indexed, surfacing as “the source” even when the original creator is the actual source). ^[inferred] The study does not separate these.
- Cross-engine 17-week consistency. Does a domain in the AI Mode core also tend to be in the AIO core for the same query, even with Jaccard 0.17 cross-sectional overlap? The longitudinal-vs-cross-sectional intersection isn’t computed. If yes, “AI authority” generalizes across Google products. If no, AIO and AI Mode require separately-tracked strategies even within Google.
Related
- AirOps Fan-Out Effect ChatGPT Study — Single-snapshot retrieval-rank study (16,851 queries). SISTRIX’s temporal companion: AirOps measured what determines weekly citation candidacy; SISTRIX measured how that candidacy rotates week-over-week.
- Digital Applied 1,000 AIO Citation Pattern Study — Cross-sectional AIO concentration finding (top 1% capture 47% of citations). SISTRIX explains the underlying mechanism: AIO is bimodal-stable (53% of prompts have zero source changes across 17 weeks).
- Ahrefs 38% AIO Top-10 Update — The companion finding that 38% of AIO top-10 sources change within 2 days. SISTRIX’s 17-week longitudinal frame contextualizes the Ahrefs short-window observation: drift is structural, not a transient 2-day phenomenon.
- SE Ranking 1.3M AI Mode Citations — Cross-sectional Google self-citation study on AI Mode. SISTRIX provides the temporal dimension that SE Ranking’s snapshot study lacks.
- Ahrefs AI Mode vs AI Overviews 730K — Cross-engine divergence study. SISTRIX confirms and quantifies the divergence with Jaccard indices and adds the temporal dimension (divergence is both cross-engine and week-over-week).
- Zyppy AI Citation Ranking Factors — 23-factor meta-analysis of 54 underlying studies. SISTRIX adds the meta-factor “is this signal predictive of core membership or only carousel appearance?” — a dimension the underlying studies could not measure.
- GEO-16 Framework — Academic cross-engine correlational study. SISTRIX provides the longitudinal evidence that the GEO-16 framework’s recommended factors should be evaluated for core-rate effect, not just one-shot citation effect.
Try It
- Re-frame your GEO reporting from snapshots to presence-over-time. Replace “% of target queries where my domain is cited” with “% of weeks over a 12-week window where my domain is cited on each target query.” Single-week measurements are noise; persistence is signal. If your current reporting tool can’t do this, log weekly snapshots yourself and compute the persistence metric quarterly.
- Decompose your citation tracking into core vs carousel layers. For each tracked query, count the domains that appear in ≥80% of weekly snapshots (core) vs the domains that appear in <50% (carousel). Track these separately. Optimization wins in the carousel layer are not durable; only core membership gains compound.
- Cut news-content investment for AI-citation goals. With a 1.4% core rate, news/editorial articles are practically excluded from durable AI citation. Reallocate that investment toward evergreen product/shop pages (high core rate) and YouTube video (24% core rate).
- Audit your platform-fit before splitting GEO investment. Pick one engine to optimize first based on your content profile. ChatGPT favors English-language, documentation/institutional, E-E-A-T 16/20. AI Mode favors local-language, evergreen, product/shop. AI Overviews is closed-circle (very hard to break into; very stable once in). Optimize for your natural fit before fighting the other two.
- Track competitor co-citation slots on your brand queries. Even if your brand domain is anchored, the 12-15 co-citing domains rotate at 70% weekly. Pull who appears alongside your brand each week — that turnover is the competitive intelligence signal. Competitors gaining persistence in your co-citation slot is the leading indicator of brand-query erosion before it shows up in actual brand-domain drift.
- Stop chasing URL-level citation; track at domain-level. With URL-level drift at 85% week-over-week, the specific cited subpage on your domain is not controllable. What is measurable and controllable is whether your domain appears at all. Use domain-level presence as the primary KPI; treat URL-level visibility as second-order.
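The presence-over-time metric and the core/carousel thresholds from these bullets can be sketched as follows, assuming you already log one set of cited domains per tracked query per week (URLs reduced to domains beforehand). The 80% and 50% thresholds come from the bullets; the "in between" label for the unassigned 50-80% band, the function names, and the sample data are hypothetical.

```python
from collections import Counter


def presence_shares(weekly_domains: list[set[str]]) -> dict[str, float]:
    """Share of weekly snapshots in which each domain is cited for one tracked query."""
    counts = Counter(domain for week in weekly_domains for domain in week)
    return {domain: n / len(weekly_domains) for domain, n in counts.items()}


def classify_layers(shares: dict[str, float], core_min: float = 0.8,
                    carousel_max: float = 0.5) -> dict[str, list[str]]:
    """Bucket domains by presence share: >= 80% core, < 50% carousel, the rest in between."""
    layers = {"core": [], "in between": [], "carousel": []}
    for domain, share in sorted(shares.items()):
        if share >= core_min:
            layers["core"].append(domain)
        elif share < carousel_max:
            layers["carousel"].append(domain)
        else:
            layers["in between"].append(domain)
    return layers


# Hypothetical 12-week log for one query: one anchored domain, our domain cited in
# 9 of 12 weeks, and six rotators that each appear twice.
weeks = [
    {"anchor.example", f"rotator{i % 6}.example"}
    | ({"ourbrand.example"} if i % 4 else set())
    for i in range(12)
]

shares = presence_shares(weeks)
layers = classify_layers(shares)
print(f"ourbrand.example cited in {shares['ourbrand.example']:.0%} of weeks")  # 75%
print(layers["core"], layers["in between"], len(layers["carousel"]))
# ['anchor.example'] ['ourbrand.example'] 6
```

Running the same two functions on brand queries also covers the co-citation bullet: domains that climb from the carousel into the core on your brand queries are the ones gaining persistence in your co-citation slot.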