GEO-16 Framework — Academic AI Citation Behavior Study (arXiv 2509.10762v1, Kumar & Palkhouski, 2025-09)

Source: geo-16-framework-arxiv-kumar-palkhouski-2025-09.md — Arlen Kumar (UC Berkeley) and Leanid Palkhouski (Wrodium Research). arXiv:2509.10762v1, submitted September 13, 2025.

The first academic paper to publish a structured AEO/GEO auditing framework derived from a multi-engine empirical study. Kumar and Palkhouski audited 1,100 URLs against 70 B2B SaaS prompts and harvested 1,702 citations across Brave Summary, Google AI Overviews, and Perplexity sonar-pro. They built GEO-16 — a 16-pillar scoring framework — from the dataset. Headline: pages scoring ≥0.70 on GEO with ≥12 pillar hits achieve 78% cross-engine citation rates. The top three correlated pillars are Metadata & Freshness (r=0.68), Semantic HTML (r=0.65), and Structured Data (r=0.63), all p<0.001. The paper is the strongest academic-rigor source on AI citation correlations as of this writing — and explicitly self-flags its observational design as non-causal.

Key Takeaways

First academic AEO/GEO citation study. Authors are affiliated with UC Berkeley and Wrodium Research; submitted to arXiv September 2025 (preprint).
Multi-engine dataset. 1,100 URLs / 1,702 citations / 70 prompts / 16 B2B SaaS verticals. Citation distribution: Brave 36.0%, Google AIO 35.1%, Perplexity 28.9%.
GEO-16 = 16 pillars across 6 principles. People-first content (3 pillars), structured data (3), provenance (3), freshness (2), risk management (2), RAG optimisation (3). Each pillar is scored 0-3; the aggregate is G(u) = (1/48)Σ b_j(u) ∈ [0,1], where b_j(u) is the band score of pillar j on page u.
Threshold finding. Pages with GEO ≥0.70 AND ≥12 pillar hits → 78% cross-engine citation rate. This is the headline operational threshold.
Top three correlated pillars (r values, all p<0.001):
- Metadata & Freshness: r=0.68, 95% CI [0.64, 0.72], +47% citation impact
- Semantic HTML: r=0.65, 95% CI [0.61, 0.69], +42%
- Structured Data: r=0.63, 95% CI [0.59, 0.67], +39%
Mid-tier correlations: Evidence & Citations (r=0.61, +37%), Authority & Trust (r=0.59, +35%), Internal Linking (r=0.57, +33%).
Observational, NOT causal. Authors explicitly write: “Our observational design may suffer from unobserved confounding (internal validity)” and “we do not experimentally vary publication venues, so causal effects of off-page authority remain unverified.” This is critical context for interpreting the correlation values.
Contrasts with Ahrefs’s causal finding on Structured Data. GEO-16’s r=0.63 / +39% is correlational; Ahrefs’s matched difference-in-differences (DiD) study found no causal schema lift. Same reconciliation as the AirOps study: schema is a marker, not a lever.
B2B SaaS scope. All 16 verticals are SaaS-adjacent. Authors flag external validity: results may not generalize to consumer / healthcare / news / non-English content.

Structured Data: GEO-16 says r=0.63 / +39%, Ahrefs says no causal lift

GEO-16 says (this paper; cross-sectional observational; Brave/AIO/Perplexity; B2B SaaS): Structured Data r=0.63, 95% CI [0.59, 0.67], +39% citation impact. Ahrefs says (ChatGPT): adding schema mid-period produces no statistically meaningful citation lift. Reconciliation: GEO-16’s authors explicitly self-flag the observational, non-causal design. The r=0.63 captures the correlation between “pages that have schema” and “pages that get cited,” but pages with schema are not a random subset of pages; they are systematically more mature on editorial and technical dimensions. Ahrefs’s matched difference-in-differences (DiD) design isolates the intervention (adding schema) from publisher characteristics, and that isolation removes the lift. The framework’s overall threshold finding (GEO ≥0.70 + ≥12 pillar hits → 78% citation rate) is still actionable as a “what predicts citation” model, but readers should not interpret “+39% citation impact” as “add schema and citations rise 39%.” Status: resolved (2026-05-19) as a methodological difference, not a factual one.
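The "marker, not a lever" reading can be illustrated with a toy simulation. This is a sketch on synthetic data only, not a reproduction of either study: the latent `maturity` variable, the probabilities, and the function names are all assumptions made for illustration. Schema has no causal effect on citation in this model, yet the two still correlate because both depend on maturity. Comparing pages of similar maturity (simple stratification, used here as a stand-in for Ahrefs's matched DiD design, which it does not replicate) makes the association largely vanish.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_pages(n=20000, seed=0):
    """Synthetic pages: (maturity, has_schema, is_cited).

    Assumption: a hidden 'maturity' in [0, 1] independently raises both the
    chance of having schema and the chance of being cited. Schema itself
    never influences citation in this model.
    """
    rng = random.Random(seed)
    pages = []
    for _ in range(n):
        maturity = rng.random()
        has_schema = int(rng.random() < maturity)
        is_cited = int(rng.random() < maturity)  # depends on maturity only
        pages.append((maturity, has_schema, is_cited))
    return pages


def schema_citation_r(pages):
    """Correlation between the schema flag and the citation flag."""
    return pearson([p[1] for p in pages], [p[2] for p in pages])


pages = simulate_pages()
overall_r = schema_citation_r(pages)
# Restrict to pages of similar maturity to hold the confounder roughly fixed.
matched_r = schema_citation_r([p for p in pages if 0.45 <= p[0] < 0.55])
print(f"all pages: r={overall_r:.2f}; similar-maturity pages: r={matched_r:.2f}")
```

The overall correlation comes out clearly positive while the within-stratum correlation sits near zero, which is the qualitative pattern the reconciliation describes. The magnitudes are artifacts of the made-up model and say nothing about the real r=0.63.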

The 16 Pillars (Grouped by Principle)

Principle	Pillars	Count
People-first content	UX & Readability; Claims & Accuracy; Microcontent	3
Structured data	Semantic HTML; Structured Data; Metadata & Freshness	3
Provenance	Authority & Trust; Evidence & Citations; Transparency & Ethics	3
Freshness	Metadata & Freshness; Content Depth	2
Risk management	Claims & Accuracy; Transparency & Ethics	2
RAG optimisation	Internal Linking; External Linking; Engagement & Interaction	3

Pillar scoring: Each pillar receives a band score b_j(u) ∈ {0,1,2,3}. A “pillar hit” occurs when b_j(u) ≥ 2.

Aggregate GEO score: G(u) = (1/48)Σ b_j(u) ∈ [0,1]. The denominator (48) = 16 pillars × max band 3.

Individual sub-signal weights (w_j,i) within pillars are not disclosed in v1.
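The scoring rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming band scores have already been assigned by hand or by some other audit step; the function names and the placeholder pillar keys (`pillar_01` to `pillar_16`) are hypothetical and do not come from the paper. Because the sub-signal weights are undisclosed, the sketch starts from band scores rather than trying to derive them.

```python
from typing import Mapping

NUM_PILLARS = 16   # GEO-16 pillar count
MAX_BAND = 3       # highest band score per pillar
HIT_BAND = 2       # a pillar "hit" is a band score of 2 or 3


def validate(bands: Mapping[str, int]) -> None:
    """Reject inputs that are not 16 pillars with band scores in {0, 1, 2, 3}."""
    if len(bands) != NUM_PILLARS:
        raise ValueError(f"expected {NUM_PILLARS} pillars, got {len(bands)}")
    for name, band in bands.items():
        if band not in (0, 1, 2, 3):
            raise ValueError(f"{name}: band score must be 0-3, got {band!r}")


def geo_score(bands: Mapping[str, int]) -> float:
    """G(u) = (1/48) * sum of band scores, which lies in [0, 1]."""
    validate(bands)
    return sum(bands.values()) / (NUM_PILLARS * MAX_BAND)


def pillar_hits(bands: Mapping[str, int]) -> int:
    """Count pillars whose band score is at least 2."""
    validate(bands)
    return sum(1 for band in bands.values() if band >= HIT_BAND)


def meets_threshold(bands: Mapping[str, int],
                    min_score: float = 0.70, min_hits: int = 12) -> bool:
    """True when both conditions of the paper's headline threshold hold."""
    return geo_score(bands) >= min_score and pillar_hits(bands) >= min_hits


# Hypothetical page: twelve pillars at band 3, four at band 0.
strong = {f"pillar_{i:02d}": (3 if i <= 12 else 0) for i in range(1, 17)}
# Hypothetical page: every pillar at band 2 (16 hits, but a lower aggregate).
even = {f"pillar_{i:02d}": 2 for i in range(1, 17)}

print(geo_score(strong), pillar_hits(strong), meets_threshold(strong))
print(round(geo_score(even), 3), pillar_hits(even), meets_threshold(even))
```

The second example shows why the two conditions are not redundant: a page can hit all 16 pillars at band 2 and still score 32/48, which falls below 0.70.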

The Six Principles — Verbatim Author Guidance

People-first content

“Lead with an answer-first summary (TL;DR or key takeaways), keep paragraphs compact, use descriptive headings/lists, and mark claims versus opinions explicitly.”

Structured data

“Maintain a single <h1> and logical <h2>/<h3> hierarchy; provide valid JSON-LD (Article/TechArticle/FAQPage) with datePublished, dateModified, author, and breadcrumb where relevant; expose canonical URLs and social cards. Ensure schema matches visible content.”
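As a rough illustration of the JSON-LD part of this guidance, the sketch below builds a minimal TechArticle object with the fields the authors name. The helper names, the example values, and the choice of `mainEntityOfPage` for the canonical URL are assumptions for illustration; the paper does not prescribe a particular template, and breadcrumb markup is omitted here.

```python
import json
from datetime import date


def article_jsonld(headline: str, author_name: str, published: date,
                   modified: date, canonical_url: str,
                   schema_type: str = "TechArticle") -> dict:
    """Build a minimal schema.org article object (hypothetical helper)."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "headline": headline,  # should match the visible page title
        "author": {"@type": "Person", "name": author_name},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "mainEntityOfPage": canonical_url,
    }


def to_script_tag(data: dict) -> str:
    """Serialize the object into a JSON-LD script element for the page head."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# All values below are placeholders, not data from the study.
doc = article_jsonld(
    headline="Example headline that matches the visible H1",
    author_name="Example Author",
    published=date(2025, 1, 15),
    modified=date(2025, 9, 13),
    canonical_url="https://example.com/articles/example",
)
print(to_script_tag(doc))
```

Generating the markup from the same fields that render the visible page is one way to keep schema and content in sync, which is the point of the authors' last sentence.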

Provenance

“Cite primary sources inline, include a reference section, favour authoritative domains (.gov/.edu/standards bodies), and perform link-health checks to avoid rot/redirect loops.”
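Two of these checks can be sketched offline. The example below assumes redirects have already been collected by a crawler into a plain mapping of URL to redirect target; that mapping, the function names, and the suffix list are illustrative assumptions. Matching on .gov/.edu is only a crude proxy, since standards bodies and many primary sources use other domains.

```python
from urllib.parse import urlparse

# Assumption: suffix matching as a rough stand-in for "authoritative domain".
AUTHORITATIVE_SUFFIXES = (".gov", ".edu")


def is_authoritative(url: str) -> bool:
    """True when the URL's hostname ends in one of the listed suffixes."""
    host = urlparse(url).hostname or ""
    return host.endswith(AUTHORITATIVE_SUFFIXES)


def follow_redirects(start: str, redirects: dict, max_hops: int = 10):
    """Walk a pre-collected redirect map and classify the chain.

    Returns (last_url, status) where status is "ok", "loop", or
    "too_many_hops". No network requests are made.
    """
    seen = {start}
    current = start
    hops = 0
    while current in redirects:
        current = redirects[current]
        hops += 1
        if current in seen:
            return current, "loop"
        if hops > max_hops:
            return current, "too_many_hops"
        seen.add(current)
    return current, "ok"


# Hypothetical crawl results.
redirect_map = {
    "https://example.com/old": "https://example.com/new",
    "https://example.com/a": "https://example.com/b",
    "https://example.com/b": "https://example.com/a",
}
print(follow_redirects("https://example.com/old", redirect_map))
print(follow_redirects("https://example.com/a", redirect_map))
print(is_authoritative("https://www.nist.gov/publications"))
```

A real link-health check would also need to fetch each URL and inspect status codes; that part is left out because it depends on the crawler in use.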

Freshness, Risk Management, RAG Optimisation

Per-pillar guidance follows the same pattern. The full pillar-by-pillar guidance is in the source PDF.

Reported Correlations

Pillar	Correlation (r)	p-value	95% CI	Citation Impact
Metadata & Freshness	0.68	<0.001	[0.64, 0.72]	+47%
Semantic HTML	0.65	<0.001	[0.61, 0.69]	+42%
Structured Data	0.63	<0.001	[0.59, 0.67]	+39%
Evidence & Citations	0.61	<0.001	[0.57, 0.65]	+37%
Authority & Trust	0.59	<0.001	[0.55, 0.63]	+35%
Internal Linking	0.57	<0.001	[0.53, 0.61]	+33%

Correlations for the other 10 pillars are not provided in v1.

Self-Flagged Limitations (Verbatim)

Internal validity: “Our observational design may suffer from unobserved confounding.”
Construct validity: GEO-16 captures only a subset of on-page quality signals.
External validity: Dataset limited to English-language B2B SaaS pages from a single time point; results may not generalize to other languages, verticals, or future engine versions.
Experimental limitation: “We do not experimentally vary publication venues, so causal effects of off-page authority remain unverified.”
Confounding: Engine-specific personalization and A/B variation not fully accounted for.

Practical Use

GEO-16 is the best-formalized audit framework in this thesis cluster. Treat it as a score-this-page-out-of-1.0 rubric rather than a causal recipe:

Threshold to chase: G(u) ≥ 0.70 + ≥12 pillar hits → 78% cross-engine citation rate in the study sample.
Pillars to prioritize (highest measured correlations): Metadata & Freshness, Semantic HTML, Structured Data, Evidence & Citations.
Pillars that the framework formalizes but for which v1 publishes no correlations include: UX & Readability, Microcontent, Content Depth, Transparency & Ethics, External Linking, Engagement & Interaction, Claims & Accuracy.

Open Questions

Individual sub-signal weights (w_j,i). v1 doesn’t disclose how each sub-signal within a pillar is weighted. Update needed when v2 or final publication ships.
Correlations for the other 10 pillars. Only six pillars have published correlation values. The other ten are formalized in the framework but not yet measured against citation.
External validity outside B2B SaaS. Authors flag this. A consumer-vertical or healthcare replication would be the highest-value follow-up.
Versioning. v1 was submitted September 2025. A v2 or peer-reviewed final version may shift weights as engines change.
Engine-specific weights. All six published correlations aggregate across Brave + AIO + Perplexity. Per-engine weights would resolve some of the contradictions visible between AirOps’s ChatGPT-only finding and Digital Applied’s AIO-only finding.

Related

Ahrefs Schema → AI Citations Causal Study: Causal counterpoint to GEO-16’s correlational findings on Structured Data. Both sets of authors are upfront about the methodological boundary.
AirOps + Kevin Indig Fan-Out Effect ChatGPT Study: Companion observational study. AirOps’s stratified +6.5pp schema finding maps directly onto GEO-16’s r=0.63 Structured Data correlation from a different dataset.
Digital Applied 1,000 AIO Citation Pattern Study: AIO-only observational study with regression-style DA control. Its 2.3× schema finding is consistent with GEO-16’s +39% impact.
Zyppy AI Citation Ranking Factors Meta-Analysis: Cyrus Shepard’s 54-study aggregation. GEO-16 is likely one of those 54 underlying studies. ^[inferred]
Google’s Generative AI Search Optimization Guide: Google’s official position. Mostly aligned with GEO-16 on structured-data + semantic-HTML guidance; the divergence is causal-vs-correlational framing.
FLUQs Framework: Practitioner framework. GEO-16’s Microcontent + Evidence-and-Citations pillars map onto FLUQs’s EchoBlocks-as-causal-triplets pattern.

Try It

Score a sample of your top pages against the 16 pillars. Use the v1 paper’s band-score rubric (0-3 per pillar). Compute G(u) and count pillar hits.
Target the threshold: GEO ≥0.70 + ≥12 pillar hits. Below that, the citation rate drops sharply per the paper.
Sequence improvements by correlation strength: start with Metadata & Freshness (r=0.68), then Semantic HTML (r=0.65), then Structured Data (r=0.63). These are the three pillars with the strongest published evidence in the dataset.
Don’t over-index on Structured Data alone. Per the contradiction with Ahrefs’s causal study, schema is best framed as a marker of editorial maturity. Get the underlying editorial and metadata practices right; let schema follow.
Re-audit quarterly. Engines change. GEO-16’s authors flag that “results may not generalize to future engine versions.” Treat the framework as a snapshot, not a permanent ranking.
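The sequencing step above can be sketched as a small helper that lists the pillars a page is missing (band score below 2) and orders them by the six published correlations. The function name and the example band scores are hypothetical, and ordering by r is only a heuristic: the correlations are observational, so this ranks where the evidence is strongest, not where an intervention is known to work.

```python
# The six correlations published in v1 (from the table in this note).
PUBLISHED_R = {
    "Metadata & Freshness": 0.68,
    "Semantic HTML": 0.65,
    "Structured Data": 0.63,
    "Evidence & Citations": 0.61,
    "Authority & Trust": 0.59,
    "Internal Linking": 0.57,
}


def improvement_queue(bands: dict, hit_band: int = 2) -> list:
    """Pillars below the hit band, strongest published correlation first.

    Pillars without a published r are kept, in their input order, after the
    ones that have a value.
    """
    missing = [name for name, band in bands.items() if band < hit_band]
    return sorted(
        missing,
        key=lambda name: (name not in PUBLISHED_R, -PUBLISHED_R.get(name, 0.0)),
    )


# Hypothetical audit of one page (only a few pillars shown).
page_bands = {
    "Structured Data": 1,
    "UX & Readability": 0,
    "Metadata & Freshness": 1,
    "Semantic HTML": 3,
    "Internal Linking": 0,
}
for pillar in improvement_queue(page_bands):
    print(pillar)
```

Semantic HTML is left out of the queue in this example because it already counts as a hit, and UX & Readability lands last because v1 gives no correlation for it.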
