Source: raw/ai_index_report_2026.pdf
Publisher: Stanford Institute for Human-Centered Artificial Intelligence (HAI)
Citation: Sha Sajadieh, Loredana Fattorini, Raymond Perrault, Yolanda Gil et al. “The AI Index 2026 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2026.
License: CC BY-ND 4.0 International
Length: 423 pages, 9 chapters
Co-chairs: Yolanda Gil (USC/ISI) · Raymond Perrault (SRI)
Lead editor: Sha Sajadieh (Stanford)
Supporting partners: Google, NSF, OpenAI, Open Philanthropy, Quid, Infosys
Data partners: Digital Policy Alert, Center for Research on Foundation Models, CSET, Epoch AI, GitHub, IFR (International Federation of Robotics), Kapor Foundation, Lightcast, LinkedIn, McKinsey, NSF ECEP, Schmidt Sciences, RAISE Health, zeki
The ninth edition of the Stanford HAI AI Index. First edition to include standalone chapters on AI in Science and AI in Medicine. Adds AI sovereignty as an analytical framework. The co-chairs’ framing: “The data does not point in a single direction. It reveals a field that is scaling faster than the systems around it can adapt.” Authoritative independent measurement — the primary artifact for sanity-checking vendor claims and framing strategy conversations.
The Core Finding
Generative AI hit nearly 53% population-level adoption within three years — faster than the personal computer or the internet. Organizational adoption is 88%. 4 in 5 university students now use generative AI. The US-China model performance gap has effectively closed. Safety benchmarks are lagging capability. Talent flow into the US has dropped 89% since 2017.
The 15 Top Takeaways (Verbatim from Report)
1. AI capability is not plateauing. Industry produced over 90% of notable frontier models in 2025, and several meet or exceed PhD-level baselines in science, multimodal reasoning, and competition mathematics. SWE-bench Verified rose from 60% to near 100% in a single year. Organizational adoption reached 88%, and 4 in 5 university students use generative AI.
2. The US-China AI model performance gap has effectively closed. US and Chinese models have traded the lead multiple times since early 2025. In February 2025, DeepSeek-R1 briefly matched the top US model; as of March 2026, Anthropic’s top model leads by just 2.7%. The US still produces more top-tier models and higher-impact patents; China leads in publication volume, citations, patent output, and industrial robot installations. South Korea leads in AI patents per capita.
3. The US hosts the most AI data centers, but nearly every leading AI chip comes from a single Taiwanese foundry. The US hosts 5,427 data centers — more than 10× any other country — and consumes more energy on them than any other nation. A single company, TSMC, fabricates almost every leading AI chip, leaving the global hardware supply chain dependent on one foundry in Taiwan (TSMC’s US expansion began operations in 2025).
4. AI models can win gold at the International Mathematical Olympiad but cannot reliably tell time. Gemini Deep Think earned a gold medal at the IMO, yet the top model reads analog clocks correctly just 50.1% of the time. AI agents jumped from 12% to ~66% task success on OSWorld but still fail roughly 1 in 3 attempts on structured benchmarks. This is the “jagged frontier”.
5. Robots still fail at most household tasks. Robots succeed in only 12% of household tasks, while simulated manipulation on RLBench reaches 89.4% success — the sim-to-real gap is wide.
6. Responsible AI is not keeping pace with capability: safety benchmarks are lagging and AI incidents are rising sharply. Documented AI incidents rose to 362 in 2025, up from 233 in 2024. Frontier labs report capability benchmarks consistently; responsible-AI reporting is spotty. Improving one dimension (e.g. safety) can degrade another (e.g. accuracy).
7. The US leads AI investment but is losing its ability to attract global talent. US private AI investment reached $12.4B (headline figure; comparisons with China’s government guidance funds likely understate its total). The US led in entrepreneurial activity with 1,953 newly funded AI companies (10× the next country). But the number of AI researchers and developers moving to the US has dropped 89% since 2017 — an 80% drop in the last year alone.
8. AI adoption is spreading at historic speed, and consumers are deriving substantial value from free tools. GenAI hit 53% population adoption in three years, faster than the PC or the internet. Adoption varies by country: Singapore 61%, UAE 54%, US 28.3% (ranked 24th). The estimated consumer value of GenAI tools to US consumers reached $172B annually by early 2026, with median per-user value tripling between 2025 and 2026.
9. Productivity gains from AI appear in the same fields where entry-level employment is declining. Studies show 14-26% productivity gains in customer support and software development; weaker or negative effects in high-judgment tasks. AI agent deployment remains single-digit across most business functions. US software developers aged 22-25 saw employment fall nearly 20% from 2024 even as older developer headcount grew.
10. AI’s environmental footprint is expanding. Grok 4 training emissions reached 72,816 tons CO₂-equivalent. AI data center power capacity rose to 29.6 GW at peak demand — comparable to New York State. Annual GPT-4o inference water use alone may exceed the drinking water needs of 12 million people.
11. AI models for science can outperform human scientists, but bigger ≠ better. Frontier models outperform chemists on average on ChemBench but score <20% on astrophysics replication and 33% on Earth observation. A 111M-parameter protein model (MSAPairformer) beat previous leading methods on ProteinGym; a 200M genomics model (GPN-Star) outperformed a model nearly 200× larger. Most AI-for-science foundation models come from cross-sector collaborations, not industry alone.
12. AI is transforming clinical care, but rigorous evidence remains limited. AI clinical-note-generation tools saw substantial 2025 adoption. Physicians reported up to 83% less time writing notes and significant burnout reductions. But a review of 500+ clinical AI studies found nearly half relied on exam-style questions rather than real patient data; only 5% used real clinical data.
13. Formal education is lagging, but people are learning AI skills at every stage. Over 80% of US high school and college students use AI for school. Only half of middle/high schools have AI policies; just 6% of teachers say those policies are clear. AI engineering skills are accelerating fastest in the UAE, Chile, and South Africa. New AI PhDs in the US and Canada rose 22% (2022→2024), but those PhDs are increasingly taking academic jobs rather than industry ones.
14. AI sovereignty is becoming a defining feature of national policy. National AI strategies are expanding, particularly among developing economies, with state-backed supercomputing investments rising in parallel. Model production remains concentrated in the US and China, but open-source development is redistributing participation — the rest of the world now outpaces Europe and approaches the US on GitHub.
15. AI experts and the public have very different perspectives; global trust is fragmented. 73% of experts expect a positive impact on how people do their jobs vs. only 23% of the public — a 50-point gap. The US has the lowest trust in its own government to regulate AI, at 31%. Globally, the EU is trusted more than the US or China to regulate AI effectively.
Chapter Map (423 pages total)
| Chapter | Page | Focus |
|---|---|---|
| 1. Research and Development | 12 | Notable models, compute, data centers, energy, open source, publications, patents, talent |
| 2. Technical Performance | 68 | Benchmark progress, agents, jagged-frontier examples |
| 3. Responsible AI | 126 | Safety benchmarks, incidents, disclosure gaps, tradeoffs |
| 4. Economy | 171 | Investment, adoption, productivity, labor markets, consumer value |
| 5. Science | 231 | AI for science, foundation models, cross-sector collab |
| 6. Medicine | 255 | Clinical AI adoption, evidence gaps, ambient scribes |
| 7. Education | 288 | Student/teacher use, AI policies, skill diffusion |
| 8. Policy and Governance | 323 | EU AI Act, US deregulation, national strategies, AI sovereignty |
| 9. Public Opinion | 360 | Trust gaps, expert/public divergence, global sentiment |
Chapter 1 Highlights (R&D)
- Industry produced >90% of notable models in 2025; frontier models are less transparent — training code, dataset sizes, parameter counts withheld by OpenAI, Anthropic, Google.
- Notable-model counts depend on the attribution view: by producing organization, China is credited with 50 notable models in 2025 to the US’s 30, while the country-level view has the US at 50 and China at 30. The US retains higher-impact patents; China has grown its count of top-100 cited papers from 33 (2021) to 41 (2024).
- Parameter counts stayed near 1 trillion for three years; frontier labs stopped reporting. Training compute (independently estimable) continued rising.
- Synthetic data not replacing real pre-training data yet, but post-training + data-quality techniques showing promise (OLMo 3.1 Think 32B with 90× fewer params than Grok 4 achieves comparable results on some benchmarks via pruning/dedup/curation).
- Global AI compute: 3.3× per year since 2022, reaching 17.1M H100-equivalents. Nvidia ≥60%; Google+Amazon supply most of the rest; Huawei small but growing.
- Energy: Grok 4 training ≈72,816 tons CO₂-eq. AI data center power = 29.6 GW (NY State at peak).
- Open source: 5.6M projects on GitHub; Hugging Face uploads have tripled since 2023. US projects still dominate engagement (30M cumulative stars across projects with 10+ stars).
- Talent migration to the US dropped 89% since 2017, 80% in last year alone. Switzerland + Singapore lead per-capita researcher density.
- Gender gap: deeply entrenched; no country approaches parity. Saudi Arabia (32.3%), Australia (30.1%), and Canada (29.6%) lead.
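The compound-growth compute figures above lend themselves to a quick sanity check. A minimal sketch — the 3.5-year span from mid-2022 to the early-2026 data cutoff is my assumption, not a report figure:

```python
# Back out the 2022 compute baseline implied by the report's figures:
# global AI compute grew ~3.3x per year since 2022, reaching
# 17.1M H100-equivalents by early 2026.
def implied_baseline(current: float, growth: float, years: float) -> float:
    """Starting stock implied by steady compound growth."""
    return current / (growth ** years)

baseline = implied_baseline(current=17.1e6, growth=3.3, years=3.5)
print(f"Implied 2022 baseline: ~{baseline / 1e3:.0f}k H100-equivalents")
```

Note that small changes in the assumed span move the baseline substantially (half a year is a factor of 3.3^0.5 ≈ 1.8×), which is exactly why compound-growth headlines deserve this kind of check before being cited.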
Key Takeaways (Distilled for Practical Use)
- Cite-worthy adoption stats: 53% pop / 88% org / 80% student — use these when framing “AI is mainstream” slides.
- Jagged-frontier examples are the best way to communicate “AI is powerful but unreliable” to non-technical stakeholders. IMO gold medal + 50.1% analog clock reading is the canonical pairing.
- US-China gap closed is a strategic signal — if you’re sourcing models, open-weight Chinese models (DeepSeek, Qwen, Z.ai) are competitive with frontier.
- Consumer value of free GenAI: $172B/yr US, tripling year-over-year — frames willingness-to-pay analysis for any SaaS wedge.
- Productivity gains 14-26% in support + dev — this is the defensible stat for marketing-automation ROI decks.
- 22-25 year-old dev employment -20% — the “entry-level is collapsing first” data point.
- AI incidents +55% YoY (233→362) — the specific number to cite when making the case for governance/safety investment.
- Safety benchmarks are saturating or missing — frontier labs disclose capability, not responsible-AI, consistently.
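Several of the distilled stats above are simple ratios worth verifying before they go into a deck. A quick check of the incidents figure (both numbers are from the report; nothing else is assumed):

```python
# Documented AI incidents: 233 (2024) -> 362 (2025).
incidents = {2024: 233, 2025: 362}

# Year-over-year growth as a percentage.
yoy_pct = (incidents[2025] / incidents[2024] - 1) * 100
print(f"AI incidents YoY: +{yoy_pct:.0f}%")  # prints "+55%", matching the cited figure
```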
Try It
- For strategy decks: pull the chart images from the HAI public data ([Google Drive link in report]) and cite Figure numbers directly. License is CC BY-ND — attribution required, no derivatives of the charts themselves.
- For policy work: the HAI report adds national/regulatory context (EU AI Act, US deregulation, sovereignty framing) to any private-sector governance frame you already use.
- For sales conversations: the “expert 73% vs public 23%” gap is the single most useful data point for framing objections about AI-assisted work.
- For the Karpathy wiki itself: this is the baseline for annual “state of the field” file-back articles — when HAI 2027 lands, we compare.
Implementation
Tool/Service: AI Index 2026 public data (Google Drive) + interactive Global AI Vibrancy tool (36 countries, updating end of 2026).
Setup: Report is freely downloadable from aiindex.stanford.edu.
Cost: Free; CC BY-ND 4.0.
Integration notes:
- Epoch AI is the data partner for compute + notable models — epochai.org is the live-updating source; AI Index is a February snapshot (data cutoff 2026-02-12).
- LinkedIn (via Lightcast) is the labor-market data source — 22-25 yr developer employment decline is from that.
- McKinsey & Company is the org-adoption + productivity source.
Open Questions
This article is a top-takeaways + chapter-map summary, not a full ingestion of the 423-page report. Deep-dive articles worth filing back later as separate wiki entries:
- Chapter 2 (Technical Performance) — agent benchmarks, OSWorld, jagged-frontier dossier.
- Chapter 3 (Responsible AI) — documented incidents catalog, safety-vs-accuracy tradeoffs.
- Chapter 4 (Economy) — detailed productivity study list and methodology.
- Chapter 8 (Policy and Governance) — EU AI Act first prohibitions, US deregulation shift, national strategy diff.
- Other open questions:
- HAI doesn’t publish a methodology for “notable AI models” — relies on Epoch AI curation. Bias risk: what does Epoch miss from non-English sources?^[inferred]
- Consumer-value figure ($172B/yr) methodology not visible in top-takeaways section; requires reading Chapter 4 to assess.
- “AI experts 73% vs public 23%” gap — who exactly are “experts” in the survey sample? Not defined in takeaways.^[ambiguous]
- The “88% org adoption” figure is McKinsey-sourced and includes very light use (one AI tool anywhere in the org). Not a fair proxy for agentic deployment, which the report notes remains single-digit across most functions.
Related
- AI Agents Unleashed — 2026 Playbook (Mindstream × Futurepedia) — complementary: HAI gives the macro data, Mindstream gives the implementation playbook
- Claude Agent Hierarchy
- Claude AI — vendor-specific landing
- AI Marketing Automation Use Cases
- WalkMe State of Digital Adoption 2026