Source: ai-research/ramp-marketing-to-ai-agents-2026-05-28.md — Grace Cummins (Ramp), Ramp Builders Blog, April 30, 2026.
First-party B2B controlled experiment from Ramp on marketing tracked incentives directly to AI agents. 5-week longitudinal data across three content-format variants (markdown / stripped HTML / schema), three target LLMs (Claude, ChatGPT, Perplexity), and ~50 pages of ramp.com. Hard numbers: 1,300+ bot visits → 370 agent relays → peak 33 citations/day. Strongest single signal in the public AI-citation literature to date. Slots into the AI SEO research cluster as the first first-party B2B experimental study to clear the controlled-conditions bar.
Key Takeaways
- Markdown beat schema and stripped HTML — by a wide margin. Going in, the team assumed schema markup (literally designed for machines) would win. The opposite happened. Markdown was the only format that reliably surfaced in LLM responses. LLMs are trained on markdown-heavy datasets; they parse it natively. If you’re optimizing content for AI agents, start with markdown. Reframes the FLUQs framework’s “formats” axis: markdown is the load-bearing format, not schema.
- Per-model behavior is wildly different. One-size-fits-all does not work.
  - Claude (Anthropic): Day 12 first surface, then went all-in: named the program, stated the exact incentive amount, linked the tracked booking URL, gave step-by-step claim instructions. ~6 matches/day, then jumped 4× around week 3, then climbed again. Dominant channel by end of test.
  - Perplexity: Day 2 first surface, but vague forever (“some channels offer bonuses”). Day 33 was the first time it surfaced the specific branded phrase by name. Conservative; fast but generic.
  - ChatGPT: Zero matches for 32 straight days. The bot was reading the content. The model just never surfaced it. Complete silence.
- Anthropic’s crawler is more aggressive than all other named AI crawlers combined on Ramp’s content — by a wide enough margin Ramp double-checked the logs. Independent corroboration: Anthropic’s own Measuring AI Agent Autonomy in Practice (Feb 18, 2026) reports Claude Code session length and auto-approve rate both rising, consistent with high crawler volume reflecting high user activity. Two independent measurements, two angles, same finding.
- There is an “agent trust” signal analogous to domain authority — but the signals are different from SEO. Pages with higher existing AI citation volumes were far more likely to surface embedded content. Pages with low existing AI citation volume got zero incentive mentions regardless of format. Pages LLMs cite most ≠ pages SEO-optimized for organic search. A top-performer for incentive surfacing was a page Ramp had never prioritized for organic.
- Cloudflare’s bot category labels are misleading. ChatGPT, Perplexity, and Claude bots are classified as “AI Assistants,” not “AI Search.” If you build rules targeting “AI Search” bots, you’ll miss the three biggest LLM platforms entirely. One Ramp variant missed all major bots for several days due to this.
- DeepSeek spoofs its UA as Chrome 58 (2017 browser). Reliable detection requires TLS fingerprint plus ASN, not user-agent string alone.
- OpenAI SearchBot caches aggressively: crawl volume initially rises with ChatGPT user traffic, then falls off over time even as usage grows. The pattern is long cached periods punctuated by periodic refresh crawls.
- Bot population is growing fast and getting more diverse. Between weeks 2 and 5: Claude-Code traffic +180%, DeepSeek +845%. New bots appeared mid-experiment: Google’s NotebookLM, Meta’s ExternalFetcher, you.com’s YouBot. “If you’re checking your bot logs once a quarter, you’re already out of date.”
- The recursive observation. Publishing this post about agent incentives may end up being more effective than running the experiment, because people will go ask LLMs “does Ramp have an AI-exclusive offer?” which increases Ramp’s presence in LLM responses, which drives more agent-mediated discovery. The post itself becomes a channel.
Experimental design
| Variant | Format served to bots | Bot targeting |
|---|---|---|
| A | Pure markdown — full page content converted to .md | Broad: Cloudflare “AI Assistant” category OR unverified bots with low bot score |
| B | Stripped HTML — clean semantic HTML, navigation and chrome removed | Strict: verified “AI Search” or “AI Assistant” bots only |
| C | Schema / structured data — machine-readable markup injected into existing pages | Same as B |
- Cloudflare Workers conditionally served alternate content when a bot was detected; humans saw the standard pages.
- Each variant embedded unique tracking so any downstream action could be attributed back to the format and the page that produced it.
- ~50 pages across ramp.com participated in the test.
- Test groups balanced by existing AI citation volume.
- Detection ran daily across Claude, ChatGPT, Perplexity, scanning every response to “compare top spend management platforms”-style prospect queries for offer mentions.
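The two targeting rules in the table can be sketched as plain predicates. This is a minimal illustration, not Ramp's Worker code: the `BotSignal` fields, the function names, and the score threshold of 30 are assumptions, and the decision logic is written in Python only to make the rules easy to test (a real Worker would run JavaScript or TypeScript). It assumes a Cloudflare-style bot score where lower values mean "more likely automated".

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: scores below this are treated as "low" (likely automated).
LOW_BOT_SCORE = 30


@dataclass(frozen=True)
class BotSignal:
    category: Optional[str]  # category label such as "AI Assistant"; None if uncategorized
    verified: bool           # True if the CDN has verified the bot's identity
    bot_score: int           # lower = more likely automated (assumed scale)


def matches_broad(sig: BotSignal) -> bool:
    """Variant A: "AI Assistant" category OR unverified traffic with a low bot score."""
    return sig.category == "AI Assistant" or (
        not sig.verified and sig.bot_score < LOW_BOT_SCORE
    )


def matches_strict(sig: BotSignal) -> bool:
    """Variants B and C: verified bots in either AI category only."""
    return sig.verified and sig.category in {"AI Search", "AI Assistant"}


def matches_ai_search_only(sig: BotSignal) -> bool:
    """The pitfall rule: targets "AI Search" alone and misses "AI Assistant" bots."""
    return sig.verified and sig.category == "AI Search"
```

A verified bot labeled "AI Assistant" passes both the broad and strict rules but fails `matches_ai_search_only`, which is the failure mode described in the takeaways about Cloudflare's category labels.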
Hard numbers
| Window | Bot visits | Agent relays | Peak citations / day |
|---|---|---|---|
| 2.5 weeks | 1,300+ | ~40 | — |
| 5 weeks | (continued) | 370 | 33 |
| 32 days, ChatGPT only | — | 0 | 0 |
Step-change at ~3 weeks: Claude’s daily match rate jumped 4× overnight with no observable external cause (no model release, no content change). Then climbed again in the last 5 days of the test window.
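One simple way to flag a jump like this in your own monitoring data is to compare the mean of the days just before each candidate day with the mean of the days just after it. The sketch below is an assumption-laden illustration: `largest_step`, `window`, and `min_baseline` are hypothetical names, and the series is synthetic, shaped to resemble the reported pattern (nothing before Day 12, about 6 matches/day, then 4×). It is not Ramp's data.

```python
from typing import Optional, Sequence, Tuple


def largest_step(
    daily_counts: Sequence[int], window: int = 3, min_baseline: float = 1.0
) -> Optional[Tuple[int, float]]:
    """Return (day_index, ratio) for the largest after/before jump in window means.

    day_index is 0-based and marks the first day of the higher level.
    Days whose preceding window averages below min_baseline are skipped so
    that the initial onset from zero does not dominate the result.
    """
    best: Optional[Tuple[int, float]] = None
    for day in range(window, len(daily_counts) - window + 1):
        before = sum(daily_counts[day - window:day]) / window
        after = sum(daily_counts[day:day + window]) / window
        if before < min_baseline:
            continue
        ratio = after / before
        if best is None or ratio > best[1]:
            best = (day, ratio)
    return best


# Synthetic series (NOT Ramp's data): 11 silent days, 9 days at 6/day, 10 days at 24/day.
synthetic = [0] * 11 + [6] * 9 + [24] * 10
step = largest_step(synthetic)  # index 20, i.e. Day 21 when counting from 1
```

Logging the detected day alongside crawler-visit counts and any known model or content changes makes it easier to rule causes in or out later.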
Why this matters
Gartner estimates $15 trillion in B2B purchases will be agent-mediated by 2028. When a CFO asks ChatGPT to “compare top spend management platforms,” the agent does not browse a homepage. It queries, ingests structured data, and evaluates what it can parse — not what looks good.
Every B2B company’s website was built for humans. Marketing to agents, with above-board referral offers (not jailbreaks or prompt injection), is the load-bearing question for the next 6 months of work. This is the first published B2B controlled experiment to provide hard data on whether it works.
Adjacent prior work cited by the post:
- Vercel’s proposal for inline LLM instructions in HTML
- Cloudflare’s method for serving markdown directly to agents
- Addy Osmani’s recommendations for agent engine optimization
Implementation
Tool/Service: Cloudflare Workers + custom logging pipeline + daily LLM query script
Setup:
- Cloudflare Worker in front of one high-intent page, classifying human/known_ai_bot/ambiguous by UA + IP + ASN + bot score (optional).
- Three content variants: markdown, stripped HTML, schema-heavy. Identical offer text across all three.
- Per-variant tracked conversion URL so you can attribute back to format.
- Raw access logs: UA, IP, ASN, and bot score logged as separate fields, retained 30+ days.
- A recurring query run (daily in Ramp’s test; weekly at minimum) across Claude, ChatGPT, Perplexity, and Gemini with your top prospect questions, and a diff of the answers.
Cost: Cloudflare Workers free tier covers low-volume deployments; daily query script costs cents/day across the four model APIs.
Integration notes: Cloudflare bot category labels (AI Search vs AI Assistant) require explicit handling — AI Search rules miss the three biggest LLM platforms. DeepSeek detection requires TLS fingerprint + ASN, not UA. OpenAI SearchBot caches aggressively, so traffic volume understates discovery activity.
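The three-way classification and the separate-field logging described in the setup could look roughly like the following. This is a sketch under stated assumptions: the `Request` type, `classify`, `log_record`, the threshold of 30, and the caller-supplied `datacenter_asns` set are all hypothetical, and the UA tokens are examples that should be checked against each vendor's current crawler documentation. A UA match is not proof of identity; TLS fingerprinting and vendor IP-range verification are out of scope here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Set

# Example crawler UA tokens; verify against vendor docs before relying on them.
KNOWN_AI_UA_TOKENS = ("ClaudeBot", "GPTBot", "OAI-SearchBot", "PerplexityBot")


@dataclass(frozen=True)
class Request:
    user_agent: str
    ip: str                          # logged for later analysis, not used in the decision
    asn: int
    bot_score: Optional[int] = None  # optional; lower = more likely automated (assumed)


def classify(req: Request, datacenter_asns: Set[int], low_score: int = 30) -> str:
    """Label a request as 'known_ai_bot', 'ambiguous', or 'human'."""
    ua = req.user_agent.lower()
    if any(token.lower() in ua for token in KNOWN_AI_UA_TOKENS):
        return "known_ai_bot"
    # A browser-looking UA from a hosting network is suspicious (e.g. a spoofed old Chrome).
    if req.asn in datacenter_asns:
        return "ambiguous"
    if req.bot_score is not None and req.bot_score < low_score:
        return "ambiguous"
    return "human"


def log_record(req: Request, label: str) -> str:
    """Serialize UA, IP, ASN, and bot score as separate JSON fields plus the label."""
    record = asdict(req)
    record["label"] = label
    return json.dumps(record, sort_keys=True)
```

Keeping each signal in its own field means the classification can be re-run over the retained logs when a new bot appears or a rule changes, rather than trusting the label assigned at request time.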
How it sits in the AI-citation thesis cluster
The AI SEO research cluster currently spans 14 studies grouped by evidence type. The Ramp experiment lands at the causal-evidence end of the spectrum and adds three things no prior study supplied:
- First-party B2B controlled experiment — not third-party / observational / case study. Industry-leader engineering team running the test on their own commercial site.
- Format-level causal evidence — markdown vs. schema vs. stripped HTML as A/B/C variants. Most prior studies measure what already gets cited (correlational). Ramp tests what causes citation when held otherwise constant.
- Per-model differential behavior measured against one constant content payload. Most prior studies aggregate across models or measure one model at a time. Ramp shows that the same page can get cited by Claude and ignored by ChatGPT, which means single-model AI-visibility audits miss the structural variance.
Cross-validates FLUQs: of the four FLUQs levers (Format, Lexicon, Update-cadence, Quality), Ramp’s data backs Format as load-bearing (markdown wins by a wide margin). Lexicon hint emerges in week 5 (“Ramp AI Exclusive Offer” as a brand-anchored phrase Perplexity finally surfaces).
Cross-validates Anthropic’s measuring-agent-autonomy: Anthropic crawler aggressiveness is corroborated independently from Anthropic’s own usage-side measurements.
Related
- SEO & Content topic
- AI SEO research hub — methodology-organized cluster
- FLUQs framework — Format/Lexicon/Update-cadence/Quality levers Ramp’s data informs
- Similarweb most-cited-domains study — observational peer to Ramp’s experimental design
- Google’s Official Generative AI Search Optimization Guide — adjacent Google guidance
- Measuring AI Agent Autonomy in Practice (Anthropic) — independent corroboration of Anthropic crawler aggressiveness
- How We Contain Claude (Anthropic) — same engineering-blog series cluster from Anthropic
- The Agent-Readable Web — connection article: Ramp is the demand side alongside WebMCP (interface) and EmDash (supply)
Try It
- Pull your last 30 days of bot-classified traffic from your CDN first. Ramp’s strongest pre-experiment warning: most marketers do not know what is already showing up in their logs.
- Verify Cloudflare bot category labels. If you target “AI Search,” confirm you are not silently missing Claude, ChatGPT, and Perplexity (they’re under “AI Assistant”).
- Pick ONE high-intent page for the minimum viable replication. Don’t start with 50 pages.
- Serve markdown as your first variant. Schema markup is not the winner; Ramp’s data is unambiguous on this.
- Set up daily LLM query monitoring across Claude + ChatGPT + Perplexity + Gemini. A single weekly diff of responses to your top 10 prospect questions is the minimum signal you need.
- Plan for 3+ weeks of little or no signal before drawing conclusions. Claude’s first relay arrived on Day 12 and the step-change at ~3 weeks. ChatGPT may stay silent indefinitely; that is a finding, not a failure.
- Attribute trackable actions, not link clicks. Agents may not click; design the conversion path to require a trackable action.
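The monitoring step can be sketched as a scan over saved model responses, counting how many mention the offer phrase or the tracked URL. This is a minimal illustration: it takes responses as plain strings (fetching them from each model's API is left out), and `scan_response`, `daily_summary`, `newly_surfacing`, and the example.com URL are hypothetical names and placeholders. Simple substring matching is assumed; paraphrased mentions would need fuzzier matching.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    names_offer: bool      # response contains the branded offer phrase
    has_tracked_url: bool  # response contains the tracked conversion URL

    @property
    def is_hit(self) -> bool:
        return self.names_offer or self.has_tracked_url


def scan_response(text: str, offer_phrase: str, tracked_url_prefix: str) -> Match:
    """Case-insensitive substring check for the offer phrase and tracked URL."""
    lowered = text.lower()
    return Match(
        offer_phrase.lower() in lowered,
        tracked_url_prefix.lower() in lowered,
    )


def daily_summary(
    responses: Dict[str, List[str]], offer_phrase: str, tracked_url_prefix: str
) -> Dict[str, int]:
    """Count hits per model for one day's batch of responses."""
    return {
        model: sum(
            scan_response(text, offer_phrase, tracked_url_prefix).is_hit
            for text in texts
        )
        for model, texts in responses.items()
    }


def newly_surfacing(yesterday: Dict[str, int], today: Dict[str, int]) -> List[str]:
    """Models with at least one hit today and none yesterday."""
    return sorted(m for m, n in today.items() if n > 0 and yesterday.get(m, 0) == 0)
```

Storing each day's summary makes the per-model timeline (first surface, silence, step-changes) recoverable without rereading raw responses.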
Open Questions
- Mechanism behind the Day-21 step-change. Claude’s 4× jump came with no observable external cause. Was it model-side reweighting on Anthropic’s training pipeline? Crawler revisit cycle? Internal index refresh? Cannot be answered from Ramp’s side alone.
- Will the silence from ChatGPT generalize? 32 days of zero relays could reflect OpenAI policy (don’t surface unverified offers in commercial contexts), training-set bias (markdown not surfaced as commercial), or just a longer trust window than Claude’s. No data yet to distinguish.
- Does the markdown-wins finding hold for non-B2B content? Ramp tested B2B finance positioning. Consumer-content audits (e.g., DTC e-commerce) may surface different format preferences.
- What’s the floor of “agent trust”? Ramp shows that pages with high existing AI citation volume surface embedded content far more often, but does not quantify the threshold below which a page gets no incentive mentions at all. Open question for follow-up methodology work.