Source: X Bookmark 1959298080156598637 (God of Prompt tweet pointing to the cookbook) → Openai Gpt 5 Prompting Guide 2026 05 02 (full guide content extracted from cookbook.openai.com)
This wiki is Claude-first, but several practical prompt patterns in OpenAI’s official GPT-5 Prompting Guide transfer directly to Anthropic models — particularly the eagerness-control patterns, tool preamble discipline, Cursor’s prompt-tuning case study, and the instruction-conflict warning. This article captures the cross-vendor takeaways without re-summarizing OpenAI-specific API parameter names. For the canonical version, link out to https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide.
Key Takeaways
- Eagerness is steerable. GPT-5 (like Claude Opus 4.7) operates anywhere from “delegate most decisions” to “tight programmatic leash.” Patterns (a paraphrased prompt sketch follows this list):
  - Less eagerness: lower `reasoning_effort`, define explicit early-stop criteria (“top hits converge ~70% on one path”), set fixed tool-call budgets, give an escape hatch (“even if it might not be fully correct”).
  - More eagerness: higher `reasoning_effort`, agentic-persistence prompts (“keep going until the user’s query is completely resolved”, “never stop or hand back when you encounter uncertainty”, “decide what the most reasonable assumption is, proceed, document”).
- Tool preambles are universal. Pattern: rephrase the user’s goal → outline a structured plan → narrate each step → summarize completed work distinct from the upfront plan. Improves long-rollout UX in any reasoning-model interface.
- GPT-5’s `reasoning_effort` and `verbosity` parameters separate thinking from output length. Claude has the analogous split via Extended Thinking effort tiers and natural-language verbosity overrides — cross-reference Opus 4.7 best practices for the equivalent on the Anthropic side.
- Contradictory prompts hurt GPT-5 more than they hurt non-reasoning models. GPT-5 expends reasoning tokens trying to reconcile conflicting instructions rather than picking one at random. Concrete example given: a healthcare prompt with “never schedule without explicit patient consent” alongside “auto-assign earliest same-day slot without contacting the patient.” Rule of thumb: audit every multi-stakeholder prompt for internal contradictions; the older the prompt, the higher the odds. This is the same warning the Claude troubleshooting reference gives for Opus 4.7.
- Cursor’s prompt-tuning case study is portable.
  - Verbosity split: set global `verbosity: low` for chat-friendly status updates and prompt explicitly “Use high verbosity for writing code and code tools” so the diffs stay readable. Single-letter variable names disappear. (See the Responses API verbosity sketch after this list.)
  - Don’t defer; act: make explicit that “code edits will be displayed as proposed changes” so the model knows it can be proactive — “almost never ask the user whether to proceed with a plan; instead proactively attempt the plan and ask if they want to accept the implemented changes.”
  - Lighten thoroughness language for newer models: old “Be THOROUGH… maximize…” prompts that worked on GPT-4-class models over-call search on GPT-5. Soften the language; remove the `maximize_` prefix. Same lesson likely applies to Opus 4.7 — it’s already proactive at gathering context, and aggressive thoroughness exhortations push it past the useful point.
  - Structured XML specs (`<[instruction]_spec>`) improve instruction adherence and let you reference categories/sections elsewhere in the prompt cleanly.
- Self-rubric prompting (zero-to-one app generation): “First, spend time thinking of a rubric until you are confident. Create 5–7 categories — do not show the user. Use the rubric internally to think and iterate; if not hitting top marks across all categories, start again.” Generic technique — works on Opus 4.7 too for high-stakes generation.
- Codebase-rules block. Without prompting, GPT-5 already searches reference context (reads `package.json`). Behavior is sharpened by an explicit rules block summarizing engineering principles, directory structure, and design taste. Same pattern aligns with Simon Scrapes’ static-context split and the standard `CLAUDE.md` file.
- Markdown formatting is opt-in. GPT-5 in the API does not format final answers in Markdown by default (for maximum compatibility). Prompt explicitly when you want it. For long conversations, append a Markdown reminder every 3–5 user messages — adherence degrades over rollouts.
- Metaprompting works. Use the model itself to optimize prompts — feed it the prompt + the desired/undesired behavior gap and ask for minimal edits. Generic across reasoning models.
- Responses API gives measurable agentic gains on the OpenAI side (Tau-Bench Retail 73.9% → 78.2% just by switching to the Responses API + `previous_response_id`; see the chaining sketch after this list). On the Anthropic side, the equivalent is Extended Thinking with reasoning persistence and prompt caching.
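The following is a minimal sketch of the eagerness patterns above, assembled in Python. The `<context_gathering>` and `<persistence>` block wording is paraphrased from the quoted patterns, not copied verbatim from the guide; the tool-call budget of 2, the exact stop-criteria phrasing, and the `build_system_prompt` helper are illustrative assumptions.

```python
# Sketch of "less eager" vs "more eager" prompt blocks built from the patterns above.
# Block wording is paraphrased; budget numbers are illustrative, not canonical.

LOW_EAGERNESS_SPEC = """
<context_gathering>
Goal: gather just enough context to act, then act.

Early stop criteria:
- You can name the exact content to change.
- Top search hits converge (~70%) on one area/path.

Budget:
- Absolute maximum of 2 tool calls for context gathering.

Escape hatch:
- If the budget is exhausted, proceed with your best answer,
  even if it might not be fully correct, and note the uncertainty.
</context_gathering>
""".strip()

HIGH_EAGERNESS_SPEC = """
<persistence>
- Keep going until the user's query is completely resolved before yielding.
- Never stop or hand back to the user when you encounter uncertainty.
- Decide what the most reasonable assumption is, proceed with it,
  and document it for the user's reference afterwards.
</persistence>
""".strip()


def build_system_prompt(base_instructions: str, eager: bool) -> str:
    """Append the matching eagerness block to a base system prompt."""
    spec = HIGH_EAGERNESS_SPEC if eager else LOW_EAGERNESS_SPEC
    return f"{base_instructions}\n\n{spec}"
```

The same blocks drop into a Claude system prompt unchanged; only the vendor-specific effort parameter differs.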
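Next, a hedged sketch of the effort/verbosity split using the OpenAI Python SDK’s Responses API. The `reasoning.effort` and `text.verbosity` parameter names follow the guide’s description but should be verified against current SDK docs; the model name, task, and system prompt are placeholders, with the code-verbosity sentence adapted from the Cursor case study.

```python
# Hedged sketch: global low verbosity via API parameter, high verbosity for code
# via prompt text (the Cursor pattern). Check parameter names against the
# current openai SDK before relying on this.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a coding agent.\n"
    "Use high verbosity for writing code and code tools; "
    "keep status updates and explanations brief."
)

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "medium"},   # thinking depth
    text={"verbosity": "low"},        # global output length
    instructions=SYSTEM,
    input="Rename the config loader and update all call sites.",
)
print(response.output_text)
```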
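And a hedged sketch of turn chaining with `previous_response_id`, the switch credited with the Tau-Bench gain above. The retail-style task text is illustrative; only the chaining mechanism is the point.

```python
# Hedged sketch of chaining turns with previous_response_id so the prior turn's
# state (including reasoning context) is reused instead of rebuilt each turn.
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="gpt-5",
    input="Check the order status for order #1042.",
)

followup = client.responses.create(
    model="gpt-5",
    previous_response_id=first.id,   # carry forward the prior turn's state
    input="Now initiate a return for that order.",
)
print(followup.output_text)
```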
OpenAI-only details (skip for Claude work)
These are GPT-5–specific and don’t transfer:
- `apply_patch` is the canonical file-edit tool format that GPT-5 is trained against.
- The `minimal` reasoning effort tier as the latency-sensitive replacement for GPT-4.1.
- Specific frontend framework recommendations: Next.js + Tailwind + shadcn + Radix + Motion + Lucide.
- The Tau-Bench Retail and Terminal-Bench example prompts in the appendix.
(For Claude-side equivalents on tool-use formats, see Claude Code CLI reference and Extended Thinking API.)
Cross-vendor decision: when to read the GPT-5 guide
- Always read it once if you author prompts for any reasoning model. Sections on eagerness control, tool preambles, instruction conflicts, and metaprompting are the highest-leverage takeaways and they’re vendor-agnostic.
- Read it again before authoring a new long-rollout agent prompt. Even if you’ll deploy on Opus 4.7, the failure modes GPT-5 exposes (reasoning-token waste on contradictions, over-eager search) are common to all modern reasoning models.
- Skim the Cursor case study before tuning a coding-agent system prompt — the verbosity-split pattern and the don’t-defer-to-the-user rule generalize.
Try It
- Audit a long-running WEO Claude prompt (e.g. the Hermes system prompt) for internal contradictions using GPT-5’s example healthcare prompt as the template — look for “always X” + “always not X” pairs that drift in over edits.
- On a coding skill prompt, try the verbosity split: globally low + explicit-high for code blocks. Compare diff readability.
- On any zero-to-one generation task, try self-rubric prompting: “First, think of a 5–7-category rubric for a world-class result. Don’t show the rubric. Use it internally to iterate. If not hitting top marks, start again.” Compare quality to a non-rubric baseline.
- For your most-painful prompt, try metaprompting: paste the prompt + the desired/undesired behavior gap to Opus 4.7 (or GPT-5) and ask for minimal edits. Iterate. (A sketch of this loop follows.)
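A minimal sketch of that metaprompting loop using the Anthropic Python SDK, assuming you target a Claude model. The model ID, helper name, and meta-prompt wording are placeholders; only the pattern (prompt plus behavior gap in, minimal edits out) comes from the guide.

```python
# Hedged sketch of the metaprompting loop: send the current prompt and the
# behavior gap, ask for minimal edits. Model ID and wording are placeholders.
import anthropic

client = anthropic.Anthropic()

META_PROMPT = """Here is my current system prompt:

<prompt>
{prompt}
</prompt>

Desired behavior: {desired}
Observed behavior: {observed}

Suggest the minimal edits to the prompt that close this gap.
Do not rewrite sections that already work."""


def metaprompt(prompt: str, desired: str, observed: str) -> str:
    """Ask the model for minimal prompt edits that close the behavior gap."""
    message = client.messages.create(
        model="claude-opus-4-20250514",  # placeholder model ID
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": META_PROMPT.format(
                prompt=prompt, desired=desired, observed=observed
            ),
        }],
    )
    return message.content[0].text
```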
Open Questions
- The OpenAI guide’s claim that GPT-5 is “extraordinarily receptive to prompt instructions” — does this hold equally for Opus 4.7? Anecdotally yes, but no head-to-head A/B on identical prompts is published.
- Does Opus 4.7’s effort-tier system show measurable gains analogous to the Tau-Bench 73.9% → 78.2% result with the Responses API? Worth a benchmark experiment.
Related
- Claude Prompting Best Practices — Anthropic’s authoritative prompting reference for Opus 4.7 / 4.6 / Sonnet 4.6 / Haiku 4.5; cross-reference for the same techniques on the Anthropic side.
- Troubleshooting Claude — Failure Modes — the contradictory-prompts warning shows up here too.
- Claude Opus 4.7 Best Practices — for the Anthropic-side `effort` and adaptive-thinking equivalents.
- Claude Extended Thinking API Reference
- Simon Scrapes — Nine-Component Agentic OS — the static-context split is the same pattern as the codebase-rules block.
- Cost & Intelligence Levers — for thinking about effort/verbosity tradeoffs across vendors.
- Prompt Engineering index