Source: ai-research/nano-banana-pro-launch-blog-google-2026-07-03.md, ai-research/nano-banana-pro-developers-blog-google-2026-07-03.md, ai-research/nano-banana-gemini-api-image-generation-docs-2026-07-03.md, ai-research/nano-banana-pro-arstechnica-2026-07-03.md, ai-research/nano-banana-pro-deepmind-model-page-2026-07-03.md, ai-research/gemini-2-5-flash-image-original-nano-banana-2026-07-03.md — plus the head-to-head in GPT Image 2 launch coverage.
Nano Banana is the meme nickname Google kept for its Gemini-native image generation and editing models, the same product line the API calls Gemini … Image. It started as Gemini 2.5 Flash Image (August 2025), which briefly became the top-rated image model in the world, and grew into a family: Nano Banana Pro (Gemini 3 Pro Image, November 2025) is the studio-quality reasoning tier, while Nano Banana 2 and 2 Lite (Gemini 3.1 Flash Image and Flash Lite Image) form the current-generation Flash workhorse tier. Its defining trick is that image generation now runs on a reasoning model with real-world knowledge, so it renders legible multilingual text, produces single-shot infographics, and keeps characters consistent, with every output watermarked with SynthID.
Key Takeaways
- “Nano Banana” is a brand, not one model. It covers Gemini’s native image models. The API lineup as of 2026-07-03: Nano Banana (`gemini-2.5-flash-image`, the “legacy pioneer”), Nano Banana 2 Lite (`gemini-3.1-flash-lite-image`, fastest/cheapest, 1K only), Nano Banana 2 (`gemini-3.1-flash-image`, the versatile 4K workhorse), and Nano Banana Pro (`gemini-3-pro-image`, the premium tier for the most complex visual tasks).
- Nano Banana Pro is built on Gemini 3 Pro. Announced November 2025, it inherits Gemini 3’s reasoning and world knowledge; Google frames it as turning notes into diagrams, data into infographics, and prompts into “studio-quality” designs.
- Text rendering in-image is the headline capability. It is Google’s best model for correctly rendered, legible text directly inside an image — short taglines to full paragraphs, varied fonts/textures/calligraphy, and multiple languages (generate, localize, or translate in place). This is what makes single-shot infographics, posters, and mockups usable.
- Multi-image blending + character consistency. Nano Banana Pro can mix up to 14 reference images and maintain the likeness of up to 5 people in a single output; the Flash tier holds up to ~4 characters / ~10 high-fidelity objects.^[inferred — the per-tier object/character fidelity split is drawn from the Gemini API docs table; the “14 images / 5 people” headline is stated directly by Google and Ars Technica.]
- World knowledge + Google Search grounding. When enabled, it pulls real-time web data (weather, sports, recipes) into factually accurate visuals — biological diagrams, historical maps, data-driven infographics.
- Up to 4K, with studio controls. Native 1K, plus 2K and 4K output and multiple aspect ratios; localized editing (select/refine/transform any region), camera-angle changes, focus, color grading, and scene-lighting transforms (day-to-night, bokeh).
- SynthID + C2PA provenance. Every image carries an imperceptible SynthID watermark; C2PA metadata is added; a visible “Gemini sparkle” watermark sits on free/Pro-tier images but is removed for AI Ultra subscribers. You can upload any image to the Gemini app and ask “Is this AI?” to check for SynthID.
- Everywhere in Google’s stack. Gemini app, AI Mode in Search (US, Pro/Ultra), NotebookLM, Google Ads, Google Workspace (Slides, Vids), the Gemini API + AI Studio, Google Antigravity, and Vertex AI — plus Adobe and Figma partner integrations.
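The lineup in the first takeaway can be kept as a small lookup table so a workflow selects a model ID by capability rather than by nickname. What follows is a minimal Python sketch, assuming the model IDs and resolution limits quoted in this article (as of 2026-07-03). The `ImageModel` dataclass and `pick_model` helper are hypothetical names for illustration and are not part of any Google SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ImageModel:
    nickname: str
    model_id: str
    max_resolution: Optional[str]  # None where this article states no limit


# Model IDs as listed in this article; confirm against the Gemini API docs.
LINEUP = [
    ImageModel("Nano Banana", "gemini-2.5-flash-image", None),
    ImageModel("Nano Banana 2 Lite", "gemini-3.1-flash-lite-image", "1K"),
    ImageModel("Nano Banana 2", "gemini-3.1-flash-image", "4K"),
    ImageModel("Nano Banana Pro", "gemini-3-pro-image", "4K"),
]

# Relative ordering of the resolution tiers named in this article.
RESOLUTION_ORDER = {"1K": 1, "2K": 2, "4K": 4}


def pick_model(required_resolution: str) -> List[str]:
    """Return model IDs whose stated maximum resolution covers the request."""
    need = RESOLUTION_ORDER[required_resolution]
    return [
        m.model_id
        for m in LINEUP
        if m.max_resolution is not None
        and RESOLUTION_ORDER[m.max_resolution] >= need
    ]


print(pick_model("2K"))
```

Models with no stated limit in this article (the original Nano Banana) are left out of the result instead of being guessed at.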
Capabilities
- Reasoning-first generation. Unlike diffusion models that render directly, Nano Banana Pro leans on Gemini 3’s reasoning before it draws — the same shift toward “thinking” image models this wiki tracks for GPT Image 2. Practical effect: terser prompts, fewer factual errors, correct text.
- Legible multilingual text in-image. The differentiator versus older image models that turned text into “alphabet soup.” Best-practice from the API docs: generate the text first, then ask for the image containing it.
- Blend up to 14 references. Combine subjects, styles, products, and scenes; keep up to 5 people consistent across outputs — useful for brand touchpoints, storyboards, and multi-panel sequences.
- Search-grounded, world-knowledge visuals. Optional grounding with Google Search produces data-correct infographics and real-time snapshots; strong general world knowledge produces plausible diagrams without grounding.
- Resolution + creative controls. 1K/2K/4K (Flash Lite is 1K-only; Flash adds a 512px/0.5K option), a wide aspect-ratio set, localized edits, and physical controls over camera, focus, color grading, and lighting.
- Provenance stack. SynthID (invisible, survives crops/filters/compression per Google), added C2PA “Content Credentials” metadata, and in-app SynthID verification.
- Access model. Consumers use it in the Gemini app by selecting “Create images” with the Thinking model (free users get a limited quota, then fall back to the original Nano Banana). Developers get it as a paid preview in the Gemini API + AI Studio; image output is usage-metered (Google bills generated images as a block of output tokens) — check the current Gemini API / Vertex AI pricing pages for rates.^[inferred — exact per-image pricing was not in the cited Google announcement/doc sources; only the “paid preview” and token-metered framing are stated there.]
- SDK shape. The original Nano Banana ships in the `google-genai` Python/JS SDK (`generate_content` with the image model, prompt + optional input image); the Gemini 3 image models add an `interactions` API path and an `image_size` / `response_format` control for resolution.
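The `generate_content` path described in the SDK bullet can be sketched as follows. This is a minimal Python sketch, assuming the `google-genai` package is installed and an API key is available in the environment. The helper names `first_image_bytes` and `generate_image` are hypothetical, and the default model ID is the one this article lists for the original Nano Banana. The response-parsing helper is kept separate so it can be exercised without a network call.

```python
from typing import Any, Optional


def first_image_bytes(response: Any) -> Optional[bytes]:
    """Return the raw bytes of the first inline image part, or None if absent."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        if content is None:
            continue
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            if inline is not None and inline.data:
                return inline.data
    return None


def generate_image(prompt: str, model: str = "gemini-2.5-flash-image") -> Optional[bytes]:
    """Send a text prompt to an image model and return the image bytes."""
    # Imported lazily so the parsing helper above works without the SDK installed.
    from google import genai

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(model=model, contents=[prompt])
    return first_image_bytes(response)
```

The returned bytes can be written straight to disk. An optional input image for editing would be passed as an additional item in `contents`; the resolution controls for the Gemini 3 image models are left out here because this article does not spell out their exact shape.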
Nano Banana vs. GPT Image 2
The two vendor flagships this wiki tracks are close cousins — both are reasoning/“thinking” image models with real-time web grounding, legible multilingual text, multi-image character consistency, and 2K/4K output. They leapfrog each other release to release, so any single benchmark is a time-stamped snapshot.
| Dimension | Nano Banana Pro (Gemini 3 Pro Image) | GPT Image 2 (OpenAI) |
|---|---|---|
| Base model | Gemini 3 Pro | OpenAI image model (codename “duct-tape”) |
| Distribution | Gemini app, Search AI Mode, Workspace, Google Ads, NotebookLM, Antigravity, Vertex AI, API/AI Studio | ChatGPT + API + Higgsfield + Codex CLI image_generation |
| Signature strength | In-image text, Search-grounded factual infographics, 14-image blend / 5-person consistency | Multi-image batch (magazines, manga, room-by-room), the photorealism keyword unlock |
| Provenance | SynthID (all outputs) + C2PA + in-app “Is this AI?” | Not documented in this wiki’s GPT Image 2 coverage |
| Ecosystem partners | Adobe, Figma | Raycast, Higgsfield, Codex |
- LM Arena (competitor-reported). Per this wiki’s GPT Image 2 launch coverage (OpenAI-side sources), GPT Image 2 took the text-to-image #1 spot at ~1512 Elo, reported as a 250+ point jump over the prior leader, Nano Banana 2 (Gemini 3.1 Flash Image) at ~1270; the two rounded figures differ by about 240. Treat as a time-stamped, one-sided ranking.
- Character consistency head-to-head. A 50-example field test in the same coverage found GPT Image 2 held strict identity detail (tattoos, piercings, hairstyle across six references) where Nano Banana Pro drifted; Nano Banana Pro trended “more cinematic,” GPT Image 2 “more editorial.” Both handled multi-reference composition well.
- Where Nano Banana pulls ahead is reach and provenance: it is wired into the Google products people already use (Search, Workspace, Ads, NotebookLM), grounds on Google Search for real-time facts, and ships a SynthID + C2PA transparency stack on every output.^[inferred — this is a synthesis of the differing feature emphases in the two vendors’ own materials, not a claim either vendor makes about the other.]
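To read the LM Arena rating gap quoted above, the standard Elo expected-score formula converts a rating difference into a head-to-head preference rate. What follows is a minimal Python sketch, assuming the arena score behaves like a conventional Elo rating on a 400-point logistic scale. That is an assumption: the leaderboard's exact scoring method is not described in this article, and the inputs are the approximate figures quoted above.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


gpt_image_2 = 1512    # approximate figure quoted in this article
nano_banana_2 = 1270  # approximate figure quoted in this article

gap = gpt_image_2 - nano_banana_2
print(gap)  # 242
print(round(expected_win_rate(gpt_image_2, nano_banana_2), 2))  # 0.8
```

Under that assumption, a gap of about 240 points corresponds to the higher-rated model being preferred in roughly four of five pairwise votes, which is large but still a snapshot of one leaderboard at one time.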
How This Wiki’s Tutorials Use It
Every Nano Banana reference in this wiki sits inside a video / storyboard / creative workflow, and in each case it’s reached through a third-party surface (LTX Studio or Higgsfield MCP), offered side-by-side with GPT Image 2, rather than through Google’s own API:
- FREE Seedance 2.0 Claude Skill (LTX Studio) — the free Claude Skill writes prompts for Nano Banana Pro, GPT Image 2, and Seedance 2 in each model’s own prompt language. Nano Banana Pro / GPT Image 2 handle the character-sheet stage that locks character consistency before the storyboard grid and Seedance shot.
- Higgsfield MCP Tutorial (Robo Nuggets) — a single Claude thread generates side-by-side brand books from Nano Banana 2 vs GPT Image 2 (“create two iterations — Nano Banana 2 for one, GPT Image 2 for the other”), then feeds them into a 6-panel logo-animation storyboard and Seedance 2.0 video. Multi-model side-by-side is the killer use case.
- Higgsfield 50-Ad Campaign — batches product creative on Nano Banana 2 (products) alongside Soul 2 (humans) inside a 50-image Instagram campaign run.
- Animated Short Film Pipeline — Nano Banana 2 was a candidate for the hand-drawn 2D art; here GPT Image 2 won the specific style bake-off, a useful reminder to test both per job.
- Higgsfield exposes the Nano Banana family as one of 16+ image models in its MCP/API surface, so any Higgsfield storyboard or product-photoshoot step can route to it.
The pattern: Nano Banana is the Google image engine an agentic workflow prompts for character sheets, brand books, and product stills — usually A/B’d against GPT Image 2 in the same run.
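The A/B pattern described above can be sketched as a small harness that sends one prompt to several image backends and collects the outputs side by side. This is a minimal Python sketch: `run_ab` and the stub backends are hypothetical stand-ins, and in a real workflow each callable would wrap a vendor, Higgsfield, or LTX Studio call that this article does not specify.

```python
from typing import Callable, Dict

# A backend takes a prompt and returns image bytes.
Backend = Callable[[str], bytes]


def run_ab(prompt: str, backends: Dict[str, Backend]) -> Dict[str, bytes]:
    """Run the same prompt through every backend and collect results by name."""
    results: Dict[str, bytes] = {}
    for name, generate in backends.items():
        try:
            results[name] = generate(prompt)
        except Exception:
            # Record a failure as empty bytes so the other backend still runs.
            results[name] = b""
    return results


# Stub backends for illustration only; they return labelled placeholder bytes.
def fake_nano_banana(prompt: str) -> bytes:
    return b"nano-banana:" + prompt.encode("utf-8")


def fake_gpt_image(prompt: str) -> bytes:
    return b"gpt-image:" + prompt.encode("utf-8")


outputs = run_ab(
    "character sheet, front and side views",
    {"nano-banana-pro": fake_nano_banana, "gpt-image-2": fake_gpt_image},
)
print(sorted(outputs))
```

Keeping the backends as plain callables means the same harness works whether the model is reached through a third-party surface or a vendor API, and both outputs come from an identical prompt.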
Try It
- Fastest path (free): open the Gemini app, choose “Create images,” pick the Thinking model, and prompt for something text-heavy — a poster, a labeled infographic, a mockup with a real headline. Text rendering is the quickest way to feel the difference from older models.
- Test the 14-image blend. Upload several reference photos (a person, a product, a location) and ask for one composited scene; check whether identity and product details hold.
- Ground it on Search. Ask for an infographic on a real, current topic (this week’s weather, a recipe, a historical map) with Search grounding on, and verify the facts in the image.
- Verify provenance. Generate an image, then upload it back into the Gemini app and ask “Is this AI-generated?” to see SynthID detection in action.
- Wire it into a storyboard workflow. Use the Seedance Claude Skill or Higgsfield MCP to prompt Nano Banana Pro for a character sheet, then A/B it against GPT Image 2 on the same reference.
- For developers: try it in Google AI Studio (paid preview) or Vertex AI; use `image_size` to request 2K/4K and expect a SynthID watermark on every output.
Related
- GPT Image 2 launch coverage — the OpenAI counterpart; the head-to-head this article compares against.
- ChatGPT Image (GPT Image 2) — the sibling vendor-image-model topic; Nano Banana is its Google counterpart.
- FREE Seedance 2.0 Claude Skill — prompts Nano Banana Pro for the character-sheet stage.
- Higgsfield MCP Tutorial (Robo Nuggets) — Nano Banana 2 vs GPT Image 2 brand books.
- Animated Short Film Pipeline — GPT Image 2 vs Nano Banana 2 for 2D art.
- Higgsfield Overview — surfaces the Nano Banana family as a model backend.
- awesome-gpt-image-2 — the GPT-Image prompt-library counterpart (Nano Banana has an analogous “Awesome-Nano-Banana” community repo).
- AI Marketing — marketing-asset workflows that benefit from in-image text + 4K + multi-image blending.
- Image Models Feed the Video Pipeline — the still-first pipeline where this image model front-ends into Seedance, A/B’d against GPT Image 2.
Open Questions
- Exact developer pricing. The cited Google sources confirm a usage-metered paid preview but not per-image rates; third-party API catalogs quote figures that were not confirmed against official Gemini API / Vertex AI pricing.
- GA vs preview status. Nano Banana Pro launched in paid preview; the precise general-availability timeline across API / Vertex AI is not pinned down in the cited sources.
- Video/audio provenance. SynthID verification in the Gemini app started with images and English prompts; audio and video verification were flagged as “coming soon.”
- Nano Banana 2 vs Nano Banana Pro selection. When to prefer the Flash-tier Nano Banana 2 (speed/cost, 4K) over Pro (max world knowledge, brand consistency, 5-person fidelity) for a given storyboard job is workflow-dependent and not benchmarked here.