Source: ai-research/nano-banana-pro-launch-blog-google-2026-07-03.md, ai-research/nano-banana-pro-developers-blog-google-2026-07-03.md, ai-research/nano-banana-gemini-api-image-generation-docs-2026-07-03.md, ai-research/nano-banana-pro-arstechnica-2026-07-03.md, ai-research/nano-banana-pro-deepmind-model-page-2026-07-03.md, ai-research/gemini-2-5-flash-image-original-nano-banana-2026-07-03.md — plus the head-to-head in GPT Image 2 launch coverage.

Nano Banana is the meme nickname Google kept for its Gemini-native image generation and editing models, the same product line the API calls Gemini … Image. It started as Gemini 2.5 Flash Image (August 2025), which briefly became the top-rated image model in the world, and grew into a family: Nano Banana Pro (Gemini 3 Pro Image, November 2025) is the studio-quality reasoning tier, while Nano Banana 2 / 2 Lite (Gemini 3.1 Flash Image / Flash Lite Image) are the current-generation Flash workhorses. The family's defining trick is that image generation now runs on a reasoning model with real-world knowledge, so it renders legible multilingual text, produces single-shot infographics, and keeps characters consistent, with every output carrying a SynthID watermark.

Key Takeaways

  • “Nano Banana” is a brand, not one model. It covers Gemini’s native image models. The API lineup as of 2026-07-03: Nano Banana (gemini-2.5-flash-image, the “legacy pioneer”), Nano Banana 2 Lite (gemini-3.1-flash-lite-image, fastest/cheapest, 1K only), Nano Banana 2 (gemini-3.1-flash-image, the versatile 4K workhorse), and Nano Banana Pro (gemini-3-pro-image, the premium tier for the most complex visual tasks).
  • Nano Banana Pro is built on Gemini 3 Pro. Announced November 2025, it inherits Gemini 3’s reasoning and world knowledge — Google frames it as turning notes into diagrams, data into infographics, and prompts into “studio-quality” designs.
  • Text rendering in-image is the headline capability. It is Google’s best model for correctly rendered, legible text directly inside an image — short taglines to full paragraphs, varied fonts/textures/calligraphy, and multiple languages (generate, localize, or translate in place). This is what makes single-shot infographics, posters, and mockups usable.
  • Multi-image blending + character consistency. Nano Banana Pro can mix up to 14 reference images and maintain the likeness of up to 5 people in a single output; the Flash tier holds up to ~4 characters / ~10 high-fidelity objects.^[inferred — the per-tier object/character fidelity split is drawn from the Gemini API docs table; the “14 images / 5 people” headline is stated directly by Google and Ars Technica.]
  • World knowledge + Google Search grounding. When enabled, it pulls real-time web data (weather, sports, recipes) into factually accurate visuals — biological diagrams, historical maps, data-driven infographics.
  • Up to 4K, with studio controls. Native 1K, plus 2K and 4K output and multiple aspect ratios; localized editing (select/refine/transform any region), camera-angle changes, focus, color grading, and scene-lighting transforms (day-to-night, bokeh).
  • SynthID + C2PA provenance. Every image carries an imperceptible SynthID watermark; C2PA metadata is added; a visible “Gemini sparkle” watermark sits on free/Pro-tier images but is removed for AI Ultra subscribers. You can upload any image to the Gemini app and ask “Is this AI?” to check for SynthID.
  • Everywhere in Google’s stack. Gemini app, AI Mode in Search (US, Pro/Ultra), NotebookLM, Google Ads, Google Workspace (Slides, Vids), the Gemini API + AI Studio, Google Antigravity, and Vertex AI — plus Adobe and Figma partner integrations.

Capabilities

  • Reasoning-first generation. Unlike diffusion models that render directly, Nano Banana Pro leans on Gemini 3’s reasoning before it draws — the same shift toward “thinking” image models this wiki tracks for GPT Image 2. Practical effect: terser prompts, fewer factual errors, correct text.
  • Legible multilingual text in-image. This is the differentiator versus older image models that turned text into “alphabet soup.” Best practice from the API docs: generate the text first, then ask for the image containing it.
  • Blend up to 14 references. Combine subjects, styles, products, and scenes; keep up to 5 people consistent across outputs — useful for brand touchpoints, storyboards, and multi-panel sequences.
  • Search-grounded, world-knowledge visuals. Optional grounding with Google Search produces data-correct infographics and real-time snapshots; strong general world knowledge produces plausible diagrams without grounding.
  • Resolution + creative controls. 1K/2K/4K (Flash Lite is 1K-only; Flash adds a 512px/0.5K option), a wide aspect-ratio set, localized edits, and physical controls over camera, focus, color grading, and lighting.
  • Provenance stack. SynthID (invisible, survives crops/filters/compression per Google), added C2PA “Content Credentials” metadata, and in-app SynthID verification.
  • Access model. Consumers use it in the Gemini app by selecting “Create images” with the Thinking model (free users get a limited quota, then fall back to the original Nano Banana). Developers get it as a paid preview in the Gemini API + AI Studio; image output is usage-metered (Google bills generated images as a block of output tokens) — check the current Gemini API / Vertex AI pricing pages for rates.^[inferred — exact per-image pricing was not in the cited Google announcement/doc sources; only the “paid preview” and token-metered framing are stated there.]
  • SDK shape. The original Nano Banana ships in the google-genai Python/JS SDK (generate_content with the image model, prompt + optional input image); the Gemini 3 image models add an interactions API path and an image_size / response_format control for resolution.
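
The SDK shape in the last bullet can be sketched roughly as follows. This is a minimal illustration, not Google's reference code: it assumes the google-genai Python package is installed and an API key is set in the environment, and the helper names `extract_images` and `generate_image` are hypothetical. The response walk (candidates, content parts, `inline_data`) follows the SDK's usual response layout. The offline stand-in at the bottom exists only so the parsing logic can be checked without a network call.

```python
from types import SimpleNamespace  # used only for the offline stand-in below


def extract_images(response):
    """Collect the raw bytes of every inline image part in a response."""
    images = []
    for candidate in response.candidates or []:
        content = candidate.content
        for part in (content.parts if content else None) or []:
            inline = getattr(part, "inline_data", None)
            if inline is not None and inline.data:
                images.append(inline.data)
    return images


def generate_image(prompt, image=None, model="gemini-2.5-flash-image"):
    """Hypothetical wrapper: prompt plus an optional input image (e.g. a PIL image).

    Assumes `pip install google-genai` and GEMINI_API_KEY in the environment.
    """
    from google import genai  # imported lazily so extract_images works offline

    client = genai.Client()
    contents = [prompt] if image is None else [prompt, image]
    response = client.models.generate_content(model=model, contents=contents)
    return extract_images(response)


# Offline stand-in for a response: one text part, one image part.
fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="Here is your poster.", inline_data=None),
    SimpleNamespace(text=None,
                    inline_data=SimpleNamespace(data=b"\x89PNG...", mime_type="image/png")),
]))])
print(len(extract_images(fake)))  # 1
```

In real use, each returned bytes object could be written straight to a file; the model ID defaults to the original Nano Banana because that is the path this bullet describes for `generate_content`.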

Nano Banana vs. GPT Image 2

The two vendor flagships this wiki tracks are close cousins — both are reasoning/“thinking” image models with real-time web grounding, legible multilingual text, multi-image character consistency, and 2K/4K output. They leapfrog each other release to release, so any single benchmark is a time-stamped snapshot.

| Dimension | Nano Banana Pro (Gemini 3 Pro Image) | GPT Image 2 (OpenAI) |
| --- | --- | --- |
| Base model | Gemini 3 Pro | OpenAI image model (codename “duct-tape”) |
| Distribution | Gemini app, Search AI Mode, Workspace, Google Ads, NotebookLM, Antigravity, Vertex AI, API/AI Studio | ChatGPT + API + Higgsfield + Codex CLI image_generation |
| Signature strength | In-image text, Search-grounded factual infographics, 14-image blend / 5-person consistency | Multi-image batch (magazines, manga, room-by-room), the photorealism keyword unlock |
| Provenance | SynthID (all outputs) + C2PA + in-app “Is this AI?” | Not documented in this wiki’s GPT Image 2 coverage |
| Ecosystem partners | Adobe, Figma | Raycast, Higgsfield, Codex |

  • LM Arena (competitor-reported). Per this wiki’s GPT Image 2 launch coverage (OpenAI-side sources), GPT Image 2 took the text-to-image #1 spot at ~1512 Elo, described there as a 250+ point jump over the prior leader, Nano Banana 2 (Gemini 3.1 Flash Image) at ~1270; the rounded figures quoted here put the gap closer to 240 points. Treat as a time-stamped, one-sided ranking.
  • Character consistency head-to-head. A 50-example field test in the same GPT Image 2 coverage found GPT Image 2 held strict identity detail (tattoos, piercings, hairstyle across six references) where Nano Banana Pro drifted; Nano Banana Pro trended “more cinematic,” GPT Image 2 “more editorial.” Both handled multi-reference composition well.
  • Where Nano Banana pulls ahead is reach and provenance: it is wired into the Google products people already use (Search, Workspace, Ads, NotebookLM), grounds on Google Search for real-time facts, and ships a SynthID + C2PA transparency stack on every output.^[inferred — this is a synthesis of the differing feature emphases in the two vendors’ own materials, not a claim either vendor makes about the other.]
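
To put the LM Arena figures in the list above in perspective, here is a small sketch that converts a rating gap into an expected head-to-head preference rate. It assumes the standard Elo logistic formula (base 10, scale 400); the arena's exact rating method may differ, and the two ratings are the approximate values quoted above, so read the result as a rough illustration only.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


gpt_image_2 = 1512    # approximate rating quoted above
nano_banana_2 = 1270  # approximate rating quoted above

gap = gpt_image_2 - nano_banana_2
share = elo_expected_score(gpt_image_2, nano_banana_2)
print(gap, round(share, 2))  # 242 0.8
```

Under that assumption, a gap of this size means the higher-rated model would be preferred in roughly four out of five pairwise votes.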

How This Wiki’s Tutorials Use It

Every Nano Banana reference in this wiki sits inside a video / storyboard / creative workflow, and in each case it’s reached through a third-party surface (LTX Studio or Higgsfield MCP), offered side-by-side with GPT Image 2, rather than through Google’s own API:

  • FREE Seedance 2.0 Claude Skill (LTX Studio) — the free Claude Skill writes prompts for Nano Banana Pro, GPT Image 2, and Seedance 2 in each model’s own prompt language. Nano Banana Pro / GPT Image 2 handle the character-sheet stage that locks character consistency before the storyboard grid and Seedance shot.
  • Higgsfield MCP Tutorial (Robo Nuggets) — a single Claude thread generates side-by-side brand books from Nano Banana 2 vs GPT Image 2 (“create two iterations — Nano Banana 2 for one, GPT Image 2 for the other”), then feeds them into a 6-panel logo-animation storyboard and Seedance 2.0 video. Multi-model side-by-side is the killer use case.
  • Higgsfield 50-Ad Campaign — batches product creative on Nano Banana 2 (products) alongside Soul 2 (humans) inside a 50-image Instagram campaign run.
  • Animated Short Film Pipeline — Nano Banana 2 was a candidate for the hand-drawn 2D art; here GPT Image 2 won the specific style bake-off, a useful reminder to test both per job.
  • Higgsfield exposes the Nano Banana family as one of 16+ image models in its MCP/API surface, so any Higgsfield storyboard or product-photoshoot step can route to it.

The pattern: Nano Banana is the Google image engine an agentic workflow prompts for character sheets, brand books, and product stills — usually A/B’d against GPT Image 2 in the same run.

Try It

  1. Fastest path (free): open the Gemini app, choose “Create images,” pick the Thinking model, and prompt for something text-heavy — a poster, a labeled infographic, a mockup with a real headline. Text rendering is the quickest way to feel the difference from older models.
  2. Test the 14-image blend. Upload several reference photos (a person, a product, a location) and ask for one composited scene; check whether identity and product details hold.
  3. Ground it on Search. Ask for an infographic on a real, current topic (this week’s weather, a recipe, a historical map) with Search grounding on, and verify the facts in the image.
  4. Verify provenance. Generate an image, then upload it back into the Gemini app and ask “Is this AI-generated?” to see SynthID detection in action.
  5. Wire it into a storyboard workflow. Use the Seedance Claude Skill or Higgsfield MCP to prompt Nano Banana Pro for a character sheet, then A/B it against GPT Image 2 on the same reference.
  6. For developers: try it in Google AI Studio (paid preview) or Vertex AI; use image_size to request 2K/4K and expect a SynthID watermark on every output.
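
For step 6, a request that asks for a specific resolution could be assembled along these lines. This is a sketch under stated assumptions, not the documented API shape: the JSON field names (`generationConfig`, `responseModalities`, `imageConfig`, `imageSize`, `aspectRatio`) and the helper `build_request` are assumptions for illustration, and the per-model size limits are taken from this article only. Check the current Gemini API reference before relying on any of them.

```python
import json

# Sizes per model as described in this article. The original Nano Banana's
# ceiling and the exact value string for the Flash 512px option are not
# stated here, so both are left out rather than guessed.
ALLOWED_SIZES = {
    "gemini-3.1-flash-lite-image": {"1K"},
    "gemini-3.1-flash-image": {"1K", "2K", "4K"},
    "gemini-3-pro-image": {"1K", "2K", "4K"},
}


def build_request(model, prompt, image_size="1K", aspect_ratio="1:1"):
    """Build a request body, rejecting sizes the chosen tier does not offer."""
    allowed = ALLOWED_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model}")
    if image_size not in allowed:
        raise ValueError(f"{model} does not offer {image_size}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {  # field names below are assumed, not confirmed
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
        },
    }


body = build_request("gemini-3-pro-image", "A labeled infographic of the water cycle", "4K", "16:9")
print(json.dumps(body["generationConfig"]["imageConfig"]))
```

Validating the size before sending keeps a 2K or 4K request from silently going to the 1K-only Lite tier.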

Open Questions

  • Exact developer pricing. The cited Google sources confirm a usage-metered paid preview but not per-image rates; third-party API catalogs quote figures that were not confirmed against official Gemini API / Vertex AI pricing.
  • GA vs preview status. Nano Banana Pro launched in paid preview; the precise general-availability timeline across API / Vertex AI is not pinned down in the cited sources.
  • Video/audio provenance. SynthID verification in the Gemini app started with images and English prompts; audio and video verification were flagged as “coming soon.”
  • Nano Banana 2 vs Nano Banana Pro selection. When to prefer the Flash-tier Nano Banana 2 (speed/cost, 4K) over Pro (max world knowledge, brand consistency, 5-person fidelity) for a given storyboard job is workflow-dependent and not benchmarked here.