Source: raw/Animated_Comedy_Shortfilm_Project_with_AI_Full_Breakdown.md (MattVidPro YouTube full breakdown, youtube.com/watch?v=WiOtj7g39sE)

A one-person breakdown of producing a ~3-minute animated comedy short film almost entirely with Seedance 2.0, with Codex running pre-production, GPT Image 2 generating the 2D art, and ElevenLabs rebuilding every voice line. The value here is the full operator pipeline: named tools, an ordered workflow, a platform cost comparison, and the specific failure modes (plus fixes) you hit when pushing a video model to stay consistent across a multi-minute film. It complements the two existing Seedance articles (LTX route and HeyGen Avatar Shots) without duplicating either; neither covers platform pricing, the Codex pre-production loop, the duplicate-character failure mode, or the ElevenLabs voice-swap step.

Key Takeaways

  • Codex runs pre-production. The Codex CLI brainstorms the story, writes every generation prompt, and calls the OpenAI API with GPT Image 2 to produce backgrounds, character sheets, and prop specs; it even builds a full HTML reference site for the project. GPT Image 2 beat Nano Banana 2 for the detailed hand-drawn 2D style. Story and dialogue were co-written with ChatGPT but heavily hand-edited. Codex was also tasked with researching Seedance 2.0 prompting methodology, including techniques from community creators who know the model's prompting better than its own authors do.
  • Generation runs image-to-video on Seedance 2.0. ~30-50 clips, 10-15s each, up to 1080p, run on Polo AI (the video’s sponsor; Polo also ships an in-house Polo 3.0 model). Seedance is reference-hungry — it works best fed image references. Queue time is the real cost driver: Polo ~3-4 min/generation vs Runway ML ~8-10 min/generation.
  • The result lands at ~70-80% of streaming-production quality for 20-30 hrs of one person's effort. This is the creator's honest self-assessment of where the ceiling currently sits.
  • The duplicate-character bug is the signature Seedance failure mode — and it has a prompt fix (below).
  • Native Seedance dialogue often clones famous voice actors (e.g., a Rick-and-Morty-style voice), so every line gets swapped via ElevenLabs in post.
  • Gemini Omni is the wrong tool for consistent 2D animation — it wins at editing real video but drifts photoreal and breaks character consistency (see comparison below).

The pipeline (ordered)

  1. Pre-production (Codex + GPT Image 2). Codex brainstorms story beats, writes all prompts, and generates backgrounds + character sheets + prop specs via GPT Image 2; assembles an HTML reference site. Story co-written with ChatGPT, heavily edited by hand.
  2. Generation (Seedance 2.0 on Polo AI). ~30-50 image-to-video clips, 10-15s, up to 1080p. Feed references; expect Seedance to lean on them heavily.
  3. Voice replacement (ElevenLabs). Mute native Seedance audio; re-voice every line in ElevenLabs at stability ~40-50% + high similarity, rendering each line as its own audio file (named voices used: “Chuck Miller,” “Finn”). Native voice consistency in Seedance would require re-uploading the prior clip per generation — the creator skipped this to save time and fixed voices in post instead.
  4. Audio finishing. Muting native audio for the voice swap exposes missing ambiance, so re-add royalty-free background ambiance, manually add SFX (footsteps, fish-tank bubbles), and loop clipped Seedance ambiance from select scenes.
  5. Edit/assembly. Splice multiple failed generations together in the NLE timeline and end clips a few frames early, before artifacts appear, to mask remaining errors.
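
The voice-replacement step (step 3) can be sketched as one request per dialogue line. This is a minimal sketch, assuming the ElevenLabs REST text-to-speech endpoint (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header). The voice ID, the model ID, the file-naming scheme, and the exact 0.45 / 0.9 values are illustrative assumptions; the source only gives "stability ~40-50% + high similarity" and "one file per line".

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(api_key, voice_id, text, stability=0.45, similarity=0.9,
                      model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for one dialogue line."""
    payload = {
        "text": text,
        "model_id": model_id,  # assumption: use whichever model your account offers
        "voice_settings": {
            "stability": stability,          # ~40-50% per the source
            "similarity_boost": similarity,  # "high similarity" per the source
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def line_filename(scene, index, character):
    """Hypothetical naming scheme: one audio file per dialogue line."""
    slug = character.lower().replace(" ", "_")
    return f"s{scene:02d}_l{index:03d}_{slug}.mp3"


def render_line(request, out_path):
    """Send the request and save the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Keeping request construction separate from the network call makes it easy to review the settings for every line before spending any credits, and rendering one file per line keeps each take independently replaceable in the edit.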

Failure modes and fixes

  • Duplicate / cloned characters. Seedance sees multiple character instances on a character-reference sheet and clones them into the shot. Fix: explicit “singular / one / closeup of a single main character” prompting, plus back-and-forth with Codex to repair the prompts.
  • Useless references. Some references just don’t help — fix: let Seedance conjure the scene unreferenced.
  • Visible artifacts. Fix: splice multiple failed generations in the timeline and clip each a few frames early.
  • Famous-voice-actor mimicry. Native dialogue replicates known voice actors — fix: swap every line in ElevenLabs (see pipeline step 3).
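
The duplicate-character fix amounts to stating the "one character" constraint explicitly in every shot prompt. Below is a small sketch of a prompt helper. The function name, the marker list, the guard wording, and the character name "Gus" are all hypothetical; the source only reports that "singular / one / closeup of a single main character" phrasing helped.

```python
# Phrases that already signal a single-character shot (illustrative list).
SINGLE_MARKERS = ("single", "singular", "only one", "one character")


def enforce_single_character(prompt: str, character: str) -> str:
    """Prefix a shot prompt with an explicit single-character instruction,
    unless the prompt already contains one of the marker phrases."""
    lowered = prompt.lower()
    if any(marker in lowered for marker in SINGLE_MARKERS):
        return prompt
    guard = (
        f"Closeup of a single main character: {character}. "
        "Exactly one instance of this character appears in the shot. "
    )
    return guard + prompt


shot = enforce_single_character("Gus walks over to the fish tank.", "Gus")
print(shot)
```

Because the helper leaves already-guarded prompts alone, it can be run over a whole prompt list repeatedly without stacking duplicate instructions.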

Platform cost comparison

Creator estimates for a ~3-min film ≈ 50 generations × 10s @ 720p ^[inferred — these are the creator’s own estimates, not vendor-published pricing]:

Platform         | Approx. cost   | Notes
-----------------|----------------|------
Polo AI          | ~$109/yr       | With a 50%-off Seedance deal; fastest queue (~3-4 min/gen); the sponsor
Open Art         | ~$70           | Infinite plan only ~half the needed credits → Wonder tier ~$240 upfront
Runway Explore   | ~$76           | Slowest queue (~8-10 min/gen)
Higgsfield Ultra | competitive    | Creator dislikes the site UX
fal.ai           | per-generation | Pay-as-you-go for the fast model

An Ultra/annual plan is generally required to hit the lowest per-clip cost. See Higgsfield for the Ultra-tier context.
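
The queue-time difference is easy to check with arithmetic. This is a rough sketch, assuming generations run one at a time and using the creator's per-generation estimates for a 50-generation film; retries would scale the totals proportionally.

```python
def queue_hours(generations, min_per_gen_low, min_per_gen_high):
    """Return (low, high) total queue wait in hours for sequential generations."""
    return (
        generations * min_per_gen_low / 60,
        generations * min_per_gen_high / 60,
    )


polo = queue_hours(50, 3, 4)     # Polo AI: ~3-4 min per generation
runway = queue_hours(50, 8, 10)  # Runway: ~8-10 min per generation

for name, (low, high) in {"Polo AI": polo, "Runway": runway}.items():
    print(f"{name}: {low:.1f}-{high:.1f} h of queue time")
# Polo AI: 2.5-3.3 h of queue time
# Runway: 6.7-8.3 h of queue time
```

Under these assumptions the slower queue adds roughly four to five hours of waiting, a meaningful share of a 20-30 hour project.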

Gemini Omni vs Seedance 2.0 (same prompts)

  • Gemini Omni excels at editing real video — VFX-grade object edits, outpainting — but fails 2D-animation character consistency, drifts toward photoreal, is capped at ~10s, allows only ~2-3 generations/day, and has no generation API.
  • Seedance 2.0 wins for believable, consistent 2D animated film because it’s a true omni model that takes multiple image/video references. ^[inferred — the “true omni model” framing is the creator’s]

Try It

  1. Run pre-production through Codex (or Claude Code): have it write your generation prompts and research current Seedance prompting conventions before you spend a single video credit.
  2. Generate 2D art with GPT Image 2 for detailed hand-drawn styles (it beat Nano Banana 2 here); build a character-reference sheet but prompt for a single character per shot to dodge the clone bug.
  3. Pick the platform by queue speed, not just price — Polo AI’s ~3-4 min/gen beats Runway’s ~8-10 min when you’re running 30-50 clips.
  4. Plan a voice-replacement pass in ElevenLabs (stability ~40-50%, high similarity, one file per line) — assume native Seedance dialogue is unusable for any character with a recognizable voice.
  5. Budget a finishing pass for ambiance + SFX after muting native audio, and edit defensively (splice failed clips, end a few frames early).
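
The "end a few frames early" trick from step 5 can also be scripted before clips reach the timeline. This is a minimal sketch that builds an ffmpeg command to drop the tail of a clip. The frame count, frame rate, and file names are illustrative assumptions, and the source describes doing this trim by hand in the NLE rather than with a script.

```python
def trim_tail_command(src, dst, duration_s, fps=24.0, drop_frames=6):
    """Build an ffmpeg command that keeps everything except the last
    `drop_frames` frames of a clip whose length is `duration_s` seconds."""
    keep = duration_s - drop_frames / fps
    if keep <= 0:
        raise ValueError("clip is shorter than the frames to drop")
    # -t after -i limits the output duration; re-encoding (no "-c copy")
    # lets the cut land on the exact frame instead of the nearest keyframe.
    return ["ffmpeg", "-y", "-i", src, "-t", f"{keep:.3f}", dst]


cmd = trim_tail_command("shot_012.mp4", "shot_012_trimmed.mp4", duration_s=10.0)
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed.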

Open Questions

  • Exact per-clip pricing is the creator’s estimate, not vendor-published — verify against current Polo AI / Open Art / Runway / fal.ai rate cards before budgeting.
  • The video is a single-creator workflow; the “~70-80% of streaming quality” and “20-30 hrs” figures are self-reported and uncalibrated against other operators.