Source: raw/Animated_Comedy_Shortfilm_Project_with_AI_Full_Breakdown.md (MattVidPro YouTube full breakdown, youtube.com/watch?v=WiOtj7g39sE)
A one-person breakdown of producing a ~3-minute animated comedy short film almost entirely with Seedance 2.0, with Codex running pre-production, GPT Image 2 generating the 2D art, and ElevenLabs re-voicing every line. The value here is the full operator pipeline: named tools, an ordered workflow, a platform cost comparison, and the specific failure modes (plus fixes) you hit when pushing a video model to hold character and style consistency across dozens of clips. It complements the two existing Seedance articles (LTX route and HeyGen Avatar Shots) without duplicating either: neither covers platform pricing, the Codex pre-production loop, the duplicate-character failure mode, or the ElevenLabs voice-swap step.
Key Takeaways
- Codex runs pre-production. The Codex CLI brainstorms the story, writes every generation prompt, and calls the OpenAI API with GPT Image 2 to produce backgrounds, character sheets, and prop specs, even building a full HTML reference site for the project. GPT Image 2 beat Nano Banana 2 for the detailed hand-drawn 2D style; story and dialogue were co-written with ChatGPT but heavily hand-edited. Codex was also tasked with researching Seedance 2.0 prompting methodology, including techniques from community creators who know the model’s behavior better than its authors do.
- Generation runs image-to-video on Seedance 2.0. ~30-50 clips, 10-15s each, up to 1080p, run on Polo AI (the video’s sponsor; Polo also ships an in-house Polo 3.0 model). Seedance is reference-hungry — it works best fed image references. Queue time is the real cost driver: Polo ~3-4 min/generation vs Runway ML ~8-10 min/generation.
- One person reached ~70-80% of streaming-production quality in 20-30 hrs of effort. This is the creator’s own assessment of where the ceiling currently sits.
- The duplicate-character bug is the signature Seedance failure mode — and it has a prompt fix (below).
- Native Seedance dialogue often clones famous voice actors (e.g., a Rick-and-Morty-style voice), so every line gets swapped via ElevenLabs in post.
- Gemini Omni is the wrong tool for consistent 2D animation — it wins at editing real video but drifts photoreal and breaks character consistency (see comparison below).
The pipeline (ordered)
1. Pre-production (Codex + GPT Image 2). Codex brainstorms story beats, writes all prompts, and generates backgrounds + character sheets + prop specs via GPT Image 2; assembles an HTML reference site. Story co-written with ChatGPT, heavily edited by hand.
2. Generation (Seedance 2.0 on Polo AI). ~30-50 image-to-video clips, 10-15s, up to 1080p. Feed references; expect Seedance to lean on them heavily.
3. Voice replacement (ElevenLabs). Mute native Seedance audio; re-voice every line in ElevenLabs at stability ~40-50% + high similarity, rendering each line as its own audio file (named voices used: “Chuck Miller,” “Finn”). Native voice consistency in Seedance would require re-uploading the prior clip per generation; the creator skipped this to save time and fixed voices in post instead.
4. Audio finishing. Muting native audio for the voice swap exposes missing ambiance, so re-add royalty-free background ambiance, manually add SFX (footsteps, fish-tank bubbles), and loop clipped Seedance ambiance from select scenes.
5. Edit/assembly. Splice multiple failed generations together in the NLE timeline and end clips a few frames early, before artifacts appear, to mask remaining errors.
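The voice-replacement step above (one audio file per line, stability ~40-50%, high similarity) can be sketched as a small render plan. This is a minimal sketch under stated assumptions, not the creator's tooling: the file-naming scheme, the `VoiceLine` type, and the 0.85 similarity value are hypothetical, and the `stability` / `similarity_boost` field names follow ElevenLabs' voice-settings convention and should be checked against the current API reference. No network call is made.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceLine:
    scene: int   # scene number in the edit
    index: int   # line order within the scene
    voice: str   # ElevenLabs voice name, e.g. "Finn"
    text: str    # dialogue to re-voice


def slug(name: str) -> str:
    """Lowercase a voice name and collapse non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def output_filename(line: VoiceLine) -> str:
    """One audio file per line, sortable by scene and line order (hypothetical scheme)."""
    return f"s{line.scene:02d}_l{line.index:02d}_{slug(line.voice)}.mp3"


def voice_settings(stability: float = 0.45, similarity_boost: float = 0.85) -> dict:
    """Settings inside the range the creator reports; 0.85 is an assumed 'high' value."""
    if not 0.40 <= stability <= 0.50:
        raise ValueError("stability outside the ~40-50% range from the video")
    return {"stability": stability, "similarity_boost": similarity_boost}


# Placeholder dialogue; the film's actual script is not reproduced here.
lines = [
    VoiceLine(1, 1, "Chuck Miller", "Placeholder line one."),
    VoiceLine(1, 2, "Finn", "Placeholder line two."),
]
render_plan = [(output_filename(l), l.voice, voice_settings()) for l in lines]
```

Keeping each line in its own file means a single bad read can be regenerated and dropped into the timeline without touching its neighbors.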
Failure modes and fixes
- Duplicate / cloned characters. Seedance sees multiple character instances on a character-reference sheet and clones them into the shot. Fix: explicit “singular / one / closeup of a single main character” prompting, plus back-and-forth with Codex to repair the prompts.
- Useless references. Some references simply don’t improve the shot. Fix: drop them and let Seedance generate the scene unreferenced.
- Visible artifacts. Fix: splice multiple failed generations in the timeline and clip each a few frames early.
- Famous-voice-actor mimicry. Native dialogue replicates known voice actors. Fix: swap every line in ElevenLabs (see pipeline step 3).
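The duplicate-character fix is a prompting rule, so it can be checked mechanically before spending a video credit. The sketch below is illustrative only: the prefix wording is adapted from the “singular / one / closeup of a single main character” phrasing, and the function name and keyword heuristic are assumptions, not something the creator describes using.

```python
import re

# Prefix adapted from the fix described above; the exact wording is an assumption.
SINGLE_SUBJECT_PREFIX = "Closeup of a single main character, one character only. "

# Crude heuristic: treat the prompt as already guarded if it uses a singular cue word.
_SINGULAR_CUE = re.compile(r"\b(single|singular|one)\b", re.IGNORECASE)


def enforce_single_character(prompt: str) -> str:
    """Prepend a single-subject instruction unless the prompt already has one."""
    if _SINGULAR_CUE.search(prompt):
        return prompt
    return SINGLE_SUBJECT_PREFIX + prompt


guarded = enforce_single_character("The fish stares at the tank, unimpressed.")
```

The keyword check will produce false positives (a prompt containing “one more time” passes unguarded), so treat it as a lint pass and still review the prompts by hand.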
Platform cost comparison
Creator estimates for a ~3-min film ≈ 50 generations × 10s @ 720p ^[inferred — these are the creator’s own estimates, not vendor-published pricing]:
| Platform | Approx. cost | Notes |
|---|---|---|
| Polo AI | ~$109/yr | With a 50%-off Seedance deal; fastest queue (~3-4 min/gen); the sponsor |
| Open Art | ~$70 | Infinite plan covers only about half the needed credits; the Wonder tier runs ~$240 upfront |
| Runway Explore | ~$76 | Slowest queue (~8-10 min/gen) |
| Higgsfield Ultra | competitive | Creator dislikes the site UX |
| fal.ai | per-generation | Pay-as-you-go for the fast model |
An Ultra/annual plan is generally required to hit the lowest per-clip cost. See Higgsfield for the Ultra-tier context.
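The queue-time gap is easier to weigh as total waiting time for the whole film. The arithmetic below uses the creator's figures (50 generations, 10s clips, a ~3-minute film, and the per-generation queue ranges quoted for Polo AI and Runway) and assumes generations run strictly one after another with no retries beyond the 50; parallel queues would shrink these totals.

```python
GENERATIONS = 50          # creator's estimate for a ~3-min film
CLIP_SECONDS = 10         # per-generation clip length used in the estimate
FILM_SECONDS = 3 * 60     # target runtime


def queue_range_minutes(generations: int, low: float, high: float) -> tuple:
    """Total waiting time if every generation queues sequentially (assumption)."""
    return generations * low, generations * high


polo = queue_range_minutes(GENERATIONS, 3, 4)     # (150, 200) minutes
runway = queue_range_minutes(GENERATIONS, 8, 10)  # (400, 500) minutes

raw_seconds = GENERATIONS * CLIP_SECONDS          # 500 seconds of generated footage
footage_ratio = raw_seconds / FILM_SECONDS        # about 2.8x the final runtime

print(f"Polo AI queue: {polo[0]}-{polo[1]} min")
print(f"Runway queue:  {runway[0]}-{runway[1]} min")
print(f"Raw footage:   {raw_seconds}s, {footage_ratio:.1f}x the final cut")
```

Under that assumption, Polo AI's queue comes to roughly 2.5 to 3.3 hours against Runway's 6.7 to 8.3 hours for the same 50 generations.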
Gemini Omni vs Seedance 2.0 (same prompts)
- Gemini Omni excels at editing real video — VFX-grade object edits, outpainting — but fails 2D-animation character consistency, drifts toward photoreal, is capped at ~10s, allows only ~2-3 generations/day, and has no generation API.
- Seedance 2.0 wins for believable, consistent 2D animated film because it’s a true omni model that takes multiple image/video references. ^[inferred — the “true omni model” framing is the creator’s]
Try It
- Run pre-production through Codex (or Claude Code): have it write your generation prompts and research current Seedance prompting conventions before you spend a single video credit.
- Generate 2D art with GPT Image 2 for detailed hand-drawn styles (it beat Nano Banana 2 here); build a character-reference sheet but prompt for a single character per shot to dodge the clone bug.
- Pick the platform by queue speed, not just price — Polo AI’s ~3-4 min/gen beats Runway’s ~8-10 min when you’re running 30-50 clips.
- Plan a voice-replacement pass in ElevenLabs (stability ~40-50%, high similarity, one file per line) — assume native Seedance dialogue is unusable for any character with a recognizable voice.
- Budget a finishing pass for ambiance + SFX after muting native audio, and edit defensively (splice failed clips, end a few frames early).
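The mute-and-trim part of the finishing pass can be batch-prepared before clips reach the NLE. This is a sketch under assumptions the source does not state: it uses ffmpeg (the creator works in an NLE timeline), a 24 fps clip rate, and a three-frame trim. The function only builds the command; it does not run it.

```python
def mute_and_trim_cmd(src: str, dst: str, clip_seconds: float,
                      fps: float = 24.0, drop_frames: int = 3) -> list:
    """Build an ffmpeg command that strips audio and ends the clip a few frames early.

    fps and drop_frames are assumed defaults; check the real clip's frame rate first.
    """
    keep = clip_seconds - drop_frames / fps
    if keep <= 0:
        raise ValueError("trim would remove the entire clip")
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", src,
        "-t", f"{keep:.3f}",   # keep only the first `keep` seconds
        "-an",                 # drop the native audio track
        "-c:v", "libx264",     # re-encode so the cut lands on the exact frame
        dst,
    ]


cmd = mute_and_trim_cmd("shot_012.mp4", "shot_012_clean.mp4", clip_seconds=10.0)
# Pass `cmd` to subprocess.run(cmd, check=True) on a machine with ffmpeg installed.
```

Trimming a fixed number of frames is a blunt default; clips whose artifacts start earlier still need a manual cut in the timeline.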
Related
- FREE Seedance 2.0 Claude Skill — Multi-Model AI Filmmaking (LTX Studio) — the Claude-Skill route to the same Seedance shot-prompting; this article is the Polo-AI + post-production-fix counterpart
- HeyGen Avatar Shots + Seedance 2.0 — Seedance inside HeyGen with the 5-element cinematic prompt framework
- Higgsfield Overview — alternative generation platform (Ultra tier referenced in the cost table)
- Codex + MagicPath Design Workflow — the Codex CLI pre-production surface used here
- video-use — Claude Code skill for the transcript-driven editing/assembly stage
- OpenCut — open-source NLE for the splice-and-trim finishing pass
Open Questions
- Exact per-clip pricing is the creator’s estimate, not vendor-published — verify against current Polo AI / Open Art / Runway / fal.ai rate cards before budgeting.
- The video is a single-creator workflow; the “~70-80% of streaming quality” and “20-30 hrs” figures are self-reported and uncalibrated against other operators.