Source: raw/HeyGen + Claude Code.pdf, raw/heygen yt transcript.rtf, raw/heygen-studio-template/ Author: Nate Herk (AI Automation Society) Platform: YouTube tutorial + companion PDF + open-source template
Claude Code orchestrates a three-tool production pipeline — ElevenLabs for voice cloning, HeyGen for avatar video, and Remotion for motion graphics — turning raw scripts into finished edited videos overnight without human intervention. The template project provides a ready-to-fork Python codebase with resumable state tracking.
Key Takeaways
- Three-tool stack: ElevenLabs (voice clone) + HeyGen (avatar video) + Remotion (motion graphics), orchestrated end to end by Claude Code
- Transforms a 5-hour pipeline (1hr recording + 3-4hr editing) into an overnight unattended job
- The bottleneck shifts from production (recording, editing, post) to thinking (scripts, strategy, ideas)
- Avatar V is trained on 10M+ data points; creates a digital twin from a 15-second webcam clip
- Avatar V is not yet available via API — the template uses a Playwright workaround to upgrade Avatar 4 generations to Avatar 5 through the HeyGen dashboard
- ElevenLabs voice quality degrades after ~60 seconds of audio; optimal chunk size is 45-60 seconds
- HeyGen caps Avatar 5 generations at 3 minutes per clip; scripts must be chunked at sentence boundaries
- Scripts must split at sentence boundaries so stitched clips have seamless transitions
Three Shifts
- The avatar crossed the uncanny valley. Avatar V learns how you specifically move and gesture, not just lip-sync. Creates digital twin from 15-second webcam clip.
- AI can orchestrate the entire production pipeline. Claude Code coordinates multiple tools end to end — an AI agent replacing 3-4 human roles (camera, AV, editor, talent).
- The bottleneck moved. Production is no longer the constraint. The new bottleneck is the script, the strategy, the ideas. The human stays in the loop where it matters most.
Six-Stage Pipeline
The template (generate_videos.py) processes lessons through six sequential stages per chunk:
- Load Script — read from a local .txt file or export a Google Doc via the gws CLI
- Split — sentence-boundary splitting into chunks (configurable, default 200 words)
- Generate Audio — ElevenLabs TTS API produces MP3
- Upload to HeyGen — POST audio to HeyGen’s asset API
- Create Video — POST to HeyGen’s video generation API with avatar + audio
- Poll and Download — poll status every 30s until complete, download MP4
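The Split stage above can be sketched as a greedy sentence packer: accumulate sentences until adding the next one would blow past the word budget, then start a new chunk. This is an illustrative sketch under assumed behavior, not the template's actual implementation — the real generate_videos.py may split differently.

```python
import re

def split_script(text: str, max_words: int = 200) -> list[str]:
    """Split a script into chunks at sentence boundaries (hypothetical sketch).

    Greedy packing: sentences are appended to the current chunk until
    adding the next one would exceed max_words.
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks

script = "First sentence here. Second sentence follows. Third one ends it."
parts = split_script(script, max_words=8)
```

Because chunks only ever break between sentences, stitched clips never cut mid-thought, which is what makes the transitions seamless.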
State is tracked in state.json so the pipeline is fully resumable — stop and restart without losing work.
Avatar V Playwright Workaround
Avatar V is not available via API. The template works around this with heygen_update.py:
- generate_videos.py creates all clips using Avatar 4 via API
- heygen_update.py opens the HeyGen dashboard via Playwright
- For each clip, it clicks "New Revision," switches to Avatar 5, and regenerates
- redownload_videos.py fetches the upgraded versions
This workaround will be removed when HeyGen adds Avatar V to their API.
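The upgrade loop splits naturally into a pure "what still needs upgrading" check against the state file and a browser step per clip. A sketch of that shape — the selectors, button labels, and state fields are assumptions, not HeyGen's actual dashboard markup or the template's real schema:

```python
def clips_needing_upgrade(state: dict) -> list[str]:
    # Pure helper: only clips already generated with Avatar 4 and not yet
    # revised need a "New Revision" pass. Field names are hypothetical.
    return [cid for cid, info in state.items()
            if info.get("generated") and not info.get("upgraded")]

def upgrade_clip(clip_url: str) -> None:
    # Playwright imported inside the function so the pure helper above
    # works even where Playwright is not installed.
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # manual login may be needed
        page = browser.new_page()
        page.goto(clip_url)
        page.click("text=New Revision")  # assumed button label
        page.click("text=Avatar 5")      # assumed option label
        page.click("text=Generate")      # assumed; triggers regeneration
        browser.close()

state = {
    "part-1": {"generated": True, "upgraded": False},
    "part-2": {"generated": True, "upgraded": True},
}
todo = clips_needing_upgrade(state)
```

Driving a dashboard UI like this is inherently brittle — any HeyGen frontend change can break the selectors — which is why the workaround is slated for removal once Avatar V lands in the API.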
Implementation
Tool/Service: HeyGen + ElevenLabs + Remotion + Claude Code
Setup:
- Clone the heygen-studio-template/ project and run pip install -r requirements.txt
- Copy .env.example to .env, add HeyGen and ElevenLabs API keys
- Edit config.json with your avatar ID and ElevenLabs voice ID
- Add lesson scripts to config.json (local files or Google Docs)
- Run python generate_videos.py --dry-run to test without API calls
- Run python generate_videos.py --lesson 1.0 --max-parts 1 to generate the first video
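Putting the setup steps together, config.json plausibly looks something like the fragment below. Only avatar_id, the voice ID, concurrent_limit, and max_audio_duration_sec are named elsewhere in these notes; the exact key names and the lessons shape are guesses, so check the template's own .env.example and config for the real schema.

```json
{
  "avatar_id": "YOUR_HEYGEN_AVATAR_ID",
  "voice_id": "YOUR_ELEVENLABS_VOICE_ID",
  "concurrent_limit": 3,
  "max_audio_duration_sec": 65,
  "lessons": [
    { "id": "1.0", "script": "scripts/lesson-1.0.txt" }
  ]
}
```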
Cost:
- HeyGen Creator plan: $29/month (unlimited standard avatar videos, 1080p, no watermark)
- ElevenLabs Creator plan: $22/month (~100 minutes of audio)
- Claude Code: $200/month (orchestrates pipeline)
- HeyGen API overhead: ~$4/min of generated footage
- Total monthly base: $251/month + API usage
- For a 10-minute video via API: roughly $40 in HeyGen API credits
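The figures above combine as simple arithmetic:

```python
# Monthly base subscriptions from the cost list (USD).
heygen_creator = 29
elevenlabs_creator = 22
claude_code = 200
monthly_base = heygen_creator + elevenlabs_creator + claude_code

# Per-video API overhead at ~$4 per minute of generated footage.
api_cost_per_min = 4
ten_minute_video = 10 * api_cost_per_min
```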
Cost comparison (human alternative):
- Freelance video editor: $300 per 10-min video
- Professional voiceover: $1,000+ per session
- Studio recording: $200/hour for the room alone
- One professionally produced YouTube video: $1,000+
- The entire AI stack costs less per month than a single human-produced video
Integration notes:
- Videos process in batches of 3 (configurable concurrent_limit) to respect HeyGen rate limits
- Audio chunks capped at 65 seconds (max_audio_duration_sec) to prevent quality degradation
- Voice settings tunable: stability 0.40, similarity 0.75, speed 1.03 (Nate's optimized defaults)
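The batching behind concurrent_limit is the standard fixed-size-window pattern: submit at most N clips, wait for them, then submit the next N. A minimal sketch with a stand-in for the real submit-and-poll calls:

```python
def batched(items: list, size: int):
    # Yield fixed-size batches; the last batch may be smaller.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_all(chunks: list[str], concurrent_limit: int = 3) -> list[str]:
    """Submit chunks to HeyGen in batches of concurrent_limit (sketch only).

    The body of the loop is a placeholder: real code would POST each clip
    in the batch, then poll status until all complete before moving on.
    """
    results = []
    for batch in batched(chunks, concurrent_limit):
        results.extend(f"done:{c}" for c in batch)  # placeholder for API work
    return results

out = process_all(["a", "b", "c", "d", "e"], concurrent_limit=3)
```

Waiting for each full batch before starting the next keeps at most concurrent_limit generations in flight, which is what keeps the pipeline inside HeyGen's rate limits.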
Market Context
- 91% of businesses use video as a marketing tool (Wyzowl 2026)
- 67% of non-video marketers plan to start in 2026 (Wyzowl 2026)
- 49% of founders say AI saves them 6+ hours/week (Lenny’s Newsletter)
- HeyGen hit $100M ARR in October 2025 (ARR Club, Sacra)
- HeyGen: 40,000+ paying business customers as of June 2024
- AI avatar market: $5.93B by 2032 (33% CAGR, MarketsandMarkets)
- 24% say video is too expensive (Wyzowl 2026) — this pipeline addresses that barrier
Common Objections
- “That’s fake / not authentic” — The script is yours, voice is yours, face is yours. Best use cases: short-form content, course material, advertisements. Not intended for replacing personal YouTube presence.
- “Won’t this flood the internet with garbage?” — AI writing tools already exist and the best content still wins. Quality filter is the idea, not the production.
- “Will this kill video editor jobs?” — Changes the job, not eliminates it. Editors become AI orchestrators and quality reviewers. Same pattern as Canva with graphic designers.
Try It
- Create a HeyGen avatar (15-second webcam recording gets you Avatar V)
- Clone an ElevenLabs professional voice (minimum 30 min of audio samples; Nate used 2 hours)
- Fork the heygen-studio-template/ project and configure your API keys
- Start with --dry-run to verify script splitting, then --max-parts 1 for a single test clip
- Consider building a Claude Code Skill to wrap the full pipeline into a single command
Related
- HeyGen Avatar V — The underlying model powering this pipeline; covers the technical capabilities
- Remotion Motion Graphics — The editing layer that adds motion graphics to avatar clips
- HeyGen Hyperframes — HeyGen’s HTML-based composition framework, complementary to Studio output
- video-use (browser-use) — Complementary: HeyGen Studio Automation generates video from scripts; video-use edits existing footage. Stacking them (generate → polish) is the natural full-pipeline combination.
- Claude Code Routines — The pipeline could run as a cloud routine for fully unattended production
Open Questions
- When will HeyGen add Avatar V to their API, eliminating the Playwright workaround?
- Can the pipeline run as a Claude Code Routine (cloud) rather than locally?
- What is the quality ceiling for ElevenLabs professional voice clones with 2+ hours of training data vs instant clones?
- How does this pipeline integrate with Remotion for final editing? (Nate plans a separate Remotion breakdown)