Source: raw/HeyGen + Claude Code.pdf, raw/heygen yt transcript.rtf, raw/heygen-studio-template/ Author: Nate Herk (AI Automation Society) Platform: YouTube tutorial + companion PDF + open-source template

Claude Code orchestrates a three-tool production pipeline — ElevenLabs for voice cloning, HeyGen for avatar video, and Remotion for motion graphics — turning raw scripts into finished edited videos overnight without human intervention. The template project provides a ready-to-fork Python codebase with resumable state tracking.

Key Takeaways

  • Three-tool stack: ElevenLabs (voice clone) + HeyGen (avatar video) + Remotion (motion graphics), orchestrated end to end by Claude Code
  • Transforms a 5-hour pipeline (1hr recording + 3-4hr editing) into an overnight unattended job
  • The bottleneck shifts from production (recording, editing, post) to thinking (scripts, strategy, ideas)
  • Avatar V is trained on 10M+ data points; creates a digital twin from a 15-second webcam clip
  • Avatar V is not yet available via API — the template uses a Playwright workaround to upgrade Avatar 4 generations to Avatar 5 through the HeyGen dashboard
  • ElevenLabs voice quality degrades after ~60 seconds of audio; optimal chunk size is 45-60 seconds
  • HeyGen caps Avatar 5 generations at 3 minutes per clip; scripts are chunked at sentence boundaries so the stitched clips transition seamlessly
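The chunking constraints above (sentence boundaries, 45-60-second audio, 3-minute clips) reduce to a greedy splitter. A minimal sketch, assuming a word-count budget like the template's default of 200 words; the function name and the naive sentence regex are illustrative, not the template's actual code:

```python
import re

def split_script(script: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words.

    Splitting only at sentence boundaries keeps clips from cutting
    mid-phrase, so transitions between stitched clips stay seamless.
    """
    # Naive sentence boundary: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than `max_words` still becomes its own chunk rather than being cut mid-sentence.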

Three Shifts

  1. The avatar crossed the uncanny valley. Avatar V learns how you specifically move and gesture, not just lip-sync, and creates a digital twin from a 15-second webcam clip.
  2. AI can orchestrate the entire production pipeline. Claude Code coordinates multiple tools end to end — an AI agent replacing 3-4 human roles (camera, AV, editor, talent).
  3. The bottleneck moved. Production is no longer the constraint. The new bottleneck is the script, the strategy, the ideas. The human stays in the loop where it matters most.

Six-Stage Pipeline

The template (generate_videos.py) processes lessons through six sequential stages per chunk:

  1. Load Script — read from local .txt file or export Google Doc via gws CLI
  2. Split — sentence-boundary splitting into chunks (configurable, default 200 words)
  3. Generate Audio — ElevenLabs TTS API produces MP3
  4. Upload to HeyGen — POST audio to HeyGen’s asset API
  5. Create Video — POST to HeyGen’s video generation API with avatar + audio
  6. Poll and Download — poll status every 30s until complete, download MP4
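Stage 6 is a standard poll-until-done loop. A hedged sketch with the status callable injected; the 30-second interval matches the source, but the response shape (`status`, `video_url`) is an assumption, not HeyGen's documented schema:

```python
import time

def poll_until_complete(check_status, interval_sec: float = 30.0,
                        timeout_sec: float = 3600.0) -> dict:
    """Poll a status callable until it reports completion or failure.

    check_status() is expected to return a dict like
    {"status": "processing"} or {"status": "completed", "video_url": ...}
    (illustrative shape; HeyGen's real response fields may differ).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        result = check_status()
        if result.get("status") == "completed":
            return result
        if result.get("status") == "failed":
            raise RuntimeError(f"generation failed: {result}")
        time.sleep(interval_sec)
    raise TimeoutError("video did not finish within the timeout")
```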

State is tracked in state.json so the pipeline is fully resumable — stop and restart without losing work.
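Resumable state tracking of this kind can be as simple as writing a JSON checkpoint after every chunk. The `state.json` filename comes from the source; the key layout and helper names below are hypothetical:

```python
import json
from pathlib import Path

STATE_FILE = Path("state.json")

def load_state() -> dict:
    # A missing file means a fresh run: no chunks completed yet.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": {}}

def mark_done(state: dict, lesson: str, chunk: int, video_id: str) -> None:
    # Persist after every chunk so a crash or Ctrl-C loses at most
    # the chunk currently in flight.
    state["completed"][f"{lesson}:{chunk}"] = video_id
    STATE_FILE.write_text(json.dumps(state, indent=2))

def is_done(state: dict, lesson: str, chunk: int) -> bool:
    return f"{lesson}:{chunk}" in state["completed"]
```

On restart, the pipeline loads the state and skips any chunk `is_done` already reports.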

Avatar V Playwright Workaround

Avatar V is not available via API. The template works around this with heygen_update.py:

  1. generate_videos.py creates all clips using Avatar 4 via API
  2. heygen_update.py opens the HeyGen dashboard via Playwright
  3. For each clip, it clicks “New Revision,” switches to Avatar 5, and regenerates
  4. redownload_videos.py fetches the upgraded versions

This workaround will be removed when HeyGen adds Avatar V to their API.
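The per-clip revision loop (step 3) can be sketched against a Playwright-style page object. Every selector and label below is a placeholder, not HeyGen's real dashboard markup, and the function name is illustrative; the actual `heygen_update.py` drives the live dashboard:

```python
def upgrade_clip(page, clip_title: str) -> None:
    """Re-generate one clip with Avatar 5 via the dashboard UI.

    `page` is any Playwright-style object exposing fill() and click();
    all selectors here are hypothetical stand-ins.
    """
    page.fill("input[placeholder='Search']", clip_title)
    page.click(f"text={clip_title}")
    page.click("text=New Revision")   # open the revision dialog
    page.click("text=Avatar 5")       # switch avatar version
    page.click("text=Generate")       # queue the regeneration
```

In the real script this runs inside a Playwright browser session; the duck-typed `page` argument just makes the click sequence visible.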

Implementation

Tool/Service: HeyGen + ElevenLabs + Remotion + Claude Code

Setup:

  1. Clone the heygen-studio-template/ project
  2. pip install -r requirements.txt
  3. Copy .env.example to .env, add HeyGen and ElevenLabs API keys
  4. Edit config.json with your avatar ID and ElevenLabs voice ID
  5. Add lesson scripts to config.json (local files or Google Docs)
  6. Run python generate_videos.py --dry-run to test without API calls
  7. Run python generate_videos.py --lesson 1.0 --max-parts 1 to generate first video

Cost:

  • HeyGen Creator plan: $29/month (unlimited standard avatar videos, 1080p, no watermark)
  • ElevenLabs Creator plan: $22/month (~100 minutes of audio)
  • Claude Code: $200/month (orchestrates pipeline)
  • HeyGen API overhead: ~$4/min of generated footage
  • Total monthly base: $251/month + API usage
  • For a 10-minute video via API: roughly $40 in HeyGen API credits
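The cost figures above reduce to simple arithmetic:

```python
# Monthly subscriptions (USD), from the plans listed above.
heygen_creator = 29
elevenlabs_creator = 22
claude_code = 200
monthly_base = heygen_creator + elevenlabs_creator + claude_code  # 251

# HeyGen API overhead: ~$4 per minute of generated footage.
api_cost_per_min = 4
ten_min_video = 10 * api_cost_per_min  # ~40 in API credits
```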

Cost comparison (human alternative):

  • Freelance video editor: $300 per 10-min video
  • Professional voiceover: $1,000+ per session
  • Studio recording: $200/hour for the room alone
  • One professionally produced YouTube video: $1,000+
  • The entire AI stack costs less per month than a single human-produced video

Integration notes:

  • Videos process in batches of 3 (configurable concurrent_limit) to respect HeyGen rate limits
  • Audio chunks capped at 65 seconds (max_audio_duration_sec) to prevent quality degradation
  • Voice settings tunable: stability 0.40, similarity 0.75, speed 1.03 (Nate’s optimized defaults)
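Applied to an ElevenLabs text-to-speech request, the tuned settings look roughly like this. The URL shape follows ElevenLabs' public TTS API, but treat the field names (especially `speed`) and the model ID as assumptions to verify against the current API docs:

```python
def build_tts_request(voice_id: str, text: str) -> tuple[str, dict]:
    """Assemble the URL and JSON body for one audio chunk."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed model choice
        "voice_settings": {
            "stability": 0.40,         # lower = more expressive delivery
            "similarity_boost": 0.75,  # how closely to match the clone
            "speed": 1.03,             # slightly faster than natural pace
        },
    }
    return url, body
```

The pipeline would POST this body with the API key header and save the returned MP3 for upload to HeyGen.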

Market Context

  • 91% of businesses use video as a marketing tool (Wyzowl 2026)
  • 67% of non-video marketers plan to start in 2026 (Wyzowl 2026)
  • 49% of founders say AI saves them 6+ hours/week (Lenny’s Newsletter)
  • HeyGen hit $100M ARR in October 2025 (ARR Club, Sacra)
  • HeyGen: 40,000+ paying business customers as of June 2024
  • AI avatar market: $5.93B by 2032 (33% CAGR, MarketsandMarkets)
  • 24% say video is too expensive (Wyzowl 2026) — this pipeline addresses that barrier

Common Objections

  • “That’s fake / not authentic” — The script is yours, the voice is yours, the face is yours. Best use cases: short-form content, course material, advertisements. Not intended to replace a personal YouTube presence.
  • “Won’t this flood the internet with garbage?” — AI writing tools already exist and the best content still wins. Quality filter is the idea, not the production.
  • “Will this kill video editor jobs?” — Changes the job, not eliminates it. Editors become AI orchestrators and quality reviewers. Same pattern as Canva with graphic designers.

Try It

  1. Create a HeyGen avatar (15-second webcam recording gets you Avatar V)
  2. Clone an ElevenLabs professional voice (minimum 30 min of audio samples; Nate used 2 hours)
  3. Fork the heygen-studio-template/ project and configure your API keys
  4. Start with --dry-run to verify script splitting, then --max-parts 1 for a single test clip
  5. Consider building a Claude Code Skill to wrap the full pipeline into a single command
Related

  • HeyGen Avatar V — The underlying model powering this pipeline; covers the technical capabilities
  • Remotion Motion Graphics — The editing layer that adds motion graphics to avatar clips
  • HeyGen Hyperframes — HeyGen’s HTML-based composition framework, complementary to Studio output
  • video-use (browser-use) — Complementary: HeyGen Studio Automation generates video from scripts; video-use edits existing footage. Stacking them (generate → polish) is the natural full-pipeline combination.
  • Claude Code Routines — The pipeline could run as a cloud routine for fully unattended production

Open Questions

  • When will HeyGen add Avatar V to their API, eliminating the Playwright workaround?
  • Can the pipeline run as a Claude Code Routine (cloud) rather than locally?
  • What is the quality ceiling for ElevenLabs professional voice clones with 2+ hours of training data vs instant clones?
  • How does this pipeline integrate with Remotion for final editing? (Nate plans a separate Remotion breakdown)