Source: raw/HeyGen + Claude Code.pdf, raw/heygen yt transcript.rtf, raw/heygen-studio-template/ Author: Nate Herk (AI Automation Society) Platform: YouTube tutorial + companion PDF + open-source template

Claude Code orchestrates a three-tool production pipeline — ElevenLabs for voice cloning, HeyGen for avatar video, and Remotion for motion graphics — turning raw scripts into finished edited videos overnight without human intervention. The template project provides a ready-to-fork Python codebase with resumable state tracking.

Key Takeaways

  • Three-tool stack: ElevenLabs (voice clone) + HeyGen (avatar video) + Remotion (motion graphics), orchestrated end to end by Claude Code
  • Transforms a 5-hour pipeline (1hr recording + 3-4hr editing) into an overnight unattended job
  • The bottleneck shifts from production (recording, editing, post) to thinking (scripts, strategy, ideas)
  • Avatar V is trained on 10M+ data points; creates a digital twin from a 15-second webcam clip
  • Avatar V is not yet available via API — the template uses a Playwright workaround to upgrade Avatar 4 generations to Avatar 5 through the HeyGen dashboard
  • ElevenLabs voice quality degrades after ~60 seconds of audio; optimal chunk size is 45-60 seconds
  • HeyGen caps Avatar 5 generations at 3 minutes per clip; scripts are chunked at sentence boundaries so the stitched clips transition seamlessly
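The chunking constraints above (sentence boundaries, 45-60-second audio, 3-minute clips) reduce to a greedy splitter. A minimal sketch, assuming a word-count budget like the template's default of 200 words; the function name and the naive sentence regex are illustrative, not the template's actual code:

```python
import re

def split_script(script: str, max_words: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words.

    Splitting only at sentence boundaries keeps clips from cutting
    mid-phrase, so transitions between stitched clips stay seamless.
    """
    # Naive sentence boundary: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Note that a single sentence longer than `max_words` still becomes its own chunk rather than being cut mid-sentence.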

Three Shifts

  1. The avatar crossed the uncanny valley. Avatar V learns how you specifically move and gesture, not just lip-sync, and creates a digital twin from a 15-second webcam clip.
  2. AI can orchestrate the entire production pipeline. Claude Code coordinates multiple tools end to end — an AI agent replacing 3-4 human roles (camera, AV, editor, talent).
  3. The bottleneck moved. Production is no longer the constraint. The new bottleneck is the script, the strategy, the ideas. The human stays in the loop where it matters most.

Six-Stage Pipeline

The template (generate_videos.py) processes lessons through six sequential stages per chunk:

  1. Load Script — read from local .txt file or export Google Doc via gws CLI
  2. Split — sentence-boundary splitting into chunks (configurable, default 200 words)
  3. Generate Audio — ElevenLabs TTS API produces MP3
  4. Upload to HeyGen — POST audio to HeyGen’s asset API
  5. Create Video — POST to HeyGen’s video generation API with avatar + audio
  6. Poll and Download — poll status every 30s until complete, download MP4
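Stage 6 is a standard poll-until-done loop. A hedged sketch with the status callable injected; the 30-second interval matches the source, but the response shape (`status`, `video_url`) is an assumption, not HeyGen's documented schema:

```python
import time

def poll_until_complete(check_status, interval_sec: float = 30.0,
                        timeout_sec: float = 3600.0) -> dict:
    """Poll a status callable until it reports completion or failure.

    check_status() is expected to return a dict like
    {"status": "processing"} or {"status": "completed", "video_url": ...}
    (illustrative shape; HeyGen's real response fields may differ).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        result = check_status()
        if result.get("status") == "completed":
            return result
        if result.get("status") == "failed":
            raise RuntimeError(f"generation failed: {result}")
        time.sleep(interval_sec)
    raise TimeoutError("video did not finish within the timeout")
```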

State is tracked in state.json so the pipeline is fully resumable — stop and restart without losing work.
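Resumable state tracking of this kind can be as simple as writing a JSON checkpoint after every chunk. The `state.json` filename comes from the source; the key layout and helper names below are hypothetical:

```python
import json
from pathlib import Path

STATE_FILE = Path("state.json")

def load_state() -> dict:
    # A missing file means a fresh run: no chunks completed yet.
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": {}}

def mark_done(state: dict, lesson: str, chunk: int, video_id: str) -> None:
    # Persist after every chunk so a crash or Ctrl-C loses at most
    # the chunk currently in flight.
    state["completed"][f"{lesson}:{chunk}"] = video_id
    STATE_FILE.write_text(json.dumps(state, indent=2))

def is_done(state: dict, lesson: str, chunk: int) -> bool:
    return f"{lesson}:{chunk}" in state["completed"]
```

On restart, the pipeline loads the state and skips any chunk `is_done` already reports.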

Avatar V Playwright Workaround

Avatar V is not available via API. The template works around this with heygen_update.py:

  1. generate_videos.py creates all clips using Avatar 4 via API
  2. heygen_update.py opens the HeyGen dashboard via Playwright
  3. For each clip, it clicks “New Revision,” switches to Avatar 5, and regenerates
  4. redownload_videos.py fetches the upgraded versions

This workaround will be removed when HeyGen adds Avatar V to their API.
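The per-clip revision loop (step 3) can be sketched against a Playwright-style page object. Every selector and label below is a placeholder, not HeyGen's real dashboard markup, and the function name is illustrative; the actual `heygen_update.py` drives the live dashboard:

```python
def upgrade_clip(page, clip_title: str) -> None:
    """Re-generate one clip with Avatar 5 via the dashboard UI.

    `page` is any Playwright-style object exposing fill() and click();
    all selectors here are hypothetical stand-ins.
    """
    page.fill("input[placeholder='Search']", clip_title)
    page.click(f"text={clip_title}")
    page.click("text=New Revision")   # open the revision dialog
    page.click("text=Avatar 5")       # switch avatar version
    page.click("text=Generate")       # queue the regeneration
```

In the real script this runs inside a Playwright browser session; the duck-typed `page` argument just makes the click sequence visible.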

Implementation

Tool/Service: HeyGen + ElevenLabs + Remotion + Claude Code

Setup:

  1. Clone the heygen-studio-template/ project
  2. pip install -r requirements.txt
  3. Copy .env.example to .env, add HeyGen and ElevenLabs API keys
  4. Edit config.json with your avatar ID and ElevenLabs voice ID
  5. Add lesson scripts to config.json (local files or Google Docs)
  6. Run python generate_videos.py --dry-run to test without API calls
  7. Run python generate_videos.py --lesson 1.0 --max-parts 1 to generate first video

Cost:

  • HeyGen Creator plan: $29/month (unlimited standard avatar videos, 1080p, no watermark)
  • ElevenLabs Creator plan: $22/month (~100 minutes of audio)
  • Claude Code: $200/month (orchestrates pipeline)
  • HeyGen API overhead: ~$4/min of generated footage
  • Total monthly base: $251/month + API usage
  • For a 10-minute video via API: roughly $40 in HeyGen API credits
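The cost figures above reduce to simple arithmetic:

```python
# Monthly subscriptions (USD), from the plans listed above.
heygen_creator = 29
elevenlabs_creator = 22
claude_code = 200
monthly_base = heygen_creator + elevenlabs_creator + claude_code  # 251

# HeyGen API overhead: ~$4 per minute of generated footage.
api_cost_per_min = 4
ten_min_video = 10 * api_cost_per_min  # ~40 in API credits
```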

Cost comparison (human alternative):

  • Freelance video editor: $300 per 10-min video
  • Professional voiceover: $1,000+ per session
  • Studio recording: $200/hour for the room alone
  • One professionally produced YouTube video: $1,000+
  • The entire AI stack costs less per month than a single human-produced video

Integration notes:

  • Videos process in batches of 3 (configurable concurrent_limit) to respect HeyGen rate limits
  • Audio chunks capped at 65 seconds (max_audio_duration_sec) to prevent quality degradation
  • Voice settings tunable: stability 0.40, similarity 0.75, speed 1.03 (Nate’s optimized defaults)
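Applied to an ElevenLabs text-to-speech request, the tuned settings look roughly like this. The URL shape follows ElevenLabs' public TTS API, but treat the field names (especially `speed`) and the model ID as assumptions to verify against the current API docs:

```python
def build_tts_request(voice_id: str, text: str) -> tuple[str, dict]:
    """Assemble the URL and JSON body for one audio chunk."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed model choice
        "voice_settings": {
            "stability": 0.40,         # lower = more expressive delivery
            "similarity_boost": 0.75,  # how closely to match the clone
            "speed": 1.03,             # slightly faster than natural pace
        },
    }
    return url, body
```

The pipeline would POST this body with the API key header and save the returned MP3 for upload to HeyGen.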

Market Context

  • 91% of businesses use video as a marketing tool (Wyzowl 2026)
  • 67% of non-video marketers plan to start in 2026 (Wyzowl 2026)
  • 49% of founders say AI saves them 6+ hours/week (Lenny’s Newsletter)
  • HeyGen hit $100M ARR in October 2025 (ARR Club, Sacra)
  • HeyGen: 40,000+ paying business customers as of June 2024
  • AI avatar market: $5.93B by 2032 (33% CAGR, MarketsandMarkets)
  • 24% say video is too expensive (Wyzowl 2026) — this pipeline addresses that barrier

Common Objections

  • “That’s fake / not authentic” — The script is yours, the voice is yours, the face is yours. Best use cases: short-form content, course material, advertisements. Not intended to replace a personal YouTube presence.
  • “Won’t this flood the internet with garbage?” — AI writing tools already exist and the best content still wins. Quality filter is the idea, not the production.
  • “Will this kill video editor jobs?” — Changes the job, not eliminates it. Editors become AI orchestrators and quality reviewers. Same pattern as Canva with graphic designers.

Try It

  1. Create a HeyGen avatar (15-second webcam recording gets you Avatar V)
  2. Clone an ElevenLabs professional voice (minimum 30 min of audio samples; Nate used 2 hours)
  3. Fork the heygen-studio-template/ project and configure your API keys
  4. Start with --dry-run to verify script splitting, then --max-parts 1 for a single test clip
  5. Consider building a Claude Code Skill to wrap the full pipeline into a single command
Related

  • HeyGen Avatar V — The underlying model powering this pipeline; covers the technical capabilities
  • Remotion Motion Graphics — The editing layer that adds motion graphics to avatar clips
  • HeyGen Hyperframes — HeyGen’s HTML-based composition framework, complementary to Studio output
  • video-use (browser-use) — Complementary: HeyGen Studio Automation generates video from scripts; video-use edits existing footage. Stacking them (generate → polish) is the natural full-pipeline combination.
  • Claude Code Routines — The pipeline could run as a cloud routine for fully unattended production

Open Questions

  • When will HeyGen add Avatar V to their API, eliminating the Playwright workaround?
  • Can the pipeline run as a Claude Code Routine (cloud) rather than locally?
  • What is the quality ceiling for ElevenLabs professional voice clones with 2+ hours of training data vs instant clones?
  • How does this pipeline integrate with Remotion for final editing? (Nate plans a separate Remotion breakdown)