Practical AI video production: avatar models, automation pipelines, composition frameworks, and motion graphics. Each article is compiled from primary sources (product pages, technical reports, tutorials, and open-source repos) rather than press summaries.
AI Avatars & Generation Models
- HeyGen Avatar V — Production-scale video-reference avatar model. 15s webcam clip → unlimited-duration 1080p twins with preserved identity, talking rhythm, and gestures. 175+ language lip-sync. State-of-the-art vs Kling, Veo, OmniHuman, Seedance.
- HeyGen Studio Automation with Claude Code — Three-tool production pipeline (ElevenLabs + HeyGen + Remotion) orchestrated by Claude Code. Script-to-finished-video overnight. Open-source Python template, 6-stage pipeline, Avatar V Playwright workaround, cost breakdown.
- HeyGen Studio CLAUDE.md Template (Bootstrap Pattern) — HeyGen-shipped `CLAUDE.md` template for bootstrapping a HeyGen Studio project under Claude Code. Captures the recommended state-management contract, the dual script source (storyboard + final), the “things you can ask” enumeration as the load-bearing CLAUDE.md pattern, and the 6-stage pipeline shape. Companion artifact to HeyGen Studio Automation: that article is the how; this is the what you ship when handing the project to Claude Code.
- skills (Official HeyGen Skills Bundle) — Vendor-published 3-skill bundle (MIT, 232 stars, v3.1.0 2026-04-27, Shell): `heygen-avatar` (persistent digital twins from photos + voice synthesis), `heygen-video` (idea-to-scripted-video with avatar delivery + style recommendations), `heygen-translate` (175+ language localization with voice cloning + lip sync). Eleven runtimes supported (Claude Code / Cursor / Codex / OpenClaw / Gemini CLI / Copilot / Junie / Goose / OpenHands / Amp / Cline). Three install paths (`gh skill install heygen-com/skills <name>`, recommended; ClawHub one-shot; OpenClaw plugin). Dual auth: API key OR MCP/OAuth against HeyGen plan credits. Sister pattern to skills: same April–May 2026 window, same vendor-published-skills shape, and both wrap an existing API/MCP/CLI in a Markdown-skill format the agent loads at session start. Pairs with Hyperframes for full-pipeline coverage.
Search & Discovery
- SentrySearch — Semantic Search Over Videos (Gemini + Qwen3-VL) — Local-first CLI that indexes video footage (default 30s chunks with 5s overlap, 480p / 5 FPS) and searches it by natural language using Gemini Embedding 2 (cloud) or Qwen3-VL (local or DashScope). Sister tools SentryMerge (cam-config + reruns) and SentryBlur (face / license-plate / NL redaction) compose into a search → trim → redact pipeline via a shared `~/.sentrysearch/last_clip.json` cache and `--last` flag. Defaults to Tesla Sentry / dashcam footage but is cam-agnostic via SentryMerge’s modular cam-config. Pinned to Python 3.11/3.12 (PyTorch wheel limitation). Operator-tunable confidence threshold (default 0.41). Occupies the search slot in the topic, distinct from the existing generation, composition, assembly, and editing tools.
- watch) — Give Claude the Ability to Watch Any Video — MIT Claude skill (taoufik123-collab, 58★) that lets Claude watch a video. It uses scene-change frame extraction (one frame per cut, so token cost is bounded by shot count rather than duration) and a 0–10s “hook microscope” (2 fps + word-level Whisper on the opening, for studying why a viral hook works). It also writes a fixed-schema `report.md` and offers optional Obsidian auto-save that mirrors this wiki’s own clip→compile pattern. Built on yt-dlp + ffmpeg + free captions (Whisper only as fallback). The input/understanding counterpart to video-use (editing) and the Video Toolkit (production).
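SentrySearch's chunking defaults can be pictured with a short sketch. This is not SentrySearch code: `Chunk`, `chunk_windows`, and `filter_hits` are hypothetical names, and only the 30 s chunk, 5 s overlap, and 0.41 threshold come from the entry above. It shows how overlapping windows tile a clip and how a similarity cutoff trims the ranked results.

```python
from dataclasses import dataclass

# Defaults taken from the SentrySearch entry; every name below is hypothetical.
CHUNK_SECONDS = 30.0
OVERLAP_SECONDS = 5.0
CONFIDENCE_THRESHOLD = 0.41


@dataclass(frozen=True)
class Chunk:
    start: float  # seconds from the start of the clip
    end: float


def chunk_windows(duration: float,
                  chunk: float = CHUNK_SECONDS,
                  overlap: float = OVERLAP_SECONDS) -> list[Chunk]:
    """Split [0, duration) into fixed-length windows that overlap by `overlap` seconds."""
    if overlap >= chunk:
        raise ValueError("overlap must be shorter than the chunk length")
    step = chunk - overlap  # with the defaults, each window starts 25 s after the last
    windows: list[Chunk] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        windows.append(Chunk(start, end))
        if end >= duration:  # the last window is truncated at the clip end
            break
        start += step
    return windows


def filter_hits(scored: list[tuple[Chunk, float]],
                threshold: float = CONFIDENCE_THRESHOLD) -> list[tuple[Chunk, float]]:
    """Keep matches at or above the threshold, best match first."""
    hits = [(c, s) for c, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With these defaults, `chunk_windows(60)` produces windows starting at 0, 25, and 50 seconds. As a result, any moment up to 5 seconds long falls wholly inside at least one chunk. Raising the threshold trades recall for fewer false matches.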
Editing & Assembly
- How Fable 5 Edited Its Own Launch Video (Thariq, Claude Code Team) — First-party Anthropic case study: the Fable 5 launch video was edited by Fable itself, without opening a video editor. The edit is a repo with the following pipeline:
  - Whisper word-level transcripts.
  - A `final-edit.json` EDL with a written rationale per pick.
  - Frame-accurate ffmpeg cuts, self-verified by re-transcribing the output (“zero ums”).
  - 7 hand-written S-Log3→Rec.709 LUTs.
  - 11 designer PNGs rebuilt as Remotion components, with overlays landing on spoken beats from the transcript.
  - 2× code→Figma→code round trips via Figma MCP.
  - A headless 4K render reviewed still by still.

  The result: 17 takes / 25 GB raw → 3:00 / 653 MB finished 4K in 4 days, driven with `/goal dont stop until you have a final video`. Caveat from the thread: a professional colorist flagged improper S-Log3 color management, so domain review remains the backstop.
- video-use (browser-use) — Claude Code skill for conversational video editing. Transcript-driven cuts (ElevenLabs Scribe word-level + diarization), parallel animation sub-agents (PIL / Manim / Remotion), self-evaluating render loop. 12 hard rules for production correctness, artistic freedom elsewhere. Python + ffmpeg, “100% open source.” 3,151 stars.
- OpenCut — Open-Source CapCut Alternative — Free MIT-licensed video editor for web/desktop/mobile (51.6k★, 5.6k forks, 96 contributors, TypeScript + Rust + WGSL). The classic version ships today at opencut.app. The rewrite (announced 2026-05-18) targets agent-native editing with these components:
  - Editor API and plugin-first architecture.
  - Rust core for one cross-platform codebase.
  - MCP server for AI agents.
  - Headless mode for batch rendering.
  - In-editor scripting tab.

  Lead maintainer mazeincoding; primary sponsor fal.ai. Rust + WASM is the architectural commitment (wgpu/WASM GPU renderer, WASM compositor, Rust-WASM time utilities). Occupies the agent-callable-NLE slot the OSS stack didn’t have, a different paradigm from Remotion (programmatic) and Hyperframes (HTML composition).
- OpenMediaTools — Browser-Native FFmpeg-WASM Suite — Free, privacy-first, browser-based converter/extractor for video (MP4/MOV/MKV/WebM/AVI), audio (MP3/WAV/OGG/FLAC/AAC/M4A), image (JPG/PNG/WebP/GIF), PDF, and AI image gen. Everything runs in-browser via WebAssembly + FFmpeg: no upload, no server, no account, and no file-size limits beyond device RAM. Utility-tier upstream prep for the AI-video pipeline; the social-media downloader/extractor category is the distinctive offering for competitive-intel workflows. The load-bearing open question is the AI tools’ backend (in-browser via WebGPU/ONNX vs server-routed): the privacy guarantee only holds if all six tool categories actually stay client-side.
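The Fable 5 entry's EDL → ffmpeg step can be sketched as follows. The sketch assumes a hypothetical `final-edit.json` schema: a `clips` list whose entries have `source`, `start`, `end`, and a `why` rationale. The case study names the file but not its fields, and `cut_commands` is illustrative. Each pick becomes one re-encoding ffmpeg call, because stream-copy cuts snap to keyframes and would not be frame-accurate.

```python
import json
import shlex

# Hypothetical EDL: the article names final-edit.json but not its schema.
EXAMPLE_EDL = """
{
  "clips": [
    {"source": "take03.mov", "start": 12.48, "end": 19.02, "why": "cleanest read of the intro"},
    {"source": "take07.mov", "start": 4.10, "end": 9.75, "why": "no filler words"}
  ]
}
"""


def cut_commands(edl: dict) -> list[list[str]]:
    """Turn each EDL pick into an ffmpeg argument list that re-encodes the cut.

    Seeking on the input while transcoding lets ffmpeg decode up to the exact
    frame, instead of snapping to the nearest keyframe as `-c copy` would.
    """
    commands = []
    for i, clip in enumerate(edl["clips"]):
        duration = clip["end"] - clip["start"]
        if duration <= 0:
            raise ValueError(f"clip {i} has non-positive duration")
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{clip['start']:.3f}",   # seek position in the source
            "-i", clip["source"],
            "-t", f"{duration:.3f}",          # keep exactly this many seconds
            "-c:v", "libx264", "-c:a", "aac",
            f"cut_{i:02d}.mp4",
        ])
    return commands


if __name__ == "__main__":
    for cmd in cut_commands(json.loads(EXAMPLE_EDL)):
        print(shlex.join(cmd))  # print only; execute with subprocess.run(cmd, check=True)
```

Keeping the rationale (`why`) next to each pick is what makes the edit reviewable as a diff. The command builder itself stays a pure function that can be checked before anything is rendered.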
Composition & Motion Graphics
- HeyGen Hyperframes — Open-source HTML-based video composition framework (Apache 2.0). Ships `/hyperframes`, `/hyperframes-cli`, and `/gsap` skills for Claude Code, Cursor, Codex, and Gemini CLI. Deterministic rendering, Frame Adapter pattern, 50+ prebuilt blocks.
- Remotion Motion Graphics — AI motion-graphics generator that converts natural-language prompts into React-based Remotion animations. Constants-first code generation, in-browser Babel compilation, live preview.
- Claude Code Video Toolkit (Digital Samba) — Open-source AI-native video production workspace for Claude Code (MIT, 890 stars). 10 skills, 13 slash commands, templates, brand profiles, transitions library. OSS model stack (Qwen3-TTS, FLUX.2, LTX-2, ACE-Step, SadTalker) on your Modal/RunPod account. Typical cost $1–2/month.
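The two ideas this section leans on are constants-first code and frame-driven rendering. Both can be shown in a few lines of Python. This is an analog rather than Remotion code: `interpolate` here mimics Remotion's `interpolate()` helper with `extrapolateLeft`/`extrapolateRight` set to `'clamp'`. The constant names are hypothetical and are not taken from the Remotion Motion Graphics generator's output.

```python
# Constants-first: every timing value lives up top, so a request such as
# "make the title fade slower" edits one number. All names are illustrative.
FPS = 30
TITLE_FADE_IN = (0, FPS // 2)   # frames: fade over the first half second
TITLE_SLIDE_PX = (40.0, 0.0)    # y offset (px) at the start and end of the fade


def interpolate(frame: float, in_range: tuple[float, float],
                out_range: tuple[float, float]) -> float:
    """Linearly map frame from in_range onto out_range, clamped at both ends."""
    (x0, x1), (y0, y1) = in_range, out_range
    t = (frame - x0) / (x1 - x0)
    t = max(0.0, min(1.0, t))
    return y0 + t * (y1 - y0)


def title_style(frame: int) -> dict[str, float]:
    """Pure function of the frame number: the same frame always yields the same style."""
    return {
        "opacity": interpolate(frame, TITLE_FADE_IN, (0.0, 1.0)),
        "translate_y": interpolate(frame, TITLE_FADE_IN, TITLE_SLIDE_PX),
    }
```

Because each frame's appearance depends only on its frame number, frames can be rendered out of order or in parallel and still produce the same video.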
Higgsfield (API-first generative video)
- Higgsfield Overview — API platform for generative AI (images + videos). Async queue: submit → poll-or-webhook → fetch. Credit-based billing. API key + secret auth. Base URL `platform.higgsfield.ai`.
- skills (Official Skills Bundle) — Vendor-published Markdown `SKILL.md` bundle for Claude Code, Cursor, and Codex (MIT, 102★, v0.3.0). Four skills:
  - `higgsfield-generate`: 30+ models + Marketing Studio.
  - `higgsfield-soul-id`: face training → reusable `reference_id`.
  - `higgsfield-product-photoshoot`: 10 modes with backend prompt enhancement on `gpt_image_2`.
  - `higgsfield-marketplace-cards`: Amazon-style listing assets (main + secondaries + A+ modules) with hidden marketplace-compliant templates.

  Three install paths (`npx skills add` / `gh skill install` / Claude Code `/plugin marketplace add`); CLI install + auth are handled automatically. The productized version of the Nate Herk CLI-over-MCP thesis; supersedes Higgsfield MCP as the recommended primary agent surface. The COOKBOOK ships three end-to-end recipes (founder-photo brand campaign, URL-to-4-ad-modes UGC batch, recurring founder team-update video).
- Higgsfield Supercomputer (hosted agentic chat surface) — Higgsfield’s cloud-native agentic platform for creative AI, launched 2026-05-14 at higgsfield.ai/supercomputer. Built on a Hermes-agent fork (the first commercial Hermes deployment we’ve tracked outside Nous Research). Frontier-model brain selection at runtime (GPT 5.5 Pro / Claude Sonnet / Claude Opus 4.6 / Gemini 3.1 Pro). Higgsfield internal skills (product images, ad creative pack, UGC workflow, Soul ID character model) are preloaded. As a result, a one-line operator prompt expands into a multi-step plan with reference loading, prompt enhancement, generation checkpoints, and gallery review. Cross-model image-to-video chain (Kling 3.0, Seedance 2.0). A checkpoint card surfaces credit cost before generation, so a typo cannot trigger a 10× credit drain. Productizes the Higgsfield + Claude Code creative agency thesis inside a single hosted chat surface, trading configurability for time-to-first-ad.
- Higgsfield MCP — Conversational surface. A custom connector at `https://mcp.higgsfield.ai/mcp` drops the full image + video stack inside Claude / OpenClaw / Hermes / NemoClaw. No API keys; sign in with your Higgsfield account. 16+ image models, 17+ video models, 9 video presets, Soul Characters, multi-model side-by-side, Product Ads from a URL.
- Higgsfield Virality Predictor — Scores a short clip before posting, through the MCP or the Higgsfield site. Outputs are a virality index, hook score, hold rate, peak hook timestamp, and a brain/attention heat map. Launched May 2026 as an experimental preview; free (no credits consumed). Turns Higgsfield from generate into generate-then-evaluate. The operative move is filtering ad creative before distribution: generate 10, score them, run the 2 strongest, and cut the 8 before they touch a budget. Both creator sources stress that it is a relative filter, not an oracle: it has no grasp of meme/cultural context, and real performance still depends on targeting, offer, and distribution. On the community “is this Meta’s TRIBE 2?” question: conceptually similar, not confirmed to be the same model. Sources are two June-2026 explainers (Nick Pontis / Mind Marketing + AI Fire). It closes the publish-loop gate the prior Higgsfield-MCP tutorials left open.
- Higgsfield + Claude Code Ad Agency Workflow — End-to-end DTC campaign in one Claude Code conversation. Firecrawl MCP scrapes brand brief → GPT Image 2.0 hero static (4 variations, Claude grades) → copy overlay → Seedance 2.0 hero animation → UGC creator generation → 2 UGC video clips. Claude as orchestration layer + creative director. ~170 credits per campaign before iteration. Mike Futia / SCALE AI tutorial.
- Higgsfield MCP Tutorial — Brand Book, Storyboard, Landing Page (Robo Nuggets) — Sister tutorial to the Mike Futia ad-agency workflow, optimized for a brand-launch use case. The end-to-end Spiderhead AI demo runs these steps:
  - Logo.
  - Side-by-side brand books from Nano Banana 2 vs GPT Image 2.
  - 6-panel logo-animation storyboard.
  - Seedance 2.0 video.
  - Mockup landing page.
  - Claude Code (VS Code fork-conversation) translating the mockup into a working localhost landing page.

  Two-path setup (~30s, desktop app or `/mcp` for Antigravity-class IDEs). Operator gotcha: the MCP does not yet return per-call credit cost. Cost-aware confirmation pattern before high-credit ops (e.g., 720p Seedance). Higgsfield-vs-fal.ai decision call: existing subscribers should use the MCP; new users should evaluate fal.ai for pay-as-you-go access to the same models.
- Higgsfield MCP — 50-Ad Instagram Campaign from One Product Image (Claude Desktop) — Third Higgsfield-MCP tutorial, narrower than the other two: a pure ad-campaign-at-scale use case. Connects the MCP via Claude Desktop’s Connectors UI and runs a four-phase chain:
  1. Research with Playwright MCP scraping Meta Ads Library.
  2. A 5×5×2 ad matrix with a human-approval gate.
  3. Batched generation across Nano Banana 2 (product) + Soul 2 (humans).
  4. Local download organized by batch/aspect ratio.

  Ends by saving the entire workflow as an `/ad-creator` skill via `/skill-creator`, turning the multi-prompt build into a one-command tool. Operator gotchas captured:
  - Claude misreports Playwright availability (verify, don’t trust).
  - Generations land in the Higgsfield Community tab → My Generations, not the main image gallery.
  - Run a pre-flight `list_workspaces` + balance check before 50-image runs (~949 credits used).

  Settings: Opus 4.7 extra-high, bypass-permissions, project folder mounted. Pairs with Meta Ads CLI for end-to-end generate-and-upload.
- Higgsfield MCP + Claude — Content Factory Skill Walkthrough (Adil) — Walkthrough of a custom Claude Skill that wraps Higgsfield MCP in a four-stage marketing-content workflow: research → content plan → generate → Meta Ads upload. The skill is the load-bearing primitive: it turns Higgsfield from operator-driven generation into an agent-driven brand-content factory runnable from a single Claude chat. Stage 3 uses GPT Image 2 + Higgsfield Marketing Studio with per-batch permission gates; the operator approves as it goes, preventing credit runaway. Stage 4’s Meta Ads upload closes the publish loop that the prior four Higgsfield-MCP tutorials in this topic (Mike Futia / Robo Nuggets / 50-Ad Campaign / Nate Herk Creative Agency) left open. Demo: 100-video UGC batch. The fifth Higgsfield+Claude tutorial in the topic, distinctive for its closed-loop publish step and four-stage skill packaging.
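The async queue from the Higgsfield Overview (submit → poll → fetch) reduces to a loop like the one below. This sketch deliberately avoids the `higgsfield-client` SDK, whose method names are not given in this index. `submit` and `get_status` are injected placeholders for whatever client call you use, and the status strings are assumptions.

```python
import time
from typing import Callable

# Assumed terminal status strings: the article lists completed / failed / NSFW as
# final states but not the exact values the API returns.
TERMINAL_STATES = {"completed", "failed", "nsfw"}


def run_job(submit: Callable[[dict], str],
            get_status: Callable[[str], dict],
            payload: dict,
            poll_every: float = 2.0,
            timeout: float = 600.0,
            sleep: Callable[[float], None] = time.sleep) -> dict:
    """Submit once, then poll until a terminal state or the timeout.

    `submit` returns a request id; `get_status` returns a dict with a "status" key.
    Both are placeholders, not real SDK functions.
    """
    request_id = submit(payload)
    waited = 0.0
    while waited <= timeout:
        result = get_status(request_id)
        if result.get("status") in TERMINAL_STATES:
            return result  # caller checks for "completed" before fetching outputs
        sleep(poll_every)
        waited += poll_every
    raise TimeoutError(f"request {request_id} still pending after {timeout}s")
```

Injecting `sleep` keeps the loop testable without real waiting. For long video jobs, the webhook path listed under Higgsfield Webhooks avoids polling altogether.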
Multi-Model Claude Skills (filmmaking)
- FREE Seedance 2.0 Claude Skill — Multi-Model AI Filmmaking Workflow (LTX Studio) — A single free Claude Skill that prompts Nano Banana Pro, GPT Image 2, and Seedance 2 in their own natural prompt languages. It covers three distinct AI-filmmaking jobs: character sheet, 3×3 cinematic storyboard grid, and Seedance 2 shot (continuous and timestamp-with-dialogue paths). Creative engine: LTX Studio (not Higgsfield); it is a pure prompt-engineering wrapper with no MCP. Workflow: image inspiration → character sheet → storyboard grid (with character refs) → Seedance 2 prompt referencing the grid as image-one-as-cinematic-grid → 15-second video. Demonstrated end to end on a two-character dialogue scene. Sister to Adil’s Content Factory (marketing-UGC use case): same “Claude Skill as load-bearing primitive” pattern, distinct use case (short film vs. ad) and distinct creative engine (LTX vs. Higgsfield). Free zip download from the video description. Creator unattributed in the transcript head (YT ID `Fark3A0ACzM`); flagged in the article’s Open Questions.
- AI Animated Short Film Pipeline — Seedance 2.0 + Codex + ElevenLabs (MattVidPro) — One-person ~3-min animated comedy short built almost entirely in Seedance 2.0. Codex runs pre-production (story + all generation prompts + GPT Image 2 art + an HTML reference site). Production uses ~30–50 image-to-video clips on Polo AI. ElevenLabs re-voices every line, because native Seedance dialogue clones famous voice actors. Post adds royalty-free ambiance + manual SFX. The article also includes:
  - A platform cost table (Polo AI / Open Art / Runway / Higgsfield Ultra / fal.ai).
  - The signature duplicate-character failure mode (fix: “single character” prompting).
  - A Gemini-Omni-vs-Seedance comparison (Omni edits real video but fails 2D-animation consistency).

  Sister to the free Seedance Claude Skill (LTX route); this is the Polo-AI + post-production-fix counterpart.
- Higgsfield as a Creative Agency in Claude (Nate Herk) — Fourth Higgsfield+Claude tutorial, adding three new dimensions vs the prior trio:
  1. A CLI-over-MCP architectural call for agentic work on token-cost grounds (“the MCP has all those tools, so from a token perspective it’s actually more expensive — the CLI is just better for agents”).
  2. A skill reverse-engineering workflow that turns a single winning prompt into a reusable `.claude/skills/hypermotion-video/SKILL.md` recipe that compounds across runs.
  3. A two-routine scaling pattern (Sunday-plan + Monday-generate) that grows the asset bank from 50 → 100 → 200 ads per week while the operator sleeps. A Google Workspace CLI–created Sheet acts as the cross-routine asset database.

  Demoed on fictional headphone brand “Murmur” + a sleep-supplement bottle with Higgsfield Marketing Studio’s Hypermotion preset. Operator gotchas:
  - A skill written mid-session needs a Claude restart to register.
  - Reference-image fidelity needs explicit “must appear exactly as shown” wording.
  - Rejected generations are diagnosed and retried by having Claude read its own prompt.

  Companion to the same author’s ElevenLabs voice-agents tutorial (same direct-vendor-CLI architecture for ElevenLabs).
- Higgsfield Image-to-Video — Three featured models (`higgsfield-ai/dop/preview`, Bytedance Seedance Pro, Kling v2.1 Pro). Motion-prompt template: describe the movement + set the pace + specify camera moves. Technical checklist.
- Higgsfield SDK (Python) — `pip install higgsfield-client`. Auth via `HF_KEY` or `HF_API_KEY` + `HF_API_SECRET` env vars. Four usage patterns (submit-and-wait, polling, callback, request management). File uploads for bytes/paths/PIL. JS/TS coming soon.
- Higgsfield Webhooks — Add the `hf_webhook` query param to the submit URL. Delivers completed/failed/NSFW final states. 2-hour retry window. The endpoint must be HTTPS and return a 2xx within 10s. Idempotency via `request_id`.
- Higgsfield Training Framework (OSS Origin) — Historical context. Apache-2.0 distributed-training framework at `higgsfield-ai/higgsfield` (3.6k stars, last push 2024-05-25). A pre-pivot artifact: the company moved from trillion-param LLM training infra to consumer creative AI. Listed for vendor-lineage awareness, not for use.
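A minimal receiver for the Higgsfield Webhooks rules might look like this. The handler is framework-agnostic and hypothetical. It assumes a JSON body with `request_id` and `status` fields, and the lowercase status strings are a guess. It illustrates the two stated constraints. First, answer with a 2xx quickly (the 10 s limit). Second, treat `request_id` as the idempotency key, so retries within the 2-hour window are processed only once.

```python
import json

# Assumed payload fields: the article confirms request_id as the idempotency key and
# completed / failed / NSFW as final states, but not the exact JSON shape or strings.
FINAL_STATES = {"completed", "failed", "nsfw"}


def handle_webhook(body: bytes, seen: set[str], queue: list[dict]) -> int:
    """Return the HTTP status to send; heavy work is deferred off the request path."""
    try:
        event = json.loads(body)
        request_id = str(event["request_id"])
    except (ValueError, KeyError, TypeError):
        return 400  # malformed body: resending it unchanged will not help
    if request_id in seen:
        return 200  # duplicate delivery: acknowledge so the sender stops retrying
    seen.add(request_id)
    if event.get("status") in FINAL_STATES:
        queue.append(event)  # e.g. download outputs later from a background worker
    return 200
```

Wire `handle_webhook` into any HTTP framework's POST route. In production, `seen` would live in a database rather than process memory, so dedup survives restarts.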
HeyGen tutorials (2026-05-17 cohort)
- HeyGen Instant Highlights V2 — Auto-Clip Long-Form Video to Short-Form Clips — Drop in a long-form video (up to 10 GB) or a URL. The tool analyzes it for highlight moments (speech energy, importance signals, viewer-save likelihood) and auto-cuts short-form clips with optional captions in 9:16 / 16:9 / 1:1 formats. Clip-length range control (under 30s / 30–60s / 1–3 min / 3+ min). Optional steering instructions for what to focus on or avoid. The product replaces hours of manual timeline scrubbing, with the same workflow shape as Opus.pro / Vizard / Klap / Munch, surfaced inside HeyGen.
- Style) — Prompt-engineering framework for HeyGen’s Avatar Shots feature (Avatar 5 + Seedance 2.0). 5-element prompt structure (subject / action / environment / camera / style) with one camera move per shot, layered into multi-shot beats (timestamps + per-beat camera moves), multi-avatar choreography (up to 3 avatars/scene with explicit blocking + relative-motion), and elements references (pre-uploaded locations / outfits / products for continuity). Avatar 5 for A-roll + Avatar Shots for storytelling beats. Working prompt example included.
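The Avatar Shots framework above can be captured in a small prompt builder. Its rules are five elements per shot, one camera move per shot, timestamped beats, and up to 3 avatars per scene. `Beat`, `build_prompt`, and the output layout are hypothetical; HeyGen does not prescribe this format, and the compound-move check is only a rough heuristic.

```python
from dataclasses import dataclass

MAX_AVATARS = 3  # per-scene limit stated in the article


@dataclass
class Beat:
    """One timed shot built from the five elements: subject / action / environment / camera / style."""
    start: int          # seconds into the clip
    end: int
    subject: str
    action: str
    environment: str
    camera: str         # a single camera move
    style: str


def build_prompt(beats: list[Beat], avatars: list[str]) -> str:
    """Render beats as timestamped lines, enforcing the avatar limit and one move per shot."""
    if not 1 <= len(avatars) <= MAX_AVATARS:
        raise ValueError(f"use 1 to {MAX_AVATARS} avatars per scene")
    lines = ["Avatars: " + ", ".join(avatars)]
    for b in beats:
        # Crude check for a compound move such as "pan left and tilt up".
        if " and " in b.camera.lower() or "," in b.camera:
            raise ValueError(f"one camera move per shot, got {b.camera!r}")
        lines.append(
            f"[{b.start}-{b.end}s] {b.subject} {b.action} in {b.environment}. "
            f"Camera: {b.camera}. Style: {b.style}."
        )
    return "\n".join(lines)
```

Encoding the rules as validation means a multi-beat prompt fails fast on a stacked camera move or a fourth avatar, before any credits are spent on generation.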