Source: ai-research/hyperframes-launch-videos-tutorial-peter-yang-2026-06-21.md (youtube.com/watch?v=iqb5Rd6KKr8 — Peter Yang interviews Bin Liu, VP of Product Engineering at HeyGen, and Jake Moran, PMM on the HyperFrames team; 36 min, published 2026-06-21)

A primary-source walkthrough from the people who build HyperFrames: how to make professional launch videos for free, plus the exact playbook HeyGen uses internally to ship a polished launch video on the same day a product launches. Where the HeyGen HyperFrames hub and the docs cluster cover the framework mechanics, this interview adds the operator workflow, the model guidance, and the design thesis behind it. Most important are the spatial-vs-temporal-aesthetics problem that makes launch videos hard for agents and the open-sourced launch-video repo HeyGen uses as a component library.

Key Takeaways

  • The whole thesis: “Everyone now has a coding agent, so everyone can ask their agent to make a HyperFrames video.” No After Effects, no CapCut, no tool to learn. Bin: “I think a lot of the engineers here actually can make their own launch videos now.” The video itself is end-to-end code (HTML/CSS/JS) rendered to MP4; assets (e.g. a talking-head clip) sit on top.
  • Launch video is “the holy grail.” Bin (a former founder) spent $30,000 on a single launch video, “and I was told that was cheap.” That price gap is the wedge: the HyperFrames team shipped 20–25 launches in ~2 months, each with a good launch video, with Jake (a PMM, not a video editor) making almost all of them.
  • Three ways to start (in rising order of hand-holding): (1) the quickstart command that pulls the HyperFrames skill into your coding agent; (2) the Codex plugin store → Creativity → “HyperFrames by HeyGen”^[inferred — caption says “Haijan”]; (3) the official Claude connector — search “HyperFrames” in claude.ai connectors, then say “make me a video” in any chat.
  • design.md vs frame.md: the key aesthetic-source distinction. design.md is a brand guideline (fonts, colors, hex codes) authored for webpages; frame.md (released ~mid-June 2026, generated at hyperframes.dev/design) reformats it for video: fill the frame, use larger elements, add motion. You “drop your design.md and the agent reformats it.” This is the single most load-bearing input for matching a brand.
  • The open-sourced launch-video repo is a component library. HeyGen open-sources every launch video — the entire code base that renders it — and there are “at least 50 components” they reuse regularly. The move: “I love the text animation from [named video] — grab that one for my intro.” Jake reused the same Claude Code prompt-box component across three different videos with different frame.mds; it’s code, so you take the structure and restyle fast.
  • storyboard.html is the speed unlock. Before committing to a full composition, ask the agent to render one static frame per scene (the most visually dense moment) using your references + design system. You align on aesthetic in seconds instead of waiting for a full 45s+ render. Then: “turn this into a full HyperFrames video.”
  • Studio + Inspector = last-mile editing where UI edits become code. Humans tweak text/position in the HyperFrames Studio without touching HTML; the change is written back as code, so the agent sees the diff and keeps collaborating. “Because LLMs are so good at HTML/CSS/JS, the agent knows exactly, visually, what that change entails.”
  • Why code, not JSON (the origin story). HeyGen first tried a video agent over the JSON/XML data models that video editors sit on, but agents have no visual intelligence there: a JSON blob can be structurally correct yet the agent has “no idea whether it’s going to look good,” and human UI edits left the agent asking “what changed?” The pivot: HTML is the LLM’s native language, which only really became true with Gemini 3 / GPT-5 / Opus-class models. Code becomes the foundation layer; footage/images/SVGs sit on top.
  • Spatial vs temporal aesthetics — the hard part. LLMs are good at spatial aesthetics (a landing page your eye scans top-to-bottom) but not temporal aesthetics (a video where your eye stays put and information is fed over time) — “it’s not being trained on top of that.” HeyGen builds internal evals, benchmarks, and self-check loops, open-sources those ideas into the skills, and is working with frontier model labs to train LLMs to be better at it. Corollary: a “PPT video” doesn’t cut it for a launch — “people won’t watch your PPT for more than 5 seconds.”
  • Model guidance, from their own evals. Top tier: GPT-5.5 and Fable 5. Best quality-to-cost balance: Gemini — HeyGen’s internal agent is built entirely on Gemini. Frontier models are also strong at visual understanding (Fable 5 can clip a specific timestamp range out of a video to highlight).
  • Free by design. Audio uses a local TTS model the agent downloads on demand (the hub’s Kokoro); HeyGen / ElevenLabs are optional premium providers. It’s HTML, so you can also export it as a website / interactive video — the player can be interactive.

Setup — Three Ways In

  1. Quickstart command (recommended). Copy the command from the HyperFrames quickstart page and run it in your terminal; it pulls in the HyperFrames skill so your coding agent (Claude Code, Codex) knows how to use it. See Quickstart & CLI for the canonical command.
  2. Codex plugin store. Codex → plugin store → Creativity section → “HyperFrames by HeyGen”^[inferred — caption “Haijan”] → install → try it in chat.
  3. Claude connector (no CLI). In claude.ai → customize / connect your apps → search “HyperFrames” → install → in any chat, “make me a video about X.” This is the official @claudeai connector (MCP-in-Claude) the hub documents.

The One-Shot: Website → Launch Video

The clearest demo: Claude Code with Fable 5, given spotify.com and the prompt “make a launch video for spotify.com,” produced a full 1-minute launch video — Fable followed the website-to-video skill: pulled assets/screenshots from the site, wrote a full storyboard, broke it into scene 1 / scene 2 / scene 3, and authored the whole code base, audio included. Bin’s advice: even if you don’t use this one skill, reading the skill is the best way to learn how to drive HyperFrames — it spells out the 7 steps (capture → design → script → storyboard → VO → build → validate). See Website to Video for the full pipeline. Usage is as simple as /website-to-video + a URL.

Jake’s Launch-Video Playbook (net-new, no website to start from)

The harder, higher-craft case: HeyGen’s own launches usually have no website yet — only Figma screenshots, a README about a new feature, or a Claude Code session. So Jake does the groundwork the website-to-video capture step would otherwise do:

  1. New project folder = context + assets. Drop in the context you have (e.g. a feature README) and, crucially, assets: UI screenshots, plus examples from elsewhere he likes — “I see a couple frames of this video already in my head; here they are.”
  2. Add one aesthetic source, ideally a frame.md. Either hand-write a design.md brand guideline or generate a frame.md at hyperframes.dev/design (it reformats a design.md to be video-optimized). This is “a really key thing for matching the aesthetic of your brand.” See HyperFrames in Claude Design for where design.md comes from.
  3. First prompt → a table of key events. Point the agent at the project folder + the frame.md, and ask it to produce a scene-by-scene table (what’s said, brief note on what’s on screen). Refine the text copy here — “I really care about what we’re going to say and how it builds.” This pass is the story, not the visuals.
  4. Reuse components from the launch-video repo. Pull specific elements from prior launch videos: “I love the text animation from [video name] — grab it for my intro.” Because it’s code, you inherit a working baseline and just restyle (colors, pacing) — “more likely to work the first try.” Jake reused the same Claude prompt-box across three videos via different frame.mds.
  5. storyboard.html — align on look before the full build. Ask the agent to render one static frame per scene (most visually dense moment) using the references + design system. Review/iterate on static frames (fast) instead of waiting for the full composition (slow). Most of these videos are made within a day of launch, so this is the primary time-saver.
  6. “Turn this into a full HyperFrames video” → open in HyperFrames Studio / npx hyperframes preview.
  7. Last-mile edit in Studio. Studio splits the video into scenes; the Inspector lets a human move/retype text via UI without touching code — and the edit becomes code, so the agent can diff it and keep collaborating on the parts that are hard to describe.
  8. Export. MP4 / MOV / WebM, including a transparent-background WebM so pro editors can drop HyperFrames motion graphics into Premiere. (And, being HTML, it can export as a website / interactive video.)
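
The storyboard step (step 5) can be pictured with a small sketch. This is not HyperFrames' actual storyboard.html format, which the interview does not show: the `Scene` and `Element` types, their fields, and the page layout are all hypothetical. Under those assumptions, the sketch picks the moment in each scene when the most elements are on screen (one reading of "most visually dense moment") and writes one static frame per scene into a single HTML page.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Element:
    label: str    # text shown for this element (hypothetical field)
    start: float  # seconds into the scene when it appears
    end: float    # seconds into the scene when it disappears


@dataclass
class Scene:
    title: str
    elements: list[Element]


def densest_moment(scene: Scene) -> float:
    """Return the earliest time at which the most elements are visible."""
    # The visible count can only rise at an element's start time,
    # so checking those instants is enough.
    best_time, best_count = 0.0, -1
    for t in sorted(e.start for e in scene.elements):
        count = sum(1 for e in scene.elements if e.start <= t < e.end)
        if count > best_count:
            best_time, best_count = t, count
    return best_time


def visible_at(scene: Scene, t: float) -> list[str]:
    """Labels of the elements on screen at time t."""
    return [e.label for e in scene.elements if e.start <= t < e.end]


def render_storyboard(scenes: list[Scene]) -> str:
    """Build one static HTML frame per scene (illustrative layout only)."""
    frames = []
    for i, scene in enumerate(scenes, start=1):
        t = densest_moment(scene)
        items = "".join(f"<li>{escape(lbl)}</li>" for lbl in visible_at(scene, t))
        frames.append(
            f'<section class="frame"><h2>Scene {i}: {escape(scene.title)} '
            f"(t={t:.1f}s)</h2><ul>{items}</ul></section>"
        )
    return "<!doctype html><html><body>" + "".join(frames) + "</body></html>"
```

The point of the design is the one Jake makes: a page of static frames is cheap to regenerate and review, so the look can be agreed on before any full render is attempted.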

Why Code, Not JSON — and the Aesthetics Thesis

  • HeyGen’s lane is communication, not cinema. They don’t compete with cinematic generative video (Seedance, Veo 3, “Hollywood”)^[inferred — caption “C dance / Bale 3”]; the bet is that video is the best communication format — “would you rather read a five-page doc or watch a 1–2 minute video?” Avatars solved A-roll (people shy on camera); HyperFrames is the B-roll / motion-graphics / explanation layer that takes users end-to-end without a CapCut/Premiere handoff or a hired editor.
  • The JSON dead-end. Agents over JSON/XML video models are structurally accurate but visually blind — and human UI edits desync the agent. Code (HTML) fixes both: LLMs express information and visual aesthetics in HTML/CSS/JS, and a UI edit that becomes code is legible to the agent. This only became reliable with Gemini 3 / GPT-5 / Opus-class models.
  • Spatial vs temporal aesthetics is the open research edge. Models trained on web layout are good at spatial composition; video timing (temporal aesthetics) is under-trained. HeyGen’s answer is evals + benchmarks + self-check loops, the ideas open-sourced into skills, plus active collaboration with frontier labs to train for it. The practical tell: PPT-style slide-videos are fine for internal use but fail as launch videos.
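
The claim that a UI edit which becomes code stays legible to the agent can be illustrated with Python's standard `difflib`. The two HTML snippets and the file name below are invented for illustration; the interview does not show what Studio actually writes back. The sketch only demonstrates the general idea that a text diff states exactly what changed.

```python
import difflib

# Hypothetical scene markup before and after a human drags and retypes a title.
before = """<div class="title" style="left: 80px; top: 120px">
  Ship faster
</div>
"""

after = """<div class="title" style="left: 80px; top: 200px">
  Ship launch videos faster
</div>
"""


def edit_diff(old: str, new: str) -> str:
    """Unified diff between two versions of a composition file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="scene.html (before UI edit)",
            tofile="scene.html (after UI edit)",
        )
    )


print(edit_diff(before, after))
```

An equivalent change to an opaque JSON timeline would also diff, but the agent would have far less basis for judging how the result looks, which is the dead-end described above.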

Use Cases the Builders Named

  • Product launch videos — the flagship (“holy grail”).
  • PR-to-video / commit-to-video — internally they ask Claude Code to “look at my commits for the last 7 days and tell my team what I did,” rendered as a video; a Friday-afternoon ritual of watching ~10 such videos together.
  • Agent-output-to-video — agents are verbose (“walls and walls of text I just don’t read”); the fix is “when you’re done, make a HyperFrames video telling me what you did in 30 seconds.” Deep Hermes Agent integration supports this (HyperFrames skill native to Hermes; one command to add).
  • Also: real estate, educational, internal training, standalone motion graphics, family reels (give it photos / play an MP4 clip inside the HyperFrames video), and slide decks (slides support is on the wishlist).
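
The commit-to-video use case boils down to gathering recent commits and handing them to the agent with a short instruction. Below is a minimal sketch, assuming `git` is on the PATH and the working directory is a repository. The function names and the prompt wording are illustrative, and nothing here calls HyperFrames itself; the resulting text is simply what you would paste or pipe into your coding agent.

```python
import subprocess


def recent_commits(repo_dir: str = ".", since: str = "7 days ago") -> list[str]:
    """Return one 'hash subject' line per commit, newest first, via `git log`."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", f"--since={since}", "--pretty=format:%h %s"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def build_prompt(commits: list[str], window: str = "7 days", seconds: int = 30) -> str:
    """Assemble a plain-text request for the agent (wording is an assumption)."""
    if not commits:
        return f"No commits in the last {window}; nothing to summarize."
    bullet_list = "\n".join(f"- {c}" for c in commits)
    return (
        f"Here are my commits from the last {window}:\n{bullet_list}\n\n"
        f"Make a {seconds}-second HyperFrames video telling my team what I did."
    )
```

Calling `build_prompt(recent_commits())` inside a repository yields the request text; keeping the git call separate from the prompt builder makes the latter easy to check without a repository.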

Roadmap (builder-stated)

  • Open-source the storyboarding (storyboard.html) workflow for everyone.
  • “Media use” skill set inside HyperFrames — teach it background matting, sound effects, music; “a lot of that HeyGen will offer for free,” some premium-model features will cost. (Conceptually adjacent to the wiki’s video-use direction.)
  • Templates — a gallery you can fine-tune (palette, typography, fonts), then “download the design pack” to get a frame.md that works well with HyperFrames.
  • Slides / PPT support — “we were literally talking about it this morning.”

Implementation

  • Tool/Service: HeyGen HyperFrames (open source) — see hub for repo/license/CLI. This article is the builder-interview operator layer.
  • Setup: any of the three entry paths above. For the playbook: a per-launch project folder holding context + assets + one frame.md; drive with a coding agent (Fable 5 / GPT-5.5 for top quality, Gemini for cost).
  • Cost: free to start — local TTS model, local render; premium TTS/image (HeyGen, ElevenLabs) and hosted render are optional. Model API cost depends on the agent you drive it with.
  • Integration notes:
    • frame.md from design.md at hyperframes.dev/design is the highest-leverage brand step.
    • The open-sourced launch-video repo is a reuse library — point the agent at it and pull named components/animations.
    • storyboard.html (one frame per scene) is the cheapest way to lock aesthetic before a full render.
    • Studio/Inspector edits round-trip to code — the human-in-the-loop step that keeps the agent in sync.
    • Export transparent WebM when feeding a traditional NLE; export as a website for interactive output.
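
The per-launch project folder (context + assets + one frame.md) lends itself to a quick pre-flight check before prompting the agent. The sketch below is an assumption-heavy illustration: HyperFrames does not prescribe this layout as far as the interview states, and the function name, the image suffix list, and the rule that any other Markdown file counts as context are all hypothetical.

```python
from pathlib import Path

# Hypothetical set of file types treated as visual assets.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp", ".svg"}


def check_project(folder: Path) -> list[str]:
    """Return a list of problems; an empty list means the folder looks ready."""
    files = [p for p in folder.rglob("*") if p.is_file()]
    problems = []

    # Exactly one aesthetic source.
    frame_files = [p for p in files if p.name == "frame.md"]
    if len(frame_files) != 1:
        problems.append(f"expected exactly one frame.md, found {len(frame_files)}")

    # At least one screenshot or reference image.
    if not any(p.suffix.lower() in IMAGE_SUFFIXES for p in files):
        problems.append("no image assets (screenshots, references) found")

    # At least one context document other than frame.md.
    if not any(p.suffix.lower() == ".md" and p.name != "frame.md" for p in files):
        problems.append("no context document (e.g. a feature README) found")

    return problems
```

Running `check_project(Path("my-launch"))` on a folder name of your choosing returns human-readable gaps, which can be fixed before the first "table of key events" prompt.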

Try It

  1. Run the one-shot. Install the skill, then in Claude Code (Fable 5) point at any site: “Make a launch video for [url].” Watch it capture → storyboard → build. (See Website to Video.)
  2. Make a frame.md first. For a net-new launch with no site, generate a frame.md at hyperframes.dev/design from your brand’s design.md, drop it in a project folder with UI screenshots, and ask for a table of key events before any building.
  3. Storyboard before you render. Ask for storyboard.html (one frame per scene) and iterate on static frames until the aesthetic is right — only then “turn this into a full HyperFrames video.”
  4. Mine the launch-video repo. Point your agent at HeyGen’s open-sourced launch videos and pull a named animation/component into your own intro.
  5. Try commit-to-video. Ask Claude Code to summarize your last 7 days of commits as a 30-second HyperFrames video for your team.

Open Questions

  • Exact quickstart URL / repo paths as spoken are caption-garbled (“hyperframes.ai.com/quickstart”, “HyperFrames by Haijan”); the canonical command lives in Quickstart & CLI / the HeyGen repo. Confirmed: frame.md tooling at hyperframes.dev/design.
  • The open-sourced launch-video repo’s location — Bin says “it’s all there” in the HyperFrames open-source project; the precise repo/sub-path for the per-launch source isn’t stated on-camera.
  • Temporal-aesthetics eval details — HeyGen’s internal benchmarks/self-check loops and the frontier-lab training collaboration are described but not published; numbers not given.
  • storyboard.html open-sourcing timing — stated as “likely will open source,” not yet shipped at time of recording.