Source: The Pipeline, Introduction, Launch Videos — HeyGen HyperFrames docs
HyperFrames is an open-source framework that turns HTML into deterministic, frame-by-frame rendered video — “define a video the same way you build a web page.” The canonical production workflow is a seven-step pipeline (capture → design → script → storyboard → VO+timing → build → validate) where each step emits a named on-disk artifact that feeds the next and acts as a re-entry checkpoint. Getting started maps onto the same primitives at a smaller scale — write HTML, preview in the browser, render to MP4 — and the pipeline is the recommended path once a video has three or more beats or needs to be inspectable by a non-author.
Key Takeaways
- Seven sequential steps, one artifact each: Capture → `capture/`; Design → `DESIGN.md`; Script → `SCRIPT.md`; Storyboard → `STORYBOARD.md`; VO+Timing → `narration.wav` + `narration.txt` + `transcript.json`; Build → `compositions/*.html` (one HTML file per beat); Validate → snapshot PNGs + passing `lint`/`validate`.
- Each step has a “gate”: an explicit done-condition before advancing (e.g. Step 1’s gate is being able to name the source’s top colors, fonts, and standout assets; Step 7’s is `lint` and `validate` passing with zero errors and a shareable Studio preview URL).
- Durations come from narration, not estimates: `SCRIPT.md` must be finished before storyboarding, and storyboard beat timings (`0.0s - 5.8s`) are taken from the word-level `transcript.json` produced in Step 5.
- Determinism is the design principle: rendering is seek-driven with no wall-clock dependencies. The frame index is `frame = floor(time * fps)`, and each frame is captured via Chrome’s `beginFrame` API and encoded with FFmpeg, so identical input yields identical output (reliable CI + batch rendering).
- Getting-started essentials: compositions are plain HTML with `data-start`/`data-duration` (timing) and `data-track-index` (layout); the three-command loop is write HTML → `npx hyperframes preview` → `npx hyperframes render --output output.mp4`. No build step, no DSL, no React requirement.
- Built for agents: HTML is the format LLMs generate best, and the CLI is non-interactive by default (flag-driven, plain-text output, fail-fast) so an agent can drive every command; add `--human-friendly` for the interactive terminal UI.
- Launch-video framing: HeyGen open-sources its real product-launch compositions in the `hyperframes-launches` repo. These are production-grade (not toy) projects that show the pipeline at full scale: multi-composition projects of 4-8 sub-compositions wired into one root timeline, multiple adapters (GSAP, Lottie, shaders, Three.js, CSS) in a single render, all via standard `hyperframes render`.
- Re-entry without re-running everything: edit `STORYBOARD.md` and rebuild a single beat, open one composition file and tweak it live under `preview`, or swap the voice by re-running TTS against `narration.txt` (which already holds pronunciation substitutions) without redoing the script.
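The seek-driven mapping named in the determinism takeaway can be made concrete with a few lines of Python. This is a sketch of the arithmetic only, not HyperFrames code: `frame_index` and `seek_times` are hypothetical names, and the assumption that a composition yields `floor(duration * fps)` frames is ours rather than something these sources state.

```python
import math


def frame_index(time_s: float, fps: int) -> int:
    """Map a timestamp to a frame number: frame = floor(time * fps)."""
    return math.floor(time_s * fps)


def seek_times(duration_s: float, fps: int) -> list[float]:
    """Timestamps a seek-driven renderer would visit, one per frame.

    Each timestamp depends only on the frame number and fps, never on a
    clock, so repeated runs visit exactly the same instants.
    """
    total_frames = math.floor(duration_s * fps)  # assumed frame count
    return [i / fps for i in range(total_frames)]


times = seek_times(duration_s=5.0, fps=30)
print(len(times), frame_index(2.5, 30))
```

Because `time * fps` is evaluated in floating point, iterating over integer frame numbers (as `seek_times` does) avoids rounding surprises at frame boundaries.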
The seven pipeline stages
Each stage produces a named artifact and has a gate that must pass before moving on.
| # | Step | Output | What happens |
|---|---|---|---|
| 1 | Capture | capture/ | Extract screenshots, design tokens, fonts, assets, animations from a source |
| 2 | Design | DESIGN.md | Brand reference: colors, typography, components, do’s and don’ts |
| 3 | Script | SCRIPT.md | Narration text with hook, story, proof, and CTA |
| 4 | Storyboard | STORYBOARD.md | Per-beat creative direction: mood, assets, animations, transitions |
| 5 | VO + Timing | narration.wav + narration.txt + transcript.json | TTS audio, the exact spoken text, and word-level timestamps |
| 6 | Build | compositions/*.html | Animated HTML compositions, one per beat |
| 7 | Validate | Snapshot PNGs + lint/validate pass | Visual verification and runtime checks before delivery |
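Because every stage leaves a named artifact on disk, a script can guess where to resume by looking for the first artifact that is missing. The sketch below uses the artifact names from this section and treats mere existence as a stand-in for a gate, which is weaker than the real done-conditions. `first_missing_step` is a hypothetical helper, not a HyperFrames command.

```python
from pathlib import Path
from typing import Optional
import tempfile

# (step name, paths expected in the project root), in pipeline order.
# Step 7 is omitted: its gate depends on lint/validate results, not files alone.
STEPS = [
    ("Capture", ["capture"]),
    ("Design", ["DESIGN.md"]),
    ("Script", ["SCRIPT.md"]),
    ("Storyboard", ["STORYBOARD.md"]),
    ("VO + Timing", ["narration.wav", "narration.txt", "transcript.json"]),
    ("Build", ["compositions"]),
]


def first_missing_step(project: Path, captured_source: bool = True) -> Optional[str]:
    """Return the first step whose artifact is absent, or None if all exist."""
    for name, paths in STEPS:
        if name == "Capture" and not captured_source:
            continue  # capture/ only exists when a source was captured
        if name == "Build":
            comp_dir = project / "compositions"
            done = comp_dir.is_dir() and any(comp_dir.glob("*.html"))
        else:
            done = all((project / p).exists() for p in paths)
        if not done:
            return name
    return None


# Demo on a throwaway directory that holds only the first three artifacts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "capture").mkdir()
    (root / "DESIGN.md").write_text("# Overview\n")
    (root / "SCRIPT.md").write_text("Hook line.\n")
    resume_at = first_missing_step(root)

print(resume_at)  # Storyboard
```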
Step 1 — Capture (capture/)
- Extracts screenshots at every scroll depth, pixel-sampled color palettes, the CSS font stack with downloaded woff2 files, semantically-named images and SVGs, Lottie animations, and detected page animations.
- Optional Gemini vision enrichment adds AI descriptions of every captured asset.
- For non-website sources (PDFs, decks, CSVs, notes), gather assets into `capture/` manually for later reference.
- Gate: you can describe the source’s visual identity in one or two sentences and name its top colors, fonts, and standout assets.
npx hyperframes capture https://example.com -o my-video/capture
Step 2 — Design (DESIGN.md)
- Encodes visual identity factually so downstream steps reference exact colors, fonts, and components. Six sections: Overview, Colors (5-10 HEX with semantic roles), Typography, Components, Imagery, Do’s and Don’ts.
- Gate: `DESIGN.md` exists with all six sections filled from real captured data (or deliberately chosen for net-new projects).
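Part of this gate can be checked mechanically. The following sketch assumes each of the six sections appears as a Markdown heading and that colors are written as six-digit HEX values. Neither convention is confirmed by these sources, and `design_gate_problems` is a hypothetical helper. The colors in the sample draft are made up.

```python
import re

REQUIRED_SECTIONS = [
    "Overview", "Colors", "Typography", "Components", "Imagery", "Do's and Don'ts",
]
HEADING = re.compile(r"#{1,6}\s+(.+)")
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def design_gate_problems(markdown: str) -> list[str]:
    """List reasons a DESIGN.md draft would fail the Step 2 gate (empty = pass)."""
    text = markdown.replace("\u2019", "'")  # treat curly and straight apostrophes alike
    headings = set()
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.add(match.group(1).strip().lower())
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name.lower() not in headings
    ]
    colors = {c.upper() for c in HEX_COLOR.findall(text)}
    if not 5 <= len(colors) <= 10:
        problems.append(f"expected 5-10 HEX colors, found {len(colors)}")
    return problems


draft = """# Overview
Calm, technical, high contrast.
# Colors
- #0B0F19 background
- #FFFFFF text
- #4F46E5 accent
"""
for problem in design_gate_problems(draft):
    print(problem)
```

A check like this only confirms that the sections exist. Whether they were filled from real captured data still needs a human or agent review.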
Step 3 — Script (SCRIPT.md)
- The narration foundation; scene durations derive from narration, so finish the script before storyboarding. Typical structure: hook → story → proof → call-to-action. Reference real features/stats from `capture/extracted/visible-text.txt`.
- For non-narrated videos, `SCRIPT.md` becomes a per-beat copy plan with on-screen text and timing notes.
- Gate: `SCRIPT.md` exists in the project root.
Step 4 — Storyboard (STORYBOARD.md)
- Specifies what to build per beat: timing, the exact narration line, mood & camera, assets (by path), techniques (2-3 from the techniques library — SVG path drawing, Canvas 2D, CSS 3D, per-word typography, Lottie, video compositing, typing effects, variable fonts, MotionPath, velocity transitions, audio-reactive), transitions in/out, and SFX.
- Timing fields (e.g. `0.0s - 5.8s`) come from `transcript.json` once Step 5 runs.
- Gate: `STORYBOARD.md` exists with beat-by-beat direction and an asset audit naming every file used.
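One way to turn word-level timestamps into the per-beat timing fields is to walk the transcript in order and give each beat the span of the words in its narration line. The sketch below assumes the `[{ text, start, end }]` word shape described for `transcript.json` and assumes each narration line splits on whitespace into exactly the words the transcript reports. `beat_boundaries` and the sample data are hypothetical.

```python
def beat_boundaries(words, beat_lines):
    """Assign each beat the time span of the transcript words it narrates.

    words: list of {"text", "start", "end"} dicts in spoken order.
    beat_lines: one narration line per beat, in order.
    """
    spans, cursor = [], 0
    for line in beat_lines:
        count = len(line.split())
        chunk = words[cursor:cursor + count]
        if count == 0 or len(chunk) < count:
            raise ValueError(f"cannot time beat {len(spans) + 1}: word count mismatch")
        spans.append((chunk[0]["start"], chunk[-1]["end"]))
        cursor += count
    return spans


# Hypothetical transcript words and beat narration lines.
words = [
    {"text": "Meet", "start": 0.0, "end": 0.4},
    {"text": "HyperFrames.", "start": 0.5, "end": 1.3},
    {"text": "Write", "start": 1.6, "end": 1.9},
    {"text": "HTML.", "start": 2.0, "end": 2.6},
]
beats = ["Meet HyperFrames.", "Write HTML."]

for number, (start, end) in enumerate(beat_boundaries(words, beats), start=1):
    print(f"Beat {number}: {start:.1f}s - {end:.1f}s")
```

In a real project the `words` list would come from `json.load` on `transcript.json`. Pauses between beats are left as gaps here, and a storyboard may instead prefer contiguous boundaries.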
Step 5 — VO and timing (narration.wav, narration.txt, transcript.json)
- `narration.wav` ships with the final render; `narration.txt` is the exact spoken text with pronunciation substitutions applied (`API` → `A P I`, `$2T` → `two trillion`) and is kept distinct from `SCRIPT.md` so the voice can be regenerated later without redoing substitutions; `transcript.json` holds `[{ text, start, end }]` per word and every later step reads it for timing.
- Ships multiple TTS adapters: Kokoro, ElevenLabs, HeyGen. After generating audio, update `STORYBOARD.md` with real beat boundaries from `transcript.json`.
- Gate: all three files exist and `STORYBOARD.md` timings reference real timestamps, not estimates.
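The pronunciation substitutions mentioned above amount to a small rewrite pass from written text to spoken text. The sketch below is an illustration of that idea, not how HyperFrames implements it. The table holds only the two examples from this section, and `to_spoken_text` is a hypothetical name.

```python
import re

# Hypothetical substitution table; the entries mirror the examples in this section.
SUBSTITUTIONS = {
    "API": "A P I",
    "$2T": "two trillion",
}


def to_spoken_text(script_text: str, substitutions: dict[str, str]) -> str:
    """Return the text a TTS voice should read, with written forms spelled out."""
    spoken = script_text
    for written, said in substitutions.items():
        # Match the written form only when it is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(written) + r"(?!\w)"
        spoken = re.sub(pattern, lambda _match, said=said: said, spoken)
    return spoken


line = "Our API now moves $2T a year."
print(to_spoken_text(line, SUBSTITUTIONS))  # Our A P I now moves two trillion a year.
```

Keeping the result in its own file is what lets the voice be regenerated later without repeating this pass.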
npx hyperframes tts SCRIPT.md --voice af_nova --output narration.wav
npx hyperframes transcribe narration.wav
Step 6 — Build (compositions/<beat-name>.html)
- The storyboard becomes runnable HTML, one file per beat: it imports captured assets by path, uses exact colors/fonts from `DESIGN.md`, and animates with storyboard-specified techniques.
- For multi-beat videos, spawn a focused sub-agent per beat with fresh context, that beat’s storyboard section, the needed asset paths, and relevant technique references; self-review each composition after building.
- Gate: every composition is self-reviewed — no overlapping elements, no misplaced assets, no static images sitting unanimated.
Step 7 — Validate (snapshot PNGs + lint/validate)
- `lint` runs static HTML structure checks (missing attributes, timeline registration issues, tween conflicts, CSS-transform vs. GSAP conflicts); `validate` loads each composition in headless Chrome to surface runtime JS errors, missing assets, and failed network requests; `snapshot` captures frames at specific timestamps for visual inspection.
- Gate: `lint` and `validate` pass with zero errors, snapshot frames look right, and the Studio preview URL is ready to share.
npx hyperframes lint # static HTML structure checks
npx hyperframes validate # loads in headless Chrome, catches runtime errors
npx hyperframes snapshot my-video --at 2.9,10.4 # PNGs at beat midpoints
npx hyperframes render --output my-video.mp4
- For personalized or catalog outputs, render with `--batch rows.json --output "renders/{name}.mp4"` and use the generated `manifest.json` as the delivery checklist.
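Before a batch run it can help to preview which files the output template will produce. The sketch below assumes `rows.json` is a JSON array of objects and that `{name}` is filled from each row's `name` field. These sources do not spell out the row schema, so treat both as assumptions. `planned_outputs` and the sample rows are hypothetical.

```python
import json


def planned_outputs(rows_json: str, template: str) -> list[str]:
    """Expand an output template such as "renders/{name}.mp4" once per row."""
    rows = json.loads(rows_json)
    outputs = [template.format_map(row) for row in rows]
    duplicates = {path for path in outputs if outputs.count(path) > 1}
    if duplicates:
        raise ValueError(f"rows would overwrite each other: {sorted(duplicates)}")
    return outputs


# Hypothetical rows; the "headline" field is only there to show extra columns are ignored.
rows_json = """[
  {"name": "acme", "headline": "Hello Acme"},
  {"name": "globex", "headline": "Hello Globex"}
]"""

print(planned_outputs(rows_json, "renders/{name}.mp4"))
# ['renders/acme.mp4', 'renders/globex.mp4']
```

The list can then be compared against the generated manifest as part of the delivery check.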
Project layout
The pipeline writes a predictable on-disk tree; capture/ only appears when capturing a source.
my-video/
├── capture/ # Step 1, only present when capturing a source
│ ├── screenshots/ # scroll-000.png, scroll-001.png, …
│ ├── assets/ # downloaded images, SVGs, fonts
│ ├── extracted/ # tokens.json, visible-text.txt, asset-descriptions.md
│ ├── AGENTS.md # capture summary for AI agents
│ └── CLAUDE.md
├── DESIGN.md # Step 2, brand cheat sheet
├── SCRIPT.md # Step 3, narration backbone
├── STORYBOARD.md # Step 4, beat-by-beat creative plan
├── narration.wav # Step 5, TTS audio
├── narration.txt # Step 5, exact spoken text (with pronunciation subs)
├── transcript.json # Step 5, word-level timestamps
├── compositions/ # Step 6, one HTML file per beat
│ ├── beat-1-hook.html
│ ├── beat-2-story.html
│ └── …
├── snapshots/ # Step 7, visual verification PNGs
├── renders/ # optional final MP4 outputs
└── index.html # root project file wiring compositions into a timeline
Getting started (the minimal loop)
For small one-shot work you do not need the full pipeline — the same primitives scale down.
- Write HTML. A composition is an HTML document; each element carries `data-start` and `data-duration` for timing and `data-track-index` for layout. Animate with GSAP, Lottie, CSS transitions, or any seekable runtime via the Frame Adapter pattern. No build step, no compilation, no DSL, no React requirement.
- Preview in the browser. `npx hyperframes preview` opens a live preview; edit the HTML and see changes instantly.
- Render to MP4. `npx hyperframes render --output output.mp4` seeks each frame in headless Chrome, captures it with `beginFrame`, and pipes through FFmpeg. It runs locally or in Docker for reproducible output.
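To make the timing attributes concrete, the Python sketch below reads `data-start` and `data-duration` from markup and reports which clips are live at a given time. It assumes a clip is live on the half-open window from `start` to `start + duration`, which these sources do not state explicitly. `ClipCollector` and `active_clips` are hypothetical names, and the markup is a condensed copy of this section's sample composition.

```python
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Collect elements that declare both data-start and data-duration."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "data-start" in a and "data-duration" in a:
            self.clips.append({
                "id": a.get("id"),
                "start": float(a["data-start"]),
                "duration": float(a["data-duration"]),
                "track": int(a.get("data-track-index", 0)),
            })


def active_clips(html: str, t: float) -> list[str]:
    """Ids of clips whose [start, start + duration) window contains t, by track."""
    collector = ClipCollector()
    collector.feed(html)
    live = [c for c in collector.clips if c["start"] <= t < c["start"] + c["duration"]]
    return [c["id"] for c in sorted(live, key=lambda c: c["track"])]


SAMPLE = """
<div id="root" data-composition-id="demo" data-start="0" data-width="1920" data-height="1080">
  <video id="clip-1" data-start="0" data-duration="5" data-track-index="0" src="intro.mp4" muted playsinline></video>
  <h1 id="title" class="clip" data-start="1" data-duration="4" data-track-index="1">Welcome to Hyperframes</h1>
  <audio id="bg-music" data-start="0" data-duration="5" data-track-index="2" data-volume="0.5" src="music.wav"></audio>
</div>
"""

print(active_clips(SAMPLE, 0.5))  # ['clip-1', 'bg-music']
print(active_clips(SAMPLE, 2.0))  # ['clip-1', 'title', 'bg-music']
```

The root element is skipped because it carries `data-start` without `data-duration`.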
<div id="root" data-composition-id="demo"
     data-start="0" data-width="1920" data-height="1080">
  <video id="clip-1" data-start="0" data-duration="5"
         data-track-index="0" src="intro.mp4" muted playsinline></video>
  <h1 id="title" class="clip"
      data-start="1" data-duration="4" data-track-index="1"
      style="font-size: 72px; color: white;">
    Welcome to Hyperframes
  </h1>
  <audio id="bg-music" data-start="0" data-duration="5"
         data-track-index="2" data-volume="0.5" src="music.wav"></audio>
</div>
When to use the full pipeline vs. a single composition
- Use the pipeline for: capturing a website (website-to-video skill), shipping a product launch, any narrative video with three or more beats, or learning HyperFrames (artifacts leave every creative decision inspectable on disk).
- Skip it for: a ~5-second one-shot animation — a single hand-authored composition suffices.
- Rough threshold: if a non-author needs to understand why a beat looks the way it does, document it in `STORYBOARD.md`.
Launch videos — the pipeline at production scale
HeyGen open-sources the real compositions behind its own product-launch videos in the hyperframes-launches repo — production-grade projects, not simplified examples.
- Five standalone projects: (1) HyperFrames launch — original 49.7s announcement with glass-frame intro, CSS, GSAP, Lottie, shaders, Three.js; (2) Website → HyperFrames — website-to-video capture and animation; (3) Timeline Editor launch — 60 fps reveal with SFX, chat spiral, editor showcase; (4) Texture launch — texture-masked text on shader backgrounds; (5) VFX × HeyGen combined — multi-act video combining VFX scenes with HeyGen canvas tests.
- Production patterns they demonstrate: multi-composition projects (4-8 sub-compositions wired into one root timeline); real adapter combinations (GSAP, Lottie, shaders, Three.js, CSS in one render); frame-accurate timing synced to VO and SFX; no proprietary tools, since everything renders with standard `hyperframes render`.
- Each project ships storyboards, design notes, and handoff documentation alongside source code.
brew install git-lfs
git lfs install
git clone https://github.com/heygen-com/hyperframes-launches.git
cd hyperframes-launches/hyperframes-launch
hyperframes preview
Iterating (re-entry without re-running everything)
- Edit `STORYBOARD.md` to rework creative (mood, assets, entrance timing), then rebuild just that beat.
- Open a composition file directly (e.g. `compositions/beat-3-proof.html`) and adjust animations, colors, or layout; `npx hyperframes preview` shows changes live.
- To rebuild one beat, prompt the agent (e.g. “Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background.”); it reads `STORYBOARD.md`, `DESIGN.md`, and the transcript, then regenerates just that file.
- To swap the voice without redoing the script, re-run TTS against `narration.txt` (pronunciation substitutions already applied).
- Each artifact is a checkpoint for stopping, handing off, or resuming later.
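The re-entry idea can be modelled as a small dependency map: change one artifact and only what sits downstream of it needs regenerating. The edges below are our reading of the pipeline description, not a structure HyperFrames exposes. `FEEDS`, `ARTIFACTS`, and `stale_after` are hypothetical names, and the model works per artifact rather than per beat.

```python
# Regeneration order. STORYBOARD.md follows transcript.json because its
# beat timings are filled in from the transcript.
ARTIFACTS = [
    "capture/", "DESIGN.md", "SCRIPT.md", "narration.txt", "narration.wav",
    "transcript.json", "STORYBOARD.md", "compositions/", "snapshots/",
]

# artifact -> artifacts that read it directly (assumed edges)
FEEDS = {
    "capture/": ["DESIGN.md", "SCRIPT.md", "STORYBOARD.md"],
    "DESIGN.md": ["compositions/"],
    "SCRIPT.md": ["narration.txt", "STORYBOARD.md"],
    "narration.txt": ["narration.wav"],
    "narration.wav": ["transcript.json"],
    "transcript.json": ["STORYBOARD.md", "compositions/"],
    "STORYBOARD.md": ["compositions/"],
    "compositions/": ["snapshots/"],
}


def stale_after(changed: str) -> list[str]:
    """Artifacts downstream of `changed`, in the order they would be redone."""
    stale, pending = set(), [changed]
    while pending:
        for reader in FEEDS.get(pending.pop(), []):
            if reader not in stale:
                stale.add(reader)
                pending.append(reader)
    return [name for name in ARTIFACTS if name in stale]


print(stale_after("STORYBOARD.md"))  # ['compositions/', 'snapshots/']
print(stale_after("narration.wav"))
# ['transcript.json', 'STORYBOARD.md', 'compositions/', 'snapshots/']
```

Swapping the voice leaves `SCRIPT.md` and `narration.txt` untouched in this model, which matches the note that the script does not need redoing.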
Try It
- Run `npx hyperframes capture https://example.com -o my-video/capture` against a brand site, then confirm Step 1’s gate by writing one sentence describing its visual identity from the captured assets.
- Author a tiny composition with `data-start`/`data-duration`/`data-track-index`, run `npx hyperframes preview` to iterate live, then `npx hyperframes render --output demo.mp4`.
- For a real multi-beat video, walk all seven steps in order, treating each gate as a hard checkpoint and pulling beat timings from `transcript.json` after `npx hyperframes tts` + `npx hyperframes transcribe`.
- Before delivery, run `npx hyperframes lint` and `npx hyperframes validate` until both pass with zero errors, then `npx hyperframes snapshot my-video --at <midpoints>` to eyeball the beats.
- Clone the launch repo (`git clone https://github.com/heygen-com/hyperframes-launches.git`, Git LFS required) and run `hyperframes preview` on a project to study production multi-composition timelines.
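The `<midpoints>` value in the snapshot step above can be derived from beat boundaries rather than typed by hand. This is a minimal sketch: `snapshot_at_arg` is a hypothetical helper, the beat boundaries are made-up values of the kind recorded in `STORYBOARD.md`, and rounding to one decimal place is our choice.

```python
def snapshot_at_arg(beats: list[tuple[float, float]]) -> str:
    """Format beat midpoints as the comma-separated value passed to --at."""
    midpoints = [(start + end) / 2 for start, end in beats]
    return ",".join(f"{m:.1f}" for m in midpoints)


beats = [(0.0, 5.8), (5.8, 15.0)]  # hypothetical (start, end) pairs in seconds
print(snapshot_at_arg(beats))  # 2.9,10.4
```

The printed string slots directly into `npx hyperframes snapshot my-video --at 2.9,10.4`.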
Open Questions
- Exact `lint`/`validate`/`snapshot` flag surfaces beyond the examples shown are not enumerated here; see hyperframes-quickstart-cli and the CLI reference for the full command list.
- The introduction page notes a “Frame Adapter pattern” for seekable runtimes but does not define it in these three sources; it is covered elsewhere in the HyperFrames docs.