HyperFrames Pipeline & Getting Started

Source: The Pipeline, Introduction, Launch Videos — HeyGen HyperFrames docs

HyperFrames is an open-source framework that turns HTML into deterministic, frame-by-frame rendered video — “define a video the same way you build a web page.” The canonical production workflow is a seven-step pipeline (capture → design → script → storyboard → VO+timing → build → validate) where each step emits a named on-disk artifact that feeds the next and acts as a re-entry checkpoint. Getting started maps onto the same primitives at a smaller scale — write HTML, preview in the browser, render to MP4 — and the pipeline is the recommended path once a video has three or more beats or needs to be inspectable by a non-author.

Key Takeaways

Seven sequential steps, one artifact each: Capture → capture/; Design → DESIGN.md; Script → SCRIPT.md; Storyboard → STORYBOARD.md; VO+Timing → narration.wav + narration.txt + transcript.json; Build → compositions/*.html (one HTML file per beat); Validate → snapshot PNGs + passing lint/validate.
Each step has a “gate” — an explicit done-condition before advancing (e.g. Step 1’s gate is being able to name the source’s top colors, fonts, and standout assets; Step 7’s is lint and validate passing with zero errors and a shareable Studio preview URL).
Durations come from narration, not estimates — SCRIPT.md must be finished before storyboarding, and storyboard beat timings (0.0s - 5.8s) are taken from the word-level transcript.json produced in Step 5.
Determinism is the design principle: rendering is seek-driven with no wall-clock dependencies — frame = floor(time * fps), each frame captured via Chrome’s beginFrame API and encoded with FFmpeg, so identical input yields identical output (reliable CI + batch rendering).
Getting-started essentials: compositions are plain HTML with data-start / data-duration (timing) and data-track-index (layout); the three-command loop is write HTML → npx hyperframes preview → npx hyperframes render --output output.mp4. No build step, no DSL, no React requirement.
Built for agents: HTML is the format LLMs generate best, and the CLI is non-interactive by default (flag-driven, plain-text output, fail-fast) so an agent can drive every command; add --human-friendly for the interactive terminal UI.
Launch-video framing: HeyGen open-sources its real product-launch compositions in the hyperframes-launches repo — production-grade (not toy) projects that show the pipeline at full scale: multi-composition projects of 4-8 sub-compositions wired into one root timeline, multiple adapters (GSAP, Lottie, shaders, Three.js, CSS) in a single render, all via standard hyperframes render.
Re-entry without re-running everything: edit STORYBOARD.md and rebuild a single beat, open one composition file and tweak it live under preview, or swap the voice by re-running TTS against narration.txt (which already holds pronunciation substitutions) without redoing the script.
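
The determinism takeaway above can be illustrated with a small sketch of the mapping frame = floor(time * fps). The function names and the 30 fps default are assumptions made for this example, not part of the HyperFrames API; the point is only that every seek time is derived from a frame index, never from a wall clock.

```python
import math


def frame_index(time_s: float, fps: int = 30) -> int:
    """Quantize a timestamp to a frame index: frame = floor(time * fps)."""
    return math.floor(time_s * fps)


def seek_times(duration_s: float, fps: int = 30) -> list[float]:
    """Return one seek timestamp per frame.

    Each timestamp is computed from the frame index alone, so calling this
    twice with the same arguments always yields the same list.
    """
    total_frames = math.floor(duration_s * fps)
    return [i / fps for i in range(total_frames)]


if __name__ == "__main__":
    print(frame_index(1.0))       # frame index for t = 1.0 s at 30 fps
    print(len(seek_times(5.0)))   # number of frames in a 5 s composition
```

Because the list of seek times depends only on duration and fps, two runs over the same input visit the same frames in the same order, which is the property that makes CI and batch renders repeatable.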

The seven pipeline stages

Each stage produces a named artifact and has a gate that must pass before moving on.

#	Step	Output	What happens
1	Capture	`capture/`	Extract screenshots, design tokens, fonts, assets, animations from a source
2	Design	`DESIGN.md`	Brand reference: colors, typography, components, do’s and don’ts
3	Script	`SCRIPT.md`	Narration text with hook, story, proof, and CTA
4	Storyboard	`STORYBOARD.md`	Per-beat creative direction: mood, assets, animations, transitions
5	VO + Timing	`narration.wav` + `narration.txt` + `transcript.json`	TTS audio, the exact spoken text, and word-level timestamps
6	Build	`compositions/*.html`	Animated HTML compositions, one per beat
7	Validate	Snapshot PNGs + `lint`/`validate` pass	Visual verification and runtime checks before delivery

Step 1 — Capture (`capture/`)

Extracts screenshots at every scroll depth, pixel-sampled color palettes, the CSS font stack with downloaded woff2 files, semantically-named images and SVGs, Lottie animations, and detected page animations.
Optional Gemini vision enrichment adds AI descriptions of every captured asset.
For non-website sources (PDFs, decks, CSVs, notes), gather assets into capture/ manually for later reference.
Gate: you can describe the source’s visual identity in one or two sentences and name its top colors, fonts, and standout assets.

npx hyperframes capture https://example.com -o my-video/capture

Step 2 — Design (`DESIGN.md`)

Encodes visual identity factually so downstream steps reference exact colors, fonts, and components. Six sections: Overview, Colors (5-10 HEX with semantic roles), Typography, Components, Imagery, Do’s and Don’ts.
Gate: DESIGN.md exists with all six sections filled from real captured data (or deliberately chosen for net-new projects).
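
One way to approximate the Step 2 gate mechanically is sketched below. It assumes the six sections appear in DESIGN.md as Markdown headings with exactly these titles and that colors are written as six-digit HEX codes; neither assumption is confirmed by the source, and the helper names are hypothetical.

```python
import re

# Section titles taken from the Step 2 description above.
REQUIRED_SECTIONS = [
    "Overview", "Colors", "Typography", "Components", "Imagery",
    "Do's and Don'ts",
]


def _normalize(title: str) -> str:
    # Keep letters only, so straight and curly apostrophes compare equal.
    return re.sub(r"[^a-z]", "", title.lower())


def missing_sections(design_md: str) -> list[str]:
    """Return the required section titles that have no matching heading."""
    headings = set()
    for line in design_md.splitlines():
        match = re.match(r"#{1,6}\s+(.+)", line)
        if match:
            headings.add(_normalize(match.group(1)))
    return [s for s in REQUIRED_SECTIONS if _normalize(s) not in headings]


def distinct_hex_colors(design_md: str) -> set[str]:
    """Collect distinct six-digit HEX colors, case-insensitively."""
    return {c.upper() for c in re.findall(r"#[0-9A-Fa-f]{6}\b", design_md)}


def color_count_ok(design_md: str) -> bool:
    """Check the 5-10 HEX color guideline from the Colors section."""
    return 5 <= len(distinct_hex_colors(design_md)) <= 10
```

A check like this only confirms that the sections exist; whether they were filled from real captured data still needs a human or agent review.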

Step 3 — Script (`SCRIPT.md`)

The narration foundation; scene durations derive from narration, so finish the script before storyboarding. Typical structure: hook → story → proof → call-to-action. Reference real features/stats from capture/extracted/visible-text.txt.
For non-narrated videos, SCRIPT.md becomes a per-beat copy plan with on-screen text and timing notes.
Gate: SCRIPT.md exists in the project root.

Step 4 — Storyboard (`STORYBOARD.md`)

Specifies what to build per beat: timing, the exact narration line, mood & camera, assets (by path), techniques (2-3 from the techniques library — SVG path drawing, Canvas 2D, CSS 3D, per-word typography, Lottie, video compositing, typing effects, variable fonts, MotionPath, velocity transitions, audio-reactive), transitions in/out, and SFX.
Timing fields (e.g. 0.0s - 5.8s) come from transcript.json once Step 5 runs.
Gate: STORYBOARD.md exists with beat-by-beat direction and an asset audit naming every file used.
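
To make the timing rule concrete, here is a minimal sketch of deriving beat boundaries from a word-level transcript. It assumes transcript.json is a list of { text, start, end } objects in spoken order and that splitting each beat's narration line on whitespace yields the same word count as the transcript; the sample data and function names are hypothetical.

```python
def beat_boundaries(transcript: list[dict], beat_lines: list[str]) -> list[tuple[float, float]]:
    """Map each beat's narration line to (start, end) seconds.

    Words are consumed from the transcript in order: a beat starts at its
    first word's start time and ends at its last word's end time.
    """
    boundaries = []
    cursor = 0
    for line in beat_lines:
        count = len(line.split())
        words = transcript[cursor:cursor + count]
        if count == 0 or len(words) < count:
            raise ValueError(f"transcript does not cover beat line: {line!r}")
        boundaries.append((words[0]["start"], words[-1]["end"]))
        cursor += count
    return boundaries


def format_range(start: float, end: float) -> str:
    """Format a boundary in the storyboard's '0.0s - 5.8s' style."""
    return f"{start:.1f}s - {end:.1f}s"


if __name__ == "__main__":
    # Hypothetical four-word transcript, for illustration only.
    sample = [
        {"text": "Build", "start": 0.0, "end": 0.4},
        {"text": "videos", "start": 0.5, "end": 0.9},
        {"text": "in", "start": 1.0, "end": 1.2},
        {"text": "HTML", "start": 1.3, "end": 1.8},
    ]
    for start, end in beat_boundaries(sample, ["Build videos", "in HTML"]):
        print(format_range(start, end))
```

In a real project the word counts can drift when the transcriber splits or merges tokens differently from the script, so treat the raised error as a prompt to align the text by hand.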

Step 5 — VO and timing (`narration.wav`, `narration.txt`, `transcript.json`)

narration.wav ships with the final render.
narration.txt is the exact spoken text with pronunciation substitutions applied (API → A P I, $2T → two trillion); it is kept separate from SCRIPT.md so the voice can be regenerated later without redoing the substitutions.
transcript.json holds an array of { text, start, end } entries, one per word, and every later step reads it for timing.
HyperFrames ships multiple TTS adapters: Kokoro, ElevenLabs, and HeyGen. After generating audio, update STORYBOARD.md with the real beat boundaries from transcript.json.
Gate: all three files exist and STORYBOARD.md timings reference real timestamps, not estimates.

npx hyperframes tts SCRIPT.md --voice af_nova --output narration.wav
npx hyperframes transcribe narration.wav
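
The pronunciation substitutions described for narration.txt can be pictured as a simple whole-token replacement pass. The sketch below is an illustration under that assumption, using the two example pairs from the text; it is not how the hyperframes tts command is implemented, which these sources do not document.

```python
import re

# Example pairs from the Step 5 description; extend per project.
SUBSTITUTIONS = {"API": "A P I", "$2T": "two trillion"}


def apply_substitutions(text: str, substitutions: dict[str, str]) -> str:
    """Replace written forms with spoken forms in a single pass.

    Longer keys are tried first, and a key only matches when it is not
    glued to other word characters, so "APIs" is left alone.
    """
    if not substitutions:
        return text
    keys = sorted(substitutions, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    pattern = re.compile(r"(?<!\w)(" + alternation + r")(?!\w)")
    return pattern.sub(lambda m: substitutions[m.group(1)], text)


if __name__ == "__main__":
    print(apply_substitutions("Our API powers a $2T market.", SUBSTITUTIONS))
```

Doing the replacement in one pass avoids re-substituting text that an earlier rule already produced.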

Step 6 — Build (`compositions/<beat-name>.html`)

The storyboard becomes runnable HTML — one file per beat — importing captured assets by path, using exact colors/fonts from DESIGN.md, animating with storyboard-specified techniques.
For multi-beat videos, spawn a focused sub-agent per beat with fresh context, that beat’s storyboard section, the needed asset paths, and relevant technique references; self-review each composition after building.
Gate: every composition is self-reviewed — no overlapping elements, no misplaced assets, no static images sitting unanimated.

Step 7 — Validate (snapshot PNGs + `lint`/`validate`)

lint runs static HTML structure checks (missing attributes, timeline registration issues, tween conflicts, CSS-transform vs. GSAP conflicts); validate loads each composition in headless Chrome to surface runtime JS errors, missing assets, and failed network requests; snapshot captures frames at specific timestamps for visual inspection.
Gate: lint and validate pass with zero errors, snapshot frames look right, and the Studio preview URL is ready to share.

npx hyperframes lint                              # static HTML structure checks
npx hyperframes validate                          # loads in headless Chrome, catches runtime errors
npx hyperframes snapshot my-video --at 2.9,10.4   # PNGs at beat midpoints
npx hyperframes render --output my-video.mp4
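
The --at 2.9,10.4 value in the snapshot command above is described as beat midpoints. A small sketch of computing such a value from beat boundaries follows; the second beat's end time of 15.0 s is a hypothetical chosen so the result matches the example, and the function name is an assumption.

```python
def snapshot_at_arg(beats: list[tuple[float, float]]) -> str:
    """Build a comma-separated list of beat midpoints, one decimal each."""
    return ",".join(f"{(start + end) / 2:.1f}" for start, end in beats)


if __name__ == "__main__":
    # First beat 0.0s - 5.8s as in the storyboard example; second is hypothetical.
    print(snapshot_at_arg([(0.0, 5.8), (5.8, 15.0)]))
```

Sampling the midpoint rather than the first frame of a beat avoids capturing an entrance transition that has not finished yet.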

For personalized or catalog outputs, render with --batch rows.json --output "renders/{name}.mp4" and use the generated manifest.json as the delivery checklist.

Project layout

The pipeline writes a predictable on-disk tree; capture/ only appears when capturing a source.

my-video/
├── capture/                    # Step 1, only present when capturing a source
│   ├── screenshots/            # scroll-000.png, scroll-001.png, …
│   ├── assets/                 # downloaded images, SVGs, fonts
│   ├── extracted/              # tokens.json, visible-text.txt, asset-descriptions.md
│   ├── AGENTS.md               # capture summary for AI agents
│   └── CLAUDE.md
├── DESIGN.md                   # Step 2, brand cheat sheet
├── SCRIPT.md                   # Step 3, narration backbone
├── STORYBOARD.md               # Step 4, beat-by-beat creative plan
├── narration.wav               # Step 5, TTS audio
├── narration.txt               # Step 5, exact spoken text (with pronunciation subs)
├── transcript.json             # Step 5, word-level timestamps
├── compositions/               # Step 6, one HTML file per beat
│   ├── beat-1-hook.html
│   ├── beat-2-story.html
│   └── …
├── snapshots/                  # Step 7, visual verification PNGs
├── renders/                    # optional final MP4 outputs
└── index.html                  # root project file wiring compositions into a timeline
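
Because each step leaves a named artifact in this tree, the re-entry point for a project can be estimated by checking which files exist. The following is a rough sketch under that assumption; it tests file presence only, not whether a step's gate was actually met, and the function name is hypothetical. Capture is omitted because capture/ is optional.

```python
from pathlib import Path

# Step name -> files that must exist in the project root (from the tree above).
STEP_ARTIFACTS = [
    ("Design", ["DESIGN.md"]),
    ("Script", ["SCRIPT.md"]),
    ("Storyboard", ["STORYBOARD.md"]),
    ("VO + Timing", ["narration.wav", "narration.txt", "transcript.json"]),
]


def first_incomplete_step(project: Path):
    """Return (step, missing_artifacts) for the earliest unfinished step.

    Returns None when every artifact through Step 6 is present.
    """
    for step, names in STEP_ARTIFACTS:
        missing = [name for name in names if not (project / name).exists()]
        if missing:
            return step, missing
    compositions = project / "compositions"
    if not compositions.is_dir() or not any(compositions.glob("*.html")):
        return "Build", ["compositions/*.html"]
    return None


if __name__ == "__main__":
    print(first_incomplete_step(Path("my-video")))
```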

Getting started (the minimal loop)

For small one-shot work you do not need the full pipeline — the same primitives scale down.

Write HTML. A composition is an HTML document; each element carries data-start and data-duration for timing and data-track-index for layout. Animate with GSAP, Lottie, CSS transitions, or any seekable runtime via the Frame Adapter pattern. No build step, no compilation, no DSL, no React requirement.
Preview in the browser. npx hyperframes preview opens a live preview; edit the HTML and see changes instantly.
Render to MP4. npx hyperframes render --output output.mp4 seeks each frame in headless Chrome, captures it with beginFrame, and pipes through FFmpeg — runnable locally or in Docker for reproducible output.

<div id="root" data-composition-id="demo"
     data-start="0" data-width="1920" data-height="1080">
 
  <video id="clip-1" data-start="0" data-duration="5"
         data-track-index="0" src="intro.mp4" muted playsinline></video>
 
  <h1 id="title" class="clip"
      data-start="1" data-duration="4" data-track-index="1"
      style="font-size: 72px; color: white;">
    Welcome to Hyperframes
  </h1>
 
  <audio id="bg-music" data-start="0" data-duration="5"
         data-track-index="2" data-volume="0.5" src="music.wav"></audio>
</div>
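
To show how the timing attributes in the composition above fit together, the sketch below reads data-start, data-duration, and data-track-index with Python's standard html.parser and derives each clip's end time. It is an illustration of the attribute model only, not the HyperFrames runtime; the class and function names are assumptions, and elements without both timing attributes (such as the root div) are skipped.

```python
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Collect elements that carry both data-start and data-duration."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-start" in attributes and "data-duration" in attributes:
            start = float(attributes["data-start"])
            duration = float(attributes["data-duration"])
            self.clips.append({
                "id": attributes.get("id"),
                "tag": tag,
                "track": int(attributes.get("data-track-index", 0)),
                "start": start,
                "end": start + duration,
            })


def collect_clips(html: str) -> list[dict]:
    """Parse a composition and return its timed clips in document order."""
    parser = ClipCollector()
    parser.feed(html)
    parser.close()
    return parser.clips


def total_duration(clips: list[dict]) -> float:
    """The composition runs until the last clip ends."""
    return max((clip["end"] for clip in clips), default=0.0)
```

Run against the example composition, this reports three clips on tracks 0, 1, and 2, all ending at 5 seconds.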

When to use the full pipeline vs. a single composition

Use the pipeline for: capturing a website (website-to-video skill), shipping a product launch, any narrative video with three or more beats, or learning HyperFrames (artifacts leave every creative decision inspectable on disk).
Skip it for: a ~5-second one-shot animation — a single hand-authored composition suffices.
Rough threshold: if a non-author needs to understand why a beat looks the way it does, document it in STORYBOARD.md.

Launch videos — the pipeline at production scale

HeyGen open-sources the real compositions behind its own product-launch videos in the hyperframes-launches repo — production-grade projects, not simplified examples.

Five standalone projects: (1) HyperFrames launch — original 49.7s announcement with glass-frame intro, CSS, GSAP, Lottie, shaders, Three.js; (2) Website → HyperFrames — website-to-video capture and animation; (3) Timeline Editor launch — 60 fps reveal with SFX, chat spiral, editor showcase; (4) Texture launch — texture-masked text on shader backgrounds; (5) VFX × HeyGen combined — multi-act video combining VFX scenes with HeyGen canvas tests.
Production patterns they demonstrate: multi-composition projects (4-8 sub-compositions wired into one root timeline); real adapter combinations (GSAP, Lottie, shaders, Three.js, CSS in one render); frame-accurate timing synced to VO and SFX; no proprietary tools — everything renders with standard hyperframes render.
Each project ships storyboards, design notes, and handoff documentation alongside source code.

brew install git-lfs
git lfs install
git clone https://github.com/heygen-com/hyperframes-launches.git
cd hyperframes-launches/hyperframes-launch
npx hyperframes preview

Iterating (re-entry without re-running everything)

Edit STORYBOARD.md to rework creative (mood, assets, entrance timing), then rebuild just that beat.
Open a composition file directly (e.g. compositions/beat-3-proof.html) and adjust animations, colors, or layout — npx hyperframes preview shows changes live.
To rebuild one beat, prompt the agent (e.g. “Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background.”); it reads STORYBOARD.md, DESIGN.md, and the transcript, then regenerates just that file.
To swap the voice without redoing the script, re-run TTS against narration.txt (pronunciation substitutions already applied).
Each artifact is a checkpoint for stopping, handing off, or resuming later.

Try It

Run npx hyperframes capture https://example.com -o my-video/capture against a brand site, then confirm Step 1’s gate by writing one sentence describing its visual identity from the captured assets.
Author a tiny composition with data-start / data-duration / data-track-index, run npx hyperframes preview to iterate live, then npx hyperframes render --output demo.mp4.
For a real multi-beat video, walk all seven steps in order, treating each gate as a hard checkpoint and pulling beat timings from transcript.json after npx hyperframes tts + npx hyperframes transcribe.
Before delivery, run npx hyperframes lint and npx hyperframes validate until both pass with zero errors, then npx hyperframes snapshot my-video --at <midpoints> to eyeball the beats.
Clone the launch repo (git clone https://github.com/heygen-com/hyperframes-launches.git, Git LFS required) and run npx hyperframes preview on a project to study production multi-composition timelines.

Open Questions

Exact lint / validate / snapshot flag surfaces beyond the examples shown are not enumerated here — see hyperframes-quickstart-cli and the CLI reference for the full command list.
The introduction page notes a “Frame Adapter pattern” for seekable runtimes but does not define it in these three sources — covered elsewhere in the HyperFrames docs.
