Source: The Pipeline, Introduction, Launch Videos — HeyGen HyperFrames docs

HyperFrames is an open-source framework that turns HTML into deterministic, frame-by-frame rendered video — “define a video the same way you build a web page.” The canonical production workflow is a seven-step pipeline (capture → design → script → storyboard → VO+timing → build → validate) where each step emits a named on-disk artifact that feeds the next and acts as a re-entry checkpoint. Getting started maps onto the same primitives at a smaller scale — write HTML, preview in the browser, render to MP4 — and the pipeline is the recommended path once a video has three or more beats or needs to be inspectable by a non-author.

Key Takeaways

  • Seven sequential steps, one artifact each: Capture → capture/; Design → DESIGN.md; Script → SCRIPT.md; Storyboard → STORYBOARD.md; VO+Timing → narration.wav + narration.txt + transcript.json; Build → compositions/*.html (one HTML file per beat); Validate → snapshot PNGs + passing lint/validate.
  • Each step has a “gate” — an explicit done-condition before advancing (e.g. Step 1’s gate is being able to name the source’s top colors, fonts, and standout assets; Step 7’s is lint and validate passing with zero errors and a shareable Studio preview URL).
  • Durations come from narration, not estimates: SCRIPT.md must be finished before storyboarding, and storyboard beat timings (0.0s - 5.8s) are taken from the word-level transcript.json produced in Step 5.
  • Determinism is the design principle: rendering is seek-driven with no wall-clock dependencies — frame = floor(time * fps), each frame captured via Chrome’s beginFrame API and encoded with FFmpeg, so identical input yields identical output (reliable CI + batch rendering).
  • Getting-started essentials: compositions are plain HTML with data-start / data-duration (timing) and data-track-index (layout); the three-command loop is write HTML → npx hyperframes preview → npx hyperframes render --output output.mp4. No build step, no DSL, no React requirement.
  • Built for agents: HTML is the format LLMs generate best, and the CLI is non-interactive by default (flag-driven, plain-text output, fail-fast) so an agent can drive every command; add --human-friendly for the interactive terminal UI.
  • Launch-video framing: HeyGen open-sources its real product-launch compositions in the hyperframes-launches repo — production-grade (not toy) projects that show the pipeline at full scale: multi-composition projects of 4-8 sub-compositions wired into one root timeline, multiple adapters (GSAP, Lottie, shaders, Three.js, CSS) in a single render, all via standard hyperframes render.
  • Re-entry without re-running everything: edit STORYBOARD.md and rebuild a single beat, open one composition file and tweak it live under preview, or swap the voice by re-running TTS against narration.txt (which already holds pronunciation substitutions) without redoing the script.
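
The frame = floor(time * fps) rule from the determinism takeaway can be sketched in a few lines. This is only an illustration of the arithmetic, not HyperFrames' renderer; the function names frame_index and frame_times are hypothetical.

```python
import math


def frame_index(time_s: float, fps: int) -> int:
    """Map a timestamp in seconds to a frame index via floor(time * fps)."""
    return math.floor(time_s * fps)


def frame_times(duration_s: float, fps: int) -> list[float]:
    """Seek times for every frame, derived only from the frame index.

    No wall clock is consulted, so the same inputs always give the same list.
    """
    total_frames = math.floor(duration_s * fps)
    return [i / fps for i in range(total_frames)]


times = frame_times(2.0, 30)
print(len(times))            # 60 frames for a 2-second clip at 30 fps
print(frame_index(1.5, 30))  # 45
```

Because every seek time is computed from an integer frame counter rather than elapsed real time, a slow machine and a fast machine visit exactly the same timestamps.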

The seven pipeline stages

Each stage produces a named artifact and has a gate that must pass before moving on.

# | Step        | Output                          | What happens
--+-------------+---------------------------------+--------------------------------------------------------------------------
1 | Capture     | capture/                        | Extract screenshots, design tokens, fonts, assets, animations from a source
2 | Design      | DESIGN.md                       | Brand reference: colors, typography, components, do’s and don’ts
3 | Script      | SCRIPT.md                       | Narration text with hook, story, proof, and CTA
4 | Storyboard  | STORYBOARD.md                   | Per-beat creative direction: mood, assets, animations, transitions
5 | VO + Timing | narration.wav + transcript.json | TTS audio with word-level timestamps
6 | Build       | compositions/*.html             | Animated HTML compositions, one per beat
7 | Validate    | Snapshot PNGs + lint/validate pass | Visual verification and runtime checks before delivery

Step 1 — Capture (capture/)

  • Extracts screenshots at every scroll depth, pixel-sampled color palettes, the CSS font stack with downloaded woff2 files, semantically-named images and SVGs, Lottie animations, and detected page animations.
  • Optional Gemini vision enrichment adds AI descriptions of every captured asset.
  • For non-website sources (PDFs, decks, CSVs, notes), gather assets into capture/ manually for later reference.
  • Gate: you can describe the source’s visual identity in one or two sentences and name its top colors, fonts, and standout assets.
npx hyperframes capture https://example.com -o my-video/capture

Step 2 — Design (DESIGN.md)

  • Encodes visual identity factually so downstream steps reference exact colors, fonts, and components. Six sections: Overview, Colors (5-10 HEX with semantic roles), Typography, Components, Imagery, Do’s and Don’ts.
  • Gate: DESIGN.md exists with all six sections filled from real captured data (or deliberately chosen for net-new projects).
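
The structural half of this gate could be checked mechanically. The sketch below assumes each of the six sections appears as a Markdown heading with exactly that name and that colors are written as six-digit HEX values; neither convention is confirmed by the docs, and check_design is a hypothetical helper, not a HyperFrames command.

```python
import re

# Section names from the Step 2 description (ASCII apostrophes; input is normalized).
REQUIRED_SECTIONS = [
    "Overview", "Colors", "Typography", "Components", "Imagery", "Do's and Don'ts",
]


def check_design(markdown: str) -> list[str]:
    """Return a list of problems; an empty list means the structural checks pass."""
    text = markdown.replace("\u2019", "'")  # treat curly apostrophes as straight ones
    headings = {
        m.group(1).lower()
        for m in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", text, re.MULTILINE)
    }
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name.lower() not in headings
    ]
    # Assumption: colors are six-digit HEX; the docs ask for 5-10 of them.
    hex_colors = {c.lower() for c in re.findall(r"#[0-9a-fA-F]{6}\b", text)}
    if not 5 <= len(hex_colors) <= 10:
        problems.append(f"expected 5-10 HEX colors, found {len(hex_colors)}")
    return problems
```

A check like this only confirms the sections and color count exist; whether they were filled from real captured data still needs a human or agent review.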

Step 3 — Script (SCRIPT.md)

  • The narration foundation; scene durations derive from narration, so finish the script before storyboarding. Typical structure: hook → story → proof → call-to-action. Reference real features/stats from capture/extracted/visible-text.txt.
  • For non-narrated videos, SCRIPT.md becomes a per-beat copy plan with on-screen text and timing notes.
  • Gate: SCRIPT.md exists in the project root.

Step 4 — Storyboard (STORYBOARD.md)

  • Specifies what to build per beat: timing, the exact narration line, mood & camera, assets (by path), techniques (2-3 from the techniques library — SVG path drawing, Canvas 2D, CSS 3D, per-word typography, Lottie, video compositing, typing effects, variable fonts, MotionPath, velocity transitions, audio-reactive), transitions in/out, and SFX.
  • Timing fields (e.g. 0.0s - 5.8s) come from transcript.json once Step 5 runs.
  • Gate: STORYBOARD.md exists with beat-by-beat direction and an asset audit naming every file used.
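
One way to derive those timing fields is to walk the word-level transcript and take the first word's start and the last word's end for each beat. The sketch below assumes the transcript tokens line up one-to-one with the whitespace-separated words of each beat's narration line, which real transcripts may not guarantee; beat_boundaries and the sample timestamps are made up for illustration.

```python
def beat_boundaries(transcript, beat_lines):
    """Return (start, end) per beat from [{text, start, end}] word entries."""
    boundaries = []
    cursor = 0
    for line in beat_lines:
        count = len(line.split())
        words = transcript[cursor:cursor + count]
        if count == 0 or len(words) < count:
            raise ValueError(f"transcript too short for beat line: {line!r}")
        boundaries.append((words[0]["start"], words[-1]["end"]))
        cursor += count
    return boundaries


# Made-up sample data in the [{ text, start, end }] shape described for transcript.json.
transcript = [
    {"text": "Ship", "start": 0.0, "end": 0.4},
    {"text": "videos", "start": 0.5, "end": 1.0},
    {"text": "like", "start": 1.2, "end": 1.4},
    {"text": "code", "start": 1.5, "end": 2.0},
]
for start, end in beat_boundaries(transcript, ["Ship videos", "like code"]):
    print(f"{start:.1f}s - {end:.1f}s")
```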

Step 5 — VO and timing (narration.wav, narration.txt, transcript.json)

  • narration.wav ships with the final render; narration.txt is the exact spoken text with pronunciation substitutions applied (API → A P I, $2T → two trillion) and is kept distinct from SCRIPT.md so the voice can be regenerated later without redoing substitutions; transcript.json holds [{ text, start, end }] per word and every later step reads it for timing.
  • Ships multiple TTS adapters: Kokoro, ElevenLabs, HeyGen. After generating audio, update STORYBOARD.md with real beat boundaries from transcript.json.
  • Gate: all three files exist and STORYBOARD.md timings reference real timestamps, not estimates.
npx hyperframes tts SCRIPT.md --voice af_nova --output narration.wav
npx hyperframes transcribe narration.wav
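
To make the SCRIPT.md versus narration.txt distinction concrete, here is a minimal sketch of applying pronunciation substitutions to script text. The docs do not say how substitutions are stored or applied, so the dictionary format and the apply_substitutions helper are assumptions; only the two example substitutions come from the source.

```python
import re

# The two substitutions quoted in the docs; the dict format is an assumption.
SUBSTITUTIONS = {"API": "A P I", "$2T": "two trillion"}


def apply_substitutions(text: str, subs: dict[str, str]) -> str:
    """Replace whole tokens in one pass so replacements are never re-substituted."""
    if not subs:
        return text
    keys = sorted(subs, key=len, reverse=True)  # prefer longer keys on overlap
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: subs[m.group(1)], text)


print(apply_substitutions("Our API powers a $2T market.", SUBSTITUTIONS))
# Our A P I powers a two trillion market.
```

Keeping the substituted text in its own file means a voice swap only needs the TTS step rerun against narration.txt, as the step description notes.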

Step 6 — Build (compositions/<beat-name>.html)

  • The storyboard becomes runnable HTML — one file per beat — importing captured assets by path, using exact colors/fonts from DESIGN.md, animating with storyboard-specified techniques.
  • For multi-beat videos, spawn a focused sub-agent per beat with fresh context, that beat’s storyboard section, the needed asset paths, and relevant technique references; self-review each composition after building.
  • Gate: every composition is self-reviewed — no overlapping elements, no misplaced assets, no static images sitting unanimated.

Step 7 — Validate (snapshot PNGs + lint/validate)

  • lint runs static HTML structure checks (missing attributes, timeline registration issues, tween conflicts, CSS-transform vs. GSAP conflicts); validate loads each composition in headless Chrome to surface runtime JS errors, missing assets, and failed network requests; snapshot captures frames at specific timestamps for visual inspection.
  • Gate: lint and validate pass with zero errors, snapshot frames look right, and the Studio preview URL is ready to share.
npx hyperframes lint                              # static HTML structure checks
npx hyperframes validate                          # loads in headless Chrome, catches runtime errors
npx hyperframes snapshot my-video --at 2.9,10.4   # PNGs at beat midpoints
npx hyperframes render --output my-video.mp4
  • For personalized or catalog outputs, render with --batch rows.json --output "renders/{name}.mp4" and use the generated manifest.json as the delivery checklist.
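
The --at 2.9,10.4 value in the snapshot command corresponds to beat midpoints. A small sketch of computing that argument from beat boundaries follows; the second beat's 15.0s end is an assumed value chosen to match the 10.4 midpoint, and snapshot_at_arg is a hypothetical helper.

```python
def snapshot_at_arg(beats):
    """Format beat midpoints as the comma-separated value passed to --at."""
    return ",".join(f"{(start + end) / 2:.1f}" for start, end in beats)


# First beat from the storyboard example; second beat's end time is assumed.
beats = [(0.0, 5.8), (5.8, 15.0)]
at_arg = snapshot_at_arg(beats)
command = ["npx", "hyperframes", "snapshot", "my-video", "--at", at_arg]
print(" ".join(command))
# npx hyperframes snapshot my-video --at 2.9,10.4
```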

Project layout

The pipeline writes a predictable on-disk tree; capture/ only appears when capturing a source.

my-video/
├── capture/                    # Step 1, only present when capturing a source
│   ├── screenshots/            # scroll-000.png, scroll-001.png, …
│   ├── assets/                 # downloaded images, SVGs, fonts
│   ├── extracted/              # tokens.json, visible-text.txt, asset-descriptions.md
│   ├── AGENTS.md               # capture summary for AI agents
│   └── CLAUDE.md
├── DESIGN.md                   # Step 2, brand cheat sheet
├── SCRIPT.md                   # Step 3, narration backbone
├── STORYBOARD.md               # Step 4, beat-by-beat creative plan
├── narration.wav               # Step 5, TTS audio
├── narration.txt               # Step 5, exact spoken text (with pronunciation subs)
├── transcript.json             # Step 5, word-level timestamps
├── compositions/               # Step 6, one HTML file per beat
│   ├── beat-1-hook.html
│   ├── beat-2-story.html
│   └── …
├── snapshots/                  # Step 7, visual verification PNGs
├── renders/                    # optional final MP4 outputs
└── index.html                  # root project file wiring compositions into a timeline
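
Because every step leaves a named artifact in this tree, a script can guess where to resume by checking which files exist. The sketch below is a hypothetical helper, not part of the HyperFrames CLI; it only tests for file presence, which is weaker than the gates themselves, and it skips Capture because capture/ is optional.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Glob patterns per step, taken from the project layout above.
STEPS = [
    ("Design", ["DESIGN.md"]),
    ("Script", ["SCRIPT.md"]),
    ("Storyboard", ["STORYBOARD.md"]),
    ("VO + Timing", ["narration.wav", "narration.txt", "transcript.json"]),
    ("Build", ["compositions/*.html"]),
    ("Validate", ["snapshots/*.png"]),
]


def next_step(project: Path) -> Optional[str]:
    """Return the first step whose artifacts are missing, or None if all exist."""
    for name, patterns in STEPS:
        if not all(any(project.glob(pattern)) for pattern in patterns):
            return name
    return None


# Demo against a throwaway directory holding only the first two artifacts.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    for filename in ("DESIGN.md", "SCRIPT.md"):
        (project / filename).write_text("placeholder")
    resume_at = next_step(project)
print(resume_at)  # Storyboard
```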

Getting started (the minimal loop)

For small one-shot work you do not need the full pipeline — the same primitives scale down.

  • Write HTML. A composition is an HTML document; each element carries data-start and data-duration for timing and data-track-index for layout. Animate with GSAP, Lottie, CSS transitions, or any seekable runtime via the Frame Adapter pattern. No build step, no compilation, no DSL, no React requirement.
  • Preview in the browser. npx hyperframes preview opens a live preview; edit the HTML and see changes instantly.
  • Render to MP4. npx hyperframes render --output output.mp4 seeks each frame in headless Chrome, captures it with beginFrame, and pipes through FFmpeg — runnable locally or in Docker for reproducible output.
<div id="root" data-composition-id="demo"
     data-start="0" data-width="1920" data-height="1080">
 
  <video id="clip-1" data-start="0" data-duration="5"
         data-track-index="0" src="intro.mp4" muted playsinline></video>
 
  <h1 id="title" class="clip"
      data-start="1" data-duration="4" data-track-index="1"
      style="font-size: 72px; color: white;">
    Welcome to Hyperframes
  </h1>
 
  <audio id="bg-music" data-start="0" data-duration="5"
         data-track-index="2" data-volume="0.5" src="music.wav"></audio>
</div>
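
To show how data-start and data-duration determine what is on screen, the sketch below models the three clips from the composition above and lists which are active at a given time. It assumes a clip is visible on the half-open interval [start, start + duration); the docs shown here do not state the exact boundary behavior, and Clip and active_clips are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    id: str
    start: float     # data-start, in seconds
    duration: float  # data-duration, in seconds
    track: int       # data-track-index


def active_clips(clips, t):
    """Return ids of clips active at time t, ordered by track index."""
    ordered = sorted(clips, key=lambda c: c.track)
    return [c.id for c in ordered if c.start <= t < c.start + c.duration]


# Values copied from the composition above.
clips = [
    Clip("clip-1", 0, 5, 0),
    Clip("title", 1, 4, 1),
    Clip("bg-music", 0, 5, 2),
]
print(active_clips(clips, 0.5))  # ['clip-1', 'bg-music']
print(active_clips(clips, 2.0))  # ['clip-1', 'title', 'bg-music']
```

The title is absent at 0.5s because its data-start is 1, which is the kind of timing question a snapshot at that timestamp would confirm visually.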

When to use the full pipeline vs. a single composition

  • Use the pipeline for: capturing a website (website-to-video skill), shipping a product launch, any narrative video with three or more beats, or learning HyperFrames (artifacts leave every creative decision inspectable on disk).
  • Skip it for: a ~5-second one-shot animation — a single hand-authored composition suffices.
  • Rough threshold: if a non-author needs to understand why a beat looks the way it does, document it in STORYBOARD.md.

Launch videos — the pipeline at production scale

HeyGen open-sources the real compositions behind its own product-launch videos in the hyperframes-launches repo — production-grade projects, not simplified examples.

  • Five standalone projects: (1) HyperFrames launch — original 49.7s announcement with glass-frame intro, CSS, GSAP, Lottie, shaders, Three.js; (2) Website → HyperFrames — website-to-video capture and animation; (3) Timeline Editor launch — 60 fps reveal with SFX, chat spiral, editor showcase; (4) Texture launch — texture-masked text on shader backgrounds; (5) VFX × HeyGen combined — multi-act video combining VFX scenes with HeyGen canvas tests.
  • Production patterns they demonstrate: multi-composition projects (4-8 sub-compositions wired into one root timeline); real adapter combinations (GSAP, Lottie, shaders, Three.js, CSS in one render); frame-accurate timing synced to VO and SFX; no proprietary tools — everything renders with standard hyperframes render.
  • Each project ships storyboards, design notes, and handoff documentation alongside source code.
brew install git-lfs
git lfs install
git clone https://github.com/heygen-com/hyperframes-launches.git
cd hyperframes-launches/hyperframes-launch
hyperframes preview

Iterating (re-entry without re-running everything)

  • Edit STORYBOARD.md to rework creative (mood, assets, entrance timing), then rebuild just that beat.
  • Open a composition file directly (e.g. compositions/beat-3-proof.html) and adjust animations, colors, or layout — npx hyperframes preview shows changes live.
  • To rebuild one beat, prompt the agent (e.g. “Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background.”); it reads STORYBOARD.md, DESIGN.md, and the transcript, then regenerates just that file.
  • To swap the voice without redoing the script, re-run TTS against narration.txt (pronunciation substitutions already applied).
  • Each artifact is a checkpoint for stopping, handing off, or resuming later.

Try It

  • Run npx hyperframes capture https://example.com -o my-video/capture against a brand site, then confirm Step 1’s gate by writing one sentence describing its visual identity from the captured assets.
  • Author a tiny composition with data-start / data-duration / data-track-index, run npx hyperframes preview to iterate live, then npx hyperframes render --output demo.mp4.
  • For a real multi-beat video, walk all seven steps in order, treating each gate as a hard checkpoint and pulling beat timings from transcript.json after npx hyperframes tts + npx hyperframes transcribe.
  • Before delivery, run npx hyperframes lint and npx hyperframes validate until both pass with zero errors, then npx hyperframes snapshot my-video --at <midpoints> to eyeball the beats.
  • Clone the launch repo (git clone https://github.com/heygen-com/hyperframes-launches.git, Git LFS required) and run hyperframes preview on a project to study production multi-composition timelines.

Open Questions

  • Exact lint / validate / snapshot flag surfaces beyond the examples shown are not enumerated here — see hyperframes-quickstart-cli and the CLI reference for the full command list.
  • The introduction page notes a “Frame Adapter pattern” for seekable runtimes but does not define it in these three sources — covered elsewhere in the HyperFrames docs.