Source: Website to Video — HeyGen HyperFrames docs

The /website-to-video skill is HyperFrames’ URL-to-video path: hand an AI agent a live URL plus a one-line creative direction, and it captures the site, extracts the brand identity, writes a script and storyboard, generates voiceover, builds animated HTML compositions, and delivers a renderable MP4. It is the “warm start” workflow from the HeyGen HyperFrames hub made concrete: the capture step does the scraping, and the standard 7-step HyperFrames pipeline does the rest. This article is the deep dive on that one path; see the hub for what HyperFrames is overall.

Key Takeaways

  • One prompt, full pipeline. A URL + a creative direction triggers capture, design, script, storyboard, voiceover, build, and validate — no manual steps in between.
  • The agent self-triggers on a URL + a video request. Once the skill is installed (npx skills add heygen-com/hyperframes), the agent loads it automatically when it sees a link and a “make a video” intent; no slash command strictly required.
  • Capture is step one and runs automatically. You don’t call npx hyperframes capture by hand in the normal flow — but it exists as a standalone command for pre-caching, debugging, or harvesting site data outside video production.
  • Branding is pulled from the page, not invented. Pixel-sampled colors, downloaded woff2 fonts, semantically-named assets, page sections, and CTAs are extracted into a capture/ dir, then distilled into a DESIGN.md brand reference the compositions obey.
  • Scenes are “beats.” The storyboard breaks the video into per-beat creative direction; each beat becomes one animated HTML composition (compositions/*.html), so you can rebuild a single beat without re-running the pipeline.
  • Creative direction outweighs format. “Apple keynote energy” or “dark, developer-focused, show code” shapes every visual decision more than the duration/type label does.
  • Optional vision enrichment. A Gemini or OpenRouter API key upgrades asset descriptions from DOM-context guesses to actual image descriptions (~$0.04 per 40-image capture on the paid tier).

How It Works

The skill runs the canonical HyperFrames pipeline, seven steps that each emit a named artifact feeding the next:

| Step | Output | What happens |
| --- | --- | --- |
| Capture | capture/ | Screenshots, design tokens, fonts, assets, animations extracted from the live site |
| Design | DESIGN.md | Brand reference: colors, typography, do’s and don’ts |
| Script | SCRIPT.md | Narration text with hook, story, proof, CTA |
| Storyboard | STORYBOARD.md | Per-beat creative direction: mood, assets, animations, transitions |
| VO + Timing | narration.wav + transcript.json | TTS audio with word-level timestamps |
| Build | compositions/*.html | Animated HTML compositions, one per beat |
| Validate | Snapshot PNGs | Visual verification before delivery |
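
To make the artifact chain concrete, here is a minimal Python sketch that models the seven steps as an ordered list and reports which steps sit downstream of a given one. The step names and outputs are copied from the table; the `steps_from` helper and the idea of "re-running everything downstream" are illustrative assumptions, not documented HyperFrames behavior.

```python
# Step names and artifacts are copied from the pipeline table above.
# The downstream logic is an illustrative assumption, not a HyperFrames API.
PIPELINE = [
    ("Capture", "capture/"),
    ("Design", "DESIGN.md"),
    ("Script", "SCRIPT.md"),
    ("Storyboard", "STORYBOARD.md"),
    ("VO + Timing", "narration.wav + transcript.json"),
    ("Build", "compositions/*.html"),
    ("Validate", "Snapshot PNGs"),
]


def steps_from(step_name):
    """Return the named step and every step after it, in pipeline order."""
    names = [name for name, _ in PIPELINE]
    start = names.index(step_name)  # raises ValueError for an unknown step
    return names[start:]


print(steps_from("Storyboard"))
# ['Storyboard', 'VO + Timing', 'Build', 'Validate']
```

Under this reading, an edit to STORYBOARD.md could touch anything from the storyboard onward, while the capture, design, and script artifacts stay as they are.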

The capture step (the part that makes this a URL workflow). A headless browser loads the page, scrolls through it, and extracts:

  • Screenshots — viewport captures at every scroll depth; the count is dynamic based on page height.
  • Colors — pixel-sampled dominant colors plus computed styles (including oklch/lab conversion).
  • Fonts — CSS font families plus the downloaded woff2 files.
  • Assets — images, SVGs with semantic names, Lottie animations, video previews.
  • Text — all visible text in DOM order.
  • Animations — Web Animations API, scroll-triggered animations, WebGL shaders.
  • Sections — page structure with headings, types, and background colors.
  • CTAs — buttons and links detected by class names and text patterns.
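
The "dynamic count" for screenshots can be pictured with a small Python sketch. It assumes one capture per viewport-height of scroll, capped at 24 (the --max-screenshots default listed later in this article). The real stepping strategy is not documented here, so `scroll_offsets` and its parameters are hypothetical.

```python
import math


def scroll_offsets(page_height, viewport_height, max_screenshots=24):
    """Return scrollY positions, one per viewport-height step, up to the cap.

    Assumption: the capture advances by exactly one viewport per screenshot.
    """
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    count = min(math.ceil(page_height / viewport_height), max_screenshots)
    return [i * viewport_height for i in range(count)]


print(len(scroll_offsets(12960, 1080)))   # 12
print(len(scroll_offsets(100000, 1080)))  # 24 (capped)
```

The cap also explains why very long pages may not be covered end to end by screenshots alone.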

That raw capture/ is what turns into a composition: the Design step compresses it into DESIGN.md (the palette + type + brand rules the build obeys), the Script and Storyboard steps decide what to say and how to pace it, and the Build step writes one animated HTML file per beat using the captured assets. Nothing about the brand is invented — it is sampled off the page.

Vision enrichment (optional)

By default the capture describes each asset from DOM context alone — alt text, nearby headings, CSS classes. Adding a vision key upgrades those to real descriptions, which lets the agent make better storyboard decisions:

  • Without vision: hero-bg.png — 582KB, section: "Hero", above fold (knows it exists, not what it shows).
  • With vision: hero-bg.png — 582KB, A gradient wave in purple and blue sweeps across a dark background, creating an aurora-like effect.

Drop a key in a project-root .env file: either GEMINI_API_KEY or OPENROUTER_API_KEY (OpenRouter wins if both are set; default model google/gemini-3.1-flash-lite, overridable via HYPERFRAMES_OPENROUTER_MODEL / HYPERFRAMES_GEMINI_MODEL). Cost is ~$0.04 per 40-image capture on the paid tier.
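
The key precedence can be sketched in Python as follows. The environment variable names and the default model string come from the text above; the `resolve_vision_provider` function is hypothetical, and because the text does not say whether the default model applies to the direct Gemini path, this sketch leaves that default as None rather than guess.

```python
def resolve_vision_provider(env):
    """Pick a vision provider from an environment mapping.

    Returns (provider, model) or None when no key is set.
    Assumption: an empty-string key counts as "not set".
    """
    if env.get("OPENROUTER_API_KEY"):
        # OpenRouter wins when both keys are present.
        model = env.get("HYPERFRAMES_OPENROUTER_MODEL", "google/gemini-3.1-flash-lite")
        return ("openrouter", model)
    if env.get("GEMINI_API_KEY"):
        # Default model for the Gemini path is not stated in the source.
        return ("gemini", env.get("HYPERFRAMES_GEMINI_MODEL"))
    return None


both = {"OPENROUTER_API_KEY": "or-key", "GEMINI_API_KEY": "g-key"}
print(resolve_vision_provider(both))
# ('openrouter', 'google/gemini-3.1-flash-lite')
print(resolve_vision_provider({}))
# None
```

In practice the mapping would be something like `dict(os.environ)` after the .env file has been loaded.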

Invocation / The Prompt

1. Install the skill once (persists across sessions; works with Claude Code, Cursor, Gemini CLI, and Codex CLI):

npx skills add heygen-com/hyperframes

2. Describe the video in any directory — a URL plus a duration and creative direction:

Create a 25-second product launch video from https://example.com.
Bold, cinematic, dark theme energy.

The agent loads the skill on seeing a URL + a video request and runs the whole pipeline. For the most reliable trigger, lead with “Use the /website-to-video skill.”

3. Preview live (opens in the browser; edits auto-reload):

npx hyperframes preview

4. Render to a file:

npx hyperframes render --output my-video.mp4
# ✓ Captured 750 frames in 12.4s
# ✓ Encoded to my-video.mp4 (25.0s, 1920×1080, 6.8MB)
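
The sample log implies a frame rate, which the short Python sketch below works out. The 30 fps figure is inferred from the example output (750 frames over 25.0 s); it is not stated as a documented default, and both helper functions are hypothetical.

```python
def implied_fps(frames, duration_s):
    """Frame rate implied by a frame count and a duration in seconds."""
    return frames / duration_s


def frame_count(duration_s, fps):
    """Number of frames a render of this length would capture at a given fps."""
    return round(duration_s * fps)


print(implied_fps(750, 25.0))  # 30.0
print(frame_count(25.0, 30))   # 750
```

By the same arithmetic, a 45-second tour at that inferred rate would be 1350 frames, so capture time should scale roughly with duration.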

The capture command (advanced)

The skill captures automatically as step one, so you rarely call this — but it is exposed for pre-caching, debugging a bad capture, or using site data outside video:

npx hyperframes capture https://stripe.com
# ◇  Captured Stripe | Financial Infrastructure → capture
#   Screenshots: 12 · Assets: 45 · Sections: 15 · Fonts: sohne-var

| Flag | Default | Description |
| --- | --- | --- |
| -o, --output | ./capture | Output dir (auto-suffixes ./capture-2/, ./capture-3/… if taken) |
| --timeout | 120000 | Page-load timeout (ms) |
| --skip-assets | false | Skip downloading images and fonts |
| --max-screenshots | 24 | Maximum screenshot count |
| --json | false | Output structured JSON for programmatic use |
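
The auto-suffix behavior of --output can be illustrated with a Python sketch. It mirrors the naming pattern described in the table (capture, capture-2, capture-3); the function `next_capture_dir` is hypothetical and is not how the CLI is known to implement it.

```python
import tempfile
from pathlib import Path


def next_capture_dir(base="capture", root="."):
    """Return the first unused directory among base, base-2, base-3, ..."""
    root = Path(root)
    candidate = root / base
    n = 2
    while candidate.exists():
        candidate = root / f"{base}-{n}"
        n += 1
    return candidate


# Demo in a throwaway directory so nothing in the project is touched.
with tempfile.TemporaryDirectory() as tmp:
    first = next_capture_dir(root=tmp)
    first.mkdir()
    second = next_capture_dir(root=tmp)
    second.mkdir()
    third = next_capture_dir(root=tmp)
    print(first.name, second.name, third.name)
    # capture capture-2 capture-3
```

The practical upshot is that repeated captures do not overwrite each other, so an earlier capture stays available for comparison when debugging.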

Iterating without a full re-run

  • Edit the storyboard: STORYBOARD.md is the creative north star; change a beat’s mood or assets and ask the agent to rebuild just that beat.
  • Edit a composition directly — open compositions/beat-3-proof.html and tweak animations, colors, or layout by hand.
  • Rebuild one beat: “Rebuild beat 2 with more energy. Use the product screenshot as full-bleed background.”
  • Snapshot to verify without a full render — npx hyperframes snapshot my-project --at 2.9,10.4,18.7 emits key-frame PNGs (flags: --frames default 5, --at timestamps, --timeout default 5000ms).
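
If snapshots are scripted, the command line can be assembled from a list of timestamps, as in this Python sketch. It uses only the flags named above (--at, --frames, --timeout); the `snapshot_args` helper is hypothetical, the "--flag value" syntax for --frames and --timeout is assumed, and whether --at and --frames may be combined is not stated in the source.

```python
def snapshot_args(project, at=None, frames=None, timeout_ms=None):
    """Build an argv list for `npx hyperframes snapshot`.

    Flags left as None are omitted so the CLI defaults apply.
    """
    args = ["npx", "hyperframes", "snapshot", project]
    if at:
        # Timestamps are joined with commas, matching the example above.
        args += ["--at", ",".join(str(t) for t in at)]
    if frames is not None:
        args += ["--frames", str(frames)]
    if timeout_ms is not None:
        args += ["--timeout", str(timeout_ms)]
    return args


print(" ".join(snapshot_args("my-project", at=[2.9, 10.4, 18.7])))
# npx hyperframes snapshot my-project --at 2.9,10.4,18.7
```

The returned list can be handed to `subprocess.run` unchanged, which avoids shell quoting issues.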

What You Get

  • A multi-beat video whose scenes are derived from the site’s own content — hook, story, proof, CTA narration arc, one animated HTML composition per beat.
  • On-brand visuals pulled from the page — the actual palette, fonts, and assets, governed by DESIGN.md so the output matches the source brand rather than a generic template.
  • Voiceover with word-level timing: narration.wav + transcript.json, which is what lets captions and asset reveals sync to the narration.
  • Validation snapshots — PNG key frames generated before delivery so you can eyeball compositions without a full encode.
  • A renderable MP4 (example render: 25.0s, 1920×1080, 6.8MB) plus all intermediate artifacts checked into the project for re-editing.
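
To show why word-level timestamps matter for captions, here is a Python sketch that groups timed words into caption chunks. The schema is an assumption: the source only says transcript.json carries word-level timestamps, so the `word`, `start`, and `end` field names, the sample words, and the `group_captions` helper are all hypothetical.

```python
def group_captions(words, max_words=4):
    """Group timed words into captions of at most max_words each.

    Each caption starts at its first word's start and ends at its last word's end.
    Assumed input shape: [{"word": str, "start": float, "end": float}, ...]
    """
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        captions.append({
            "text": " ".join(w["word"] for w in chunk),
            "start": chunk[0]["start"],
            "end": chunk[-1]["end"],
        })
    return captions


# Made-up sample data; a real run would load the list from transcript.json.
sample = [
    {"word": "Ship", "start": 0.0, "end": 0.3},
    {"word": "faster", "start": 0.3, "end": 0.8},
    {"word": "with", "start": 0.8, "end": 1.0},
    {"word": "Acme", "start": 1.0, "end": 1.5},
    {"word": "today", "start": 1.5, "end": 2.0},
]
for caption in group_captions(sample, max_words=3):
    print(caption)
# {'text': 'Ship faster with', 'start': 0.0, 'end': 1.0}
# {'text': 'Acme today', 'start': 1.0, 'end': 2.0}
```

The same start times could drive asset reveals, for example showing a product screenshot at the moment its name is spoken.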

The prompt determines the format — include a duration and a direction:

| Type | Duration | Example prompt |
| --- | --- | --- |
| Social ad | 10–15s | “15-second Instagram reel. Energetic, fast cuts.” |
| Product launch | 20–30s | “25-second product launch. Apple keynote energy.” |
| Product tour | 30–60s | “45-second tour showing the top 3 features.” |
| Brand reel | 15–30s | “20-second brand video. Celebrate the design.” |
| Feature announcement | 15–25s | “Feature announcement highlighting the new AI agents.” |
| Teaser | 8–15s | “10-second teaser. Super minimal. Just the hook.” |
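
The duration ranges overlap, which is part of why the creative direction carries more weight than the type label. This Python sketch looks up every type whose range contains a given duration. The ranges are copied from the table; the `matching_formats` helper is hypothetical and the skill is not known to select formats this way.

```python
# Duration ranges in seconds, copied from the table above (inclusive bounds).
FORMATS = {
    "Social ad": (10, 15),
    "Product launch": (20, 30),
    "Product tour": (30, 60),
    "Brand reel": (15, 30),
    "Feature announcement": (15, 25),
    "Teaser": (8, 15),
}


def matching_formats(duration_s):
    """Return every format whose duration range includes duration_s."""
    return [name for name, (low, high) in FORMATS.items() if low <= duration_s <= high]


print(matching_formats(25))
# ['Product launch', 'Brand reel', 'Feature announcement']
print(matching_formats(45))
# ['Product tour']
```

A 25-second request fits three types at once, so wording such as “Apple keynote energy” is what actually disambiguates the result.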

Use Cases

  • Product launch / showcase — turn a marketing or product page into a 20–30s keynote-style announcement (the docs’ lead example: a Linear-style launch with “Apple keynote” framing).
  • Site / product tour — a 30–60s walkthrough of a site’s top features, scenes built from the captured sections.
  • Social clip — a 10–15s reel or an 8–15s teaser cut from the site’s hero and key assets for Instagram/TikTok-style distribution.
  • Brand reel — a 15–30s piece that celebrates a site’s design using its own palette and type.
  • Capture-only data harvest — run npx hyperframes capture --json purely to extract a site’s colors, fonts, and assets for use outside video production.

Limitations

  • Heavy client-side rendering needs a longer timeout. Sites behind Cloudflare or with heavy CSR can time out; bump --timeout (e.g. --timeout 180000). The capture handles dynamic sites — it just may need more load time.^[inferred — the docs prescribe a longer timeout rather than declaring such sites unsupported; an empty-shell SPA that renders nothing without interaction is the residual risk]
  • Lazy-loaded images on very long pages can be missed. Framer-style sites that lazy-load via IntersectionObserver are handled by the capture scrolling the page, but images near the bottom of very long pages may not all load. A vision key improves asset descriptions but does not increase the count.
  • Color accuracy depends on sampling. The palette comes from pixel sampling plus DOM computed styles; if colors look wrong, inspect the scroll screenshots in capture/screenshots/ to see what the capture actually saw.
  • Vision enrichment requires an external API key. Richer asset descriptions need a Gemini or OpenRouter key and incur (small) per-image cost; without one, the agent works from DOM context only.
  • Trigger can be unreliable if the skill is not installed or the intent is ambiguous — verify the install and lead with “Use the /website-to-video skill.”
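
The "bump the timeout" advice can be automated with a small retry loop, sketched here in Python. It uses only the capture command and --timeout flag described in this article. Two things are assumptions: that the CLI exits with a non-zero status when a capture times out, and the `capture_with_retries` helper itself, which is hypothetical. The demo injects a fake runner so nothing is actually executed.

```python
import subprocess


def capture_with_retries(url, timeouts_ms=(120000, 180000), run=None):
    """Try the capture with each timeout in turn; return the one that worked.

    Assumption: a failed or timed-out capture yields a non-zero exit code.
    Returns None if every attempt fails.
    """
    if run is None:
        # Default runner shells out to the real CLI.
        run = lambda argv: subprocess.run(argv).returncode
    for timeout in timeouts_ms:
        argv = ["npx", "hyperframes", "capture", url, "--timeout", str(timeout)]
        if run(argv) == 0:
            return timeout
    return None


# Dry run with a fake runner that only "succeeds" at the longer timeout.
attempts = []


def fake_run(argv):
    attempts.append(argv[-1])
    return 0 if argv[-1] == "180000" else 1


print(capture_with_retries("https://example.com", run=fake_run))  # 180000
print(attempts)  # ['120000', '180000']
```

Starting from the default of 120000 ms keeps fast sites fast, and only slow pages pay for the longer wait.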

Try It

  1. Run npx skills add heygen-com/hyperframes in a throwaway directory, then prompt your agent: “Create a 20-second product launch video from [your site]. Apple keynote energy.” Let it run the full pipeline.
  2. Run npx hyperframes preview to watch it in the browser, then iterate with one-line beat edits (“rebuild beat 2 with more energy”).
  3. Inspect the artifacts: open DESIGN.md to see the brand it pulled, and STORYBOARD.md to see the beat breakdown — then npx hyperframes render --output launch.mp4.
  4. For a pure data pull, run npx hyperframes capture https://stripe.com --json and look at the captured colors, fonts, and assets.