Source: HTML Schema Reference, Compositions, Data Attributes — HeyGen HyperFrames docs
HeyGen HyperFrames renders video from HTML, so a valid composition is the authoring contract every agent (or human) must satisfy before a render will succeed. This is the focused deep-dive on that contract: the document structure of a composition, the full set of declarative data-* timing attributes, the structural rules of the HTML schema, and the JavaScript timeline handshake on window.__timelines. For the broader product overview see the hub, HeyGen Hyperframes.
Key Takeaways
- A composition is an HTML document that defines a video timeline. Its root element carries data-composition-id plus data-width/data-height. Every clip (<video>, <img>, <audio>, nested <div> composition) lives inside it.
- HTML is the source of truth: HTML = the clips, data-* = timing/metadata, CSS = positioning/appearance, GSAP = animation + playback sync. The four layers are strictly separated.
- The framework owns timing and playback. It reads data-start/data-duration/data-track-index, mounts and unmounts clips, and drives media play/pause/seek. Scripts must never call video.play(), set audio.currentTime, or show/hide clips themselves.
- class="clip" is the visibility switch for visible timed elements (images, text, divs). It is omitted on <video> (the framework manages it directly) and on audio (nothing to show).
- Every composition must register one paused GSAP timeline at window.__timelines["<data-composition-id>"]. The key must match exactly, and the framework auto-nests sub-timelines.
- Relative timing lets a clip’s data-start reference another clip’s id (“start when that ends”), with + N/- N offsets for gaps and crossfades.
Compositions
A composition is the fundamental building block — an HTML document that defines one video timeline. Every composition needs a root element with data-composition-id, plus data-width and data-height to set the frame size (and therefore the aspect ratio):
<div id="root" data-composition-id="root"
data-start="0" data-width="1920" data-height="1080">
<!-- Elements go here -->
</div>
- Dimensions / aspect ratio: set by the pixel values. Landscape is data-width="1920" data-height="1080", and portrait is data-width="1080" data-height="1920". There is no separate aspect-ratio attribute; the pixel pair defines it.
- No special “root” type: index.html is the top-level composition by convention, but any composition can be imported into any other. Top-level vs. nested is a matter of position, not type.
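The pixel pair is the only aspect-ratio signal. You can therefore recover the ratio by dividing width and height by their greatest common divisor. Below is a minimal sketch. The names `aspectRatio` and `gcd` are hypothetical helpers, not part of HyperFrames.

```javascript
// Hypothetical helper (not a HyperFrames API): derive the aspect ratio
// implied by a composition's data-width / data-height pixel pair.
function gcd(a, b) {
  // Euclid's algorithm on positive integers.
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

function aspectRatio(width, height) {
  // Attribute values arrive as strings, so coerce before validating.
  const w = Number(width);
  const h = Number(height);
  if (!Number.isInteger(w) || !Number.isInteger(h) || w <= 0 || h <= 0) {
    throw new RangeError(`expected positive integer pixels, got ${width}x${height}`);
  }
  const d = gcd(w, h);
  return {
    ratio: `${w / d}:${h / d}`,
    orientation: w > h ? "landscape" : w < h ? "portrait" : "square",
  };
}

console.log(aspectRatio("1920", "1080")); // { ratio: '16:9', orientation: 'landscape' }
console.log(aspectRatio(1080, 1920));     // { ratio: '9:16', orientation: 'portrait' }
```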
Clip types — a clip is any discrete block on the timeline:
- <video>: video clips, B-roll, A-roll
- <img>: static images, overlays
- <audio>: music, sound effects
- <div data-composition-id="...">: nested compositions (animations, grouped sequences; the unit for reusable, scene-like building blocks)
Nested compositions group sequences and are reusable. Two ways to embed one:
- External file (recommended for reuse): point data-composition-src at another HTML file. The framework fetches it, extracts the <template> content, mounts it, runs its scripts, and registers its timeline. Each external file wraps its content in a <template> tag.
- Inline: define the nested composition directly inside the parent <div>, with no <template> and no data-composition-src. This is simpler for one-offs.
A typical project lays out the top-level file, a compositions/ folder of reusable pieces, and an assets/ folder:
project/
  index.html
  compositions/
    intro-anim.html
    caption-overlay.html
    outro-title.html
  assets/
    video.mp4
    music.mp3
    logo.png
Two layers in every composition; keep them separate:
- HTML (primitives) — the declarative structure: what plays, when, and on which track. Controlled entirely by data attributes.
- Script (GSAP) — effects, transitions, dynamic DOM, canvas, SVG. Scripts do not control media playback or clip visibility — duplicating the framework’s job causes conflicts. See HyperFrames Common Mistakes.
Data Attributes
Data attributes are the declarative timing model. The complete documented set, grouped by purpose (defaults noted where the docs give one):
Identity & visibility
- id (all clips, required): unique identifier (e.g. "el-1"); used for relative-timing references and CSS targeting.
- class="clip" (visible timed elements, required): enables the runtime’s show/hide lifecycle. Omit it on <video> (the framework manages video visibility directly) and on audio-only clips (nothing to show or hide).
Timing
- data-start (all clips, required): start time in seconds ("0", "5.5") or a clip id reference for relative timing ("intro", "intro + 2", "intro - 0.5").
- data-duration: duration in seconds. Required for images. Optional for video/audio, where it defaults to the source’s remaining duration from data-media-start. Not used on compositions, whose duration comes from the GSAP timeline’s tl.duration().
- data-track-index (all clips, required): timeline track number. Controls z-ordering (higher = in front) and groups clips into rows. Clips on the same track cannot overlap in time.
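Once every clip has an absolute start and a duration, you can check the same-track rule mechanically by comparing each pair of clips that share a track. Below is a sketch. It assumes clips are plain objects (`{ id, start, duration, track }`) with already-resolved numbers. `findTrackOverlaps` is hypothetical, not the framework’s own validator. Treating clips that merely touch (one ends exactly when the next starts) as non-overlapping is also an assumption.

```javascript
// Hypothetical check (not a HyperFrames API): list pairs of clips that share a
// data-track-index and overlap in time. Each clip is a plain object with
// already-resolved numbers: { id, start, duration, track }.
function findTrackOverlaps(clips) {
  const overlaps = [];
  for (let i = 0; i < clips.length; i++) {
    for (let j = i + 1; j < clips.length; j++) {
      const a = clips[i];
      const b = clips[j];
      if (a.track !== b.track) continue; // clips on different rows may overlap freely
      // Treat each clip as the half-open interval [start, start + duration),
      // so a clip that starts exactly when another ends does not overlap it.
      if (a.start < b.start + b.duration && b.start < a.start + a.duration) {
        overlaps.push({ track: a.track, first: a.id, second: b.id });
      }
    }
  }
  return overlaps;
}

const clips = [
  { id: "intro", start: 0, duration: 5, track: 0 },
  { id: "main", start: 5, duration: 10, track: 0 },     // starts as intro ends
  { id: "fade-in", start: 4.5, duration: 3, track: 0 }, // overlaps intro and main
  { id: "logo", start: 0, duration: 15, track: 1 },     // own track, no conflict
];
console.log(findTrackOverlaps(clips));
// → two overlaps on track 0: intro/fade-in and main/fade-in
```

Moving fade-in to its own track (for example, track 2) clears both reports. That matches the documented way of building crossfades: overlapping clips live on different tracks.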
Media
- data-media-start (video, audio): media playback offset, i.e. the trim point into the source file, in seconds. Default 0.
- data-volume (audio, video): volume from 0 to 1. Default 1. Use "0" for silent video.
- data-has-audio: "true" indicates the video has an audio track.
- data-loop: can override an animated GIF’s loop metadata (GIFs are prepared as timeline-synced video for preview/render).
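You can express the documented media defaults as a small function: data-media-start defaults to 0, data-volume defaults to 1, and data-duration defaults to the source’s remaining duration. Below is a sketch under these assumptions:

- Attributes arrive as a plain object of strings, standing in for getAttribute() results.
- The source length is already known.
- `resolveMediaClip` and its `heldSeconds` field are hypothetical.

`heldSeconds` reads the per-clip video rule (freeze frame when the source runs out first) as “explicit duration minus remaining source”.

```javascript
// Hypothetical helper (not a HyperFrames API): apply the documented defaults to
// a <video>/<audio> clip. sourceDuration is the media file's length in seconds.
function resolveMediaClip(attrs, sourceDuration) {
  // Treat undefined and null (what getAttribute() returns) as "attribute absent".
  const read = (name, fallback) => (attrs[name] != null ? Number(attrs[name]) : fallback);

  const mediaStart = read("data-media-start", 0); // documented default: 0
  const volume = read("data-volume", 1);          // documented default: 1
  if (!(volume >= 0 && volume <= 1)) {
    throw new RangeError(`data-volume must be between 0 and 1, got ${attrs["data-volume"]}`);
  }

  // Source material left after skipping the first mediaStart seconds.
  const remaining = Math.max(0, sourceDuration - mediaStart);
  // Documented default: the source's remaining duration.
  const duration = read("data-duration", remaining);

  return {
    mediaStart,
    volume,
    duration,
    // Seconds the clip outlasts its source (interpreted as a freeze frame on video).
    heldSeconds: Math.max(0, duration - remaining),
  };
}

// A 30-second source trimmed by 12 seconds.
console.log(resolveMediaClip({ "data-media-start": "12" }, 30));
// { mediaStart: 12, volume: 1, duration: 18, heldSeconds: 0 }
console.log(resolveMediaClip({ "data-media-start": "12", "data-duration": "20", "data-volume": "0" }, 30));
// { mediaStart: 12, volume: 0, duration: 20, heldSeconds: 2 }
```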
Composition
- data-composition-id (every composition, required): unique ID for the composition wrapper; must match the key used in window.__timelines.
- data-width/data-height (on compositions): frame size in pixels.
- data-composition-src: path to an external composition HTML file (for nested compositions).
- data-variable-values: JSON object of per-instance values passed to a nested composition, e.g. '{"title":"Hello"}'. The sub-composition reads them via window.__hyperframes.getVariables(). The runtime layers these values over the sub-comp’s declared defaults per instance, so one source can be embedded multiple times with different values. (The schema reference notes that the framework carries the values through, but the composition’s own script must read and apply them.)
- data-composition-variables: JSON array of declared variables (id, type, label, default), set on the sub-comp’s <html> root. It drives the Studio editing UI and provides the defaults that getVariables() reads. The CLI flag hyperframes render --variables '<json>' overrides the defaults at top-level render; the host’s data-variable-values overrides them per instance.
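Below is a minimal sketch of the layering described above: start from the defaults declared in data-composition-variables, then let per-instance data-variable-values win key by key. It only models the merge. `layerVariables` is hypothetical, not the runtime’s getVariables(), and the example `type` strings are made up. Whether the real runtime filters undeclared keys or validates types is not covered here.

```javascript
// Hypothetical sketch (not the HyperFrames runtime): layer per-instance values
// over a sub-composition's declared defaults.
function layerVariables(declaredJson, overridesJson) {
  // data-composition-variables: JSON array of { id, type, label, default }.
  const declared = JSON.parse(declaredJson || "[]");
  // data-variable-values: JSON object of per-instance values.
  const overrides = JSON.parse(overridesJson || "{}");
  // Start from each declared variable's default...
  const defaults = Object.fromEntries(declared.map((v) => [v.id, v.default]));
  // ...then let the supplied values win key by key.
  return { ...defaults, ...overrides };
}

// Illustrative declaration; the type strings here are made up.
const declaredVariables = JSON.stringify([
  { id: "title", type: "string", label: "Title", default: "Untitled" },
  { id: "subtitle", type: "string", label: "Subtitle", default: "" },
]);

// One source, two instances with different data-variable-values.
console.log(layerVariables(declaredVariables, '{"title":"Hello"}'));
// { title: 'Hello', subtitle: '' }
console.log(layerVariables(declaredVariables, '{"title":"Q3 results","subtitle":"Draft"}'));
// { title: 'Q3 results', subtitle: 'Draft' }
```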
Caption discoverability (on a caption composition’s root node)
- data-timeline-role="captions" and data-caption-root="true": let the framework identify and special-case caption rendering.
Timeline model: tracks and relative timing
- Clips are placed on the timeline by data-start + data-duration and stacked into rows by data-track-index. A higher track index renders in front. Same-track clips cannot overlap, so overlapping clips must live on different tracks (this is how crossfades are built).
- Relative timing chains clips: a clip’s data-start can name another clip’s id to mean “start when that clip ends,” with optional offsets:
  - <id>: start when that clip ends
  - <id> + <number>: start <number> seconds after it ends (a gap)
  - <id> - <number>: start <number> seconds before it ends (an overlap / crossfade)
- Constraints:
  - References resolve within the same composition only.
  - Circular references are not allowed; the resolver detects cycles and throws.
  - The referenced clip must have a known duration, either an explicit data-duration or one inferred from source media.
  - Chains can nest (A → B → C), but keep them under 3-4 levels for readability.
  - If a value parses as a number, it is treated as absolute seconds. Otherwise it is parsed as one of the relative forms above.
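The rules above can be sketched as a small recursive resolver. A numeric value is absolute. `<id>` resolves to the referenced clip’s end. The `+`/`-` forms shift that end. A set of ids on the current path detects cycles.

This is a hypothetical illustration, not the HyperFrames implementation, and it rests on these assumptions:

- Clips come from one composition as `{ [id]: { start, duration } }`.
- Durations are already-known numbers.
- The operator is surrounded by spaces, so hyphenated ids such as "el-1" parse unambiguously.

```javascript
// Hypothetical resolver (not the HyperFrames implementation) for the documented
// data-start forms: "5.5", "intro", "intro + 2", "intro - 0.5".
// Assumption: the +/- operator is surrounded by spaces.
const RELATIVE = /^(\S+)(?:\s+([+-])\s+(\d+(?:\.\d+)?))?$/;

function resolveStarts(clips) {
  // clips: { [id]: { start: string, duration: number | undefined } }
  const resolved = new Map(); // id -> absolute start in seconds
  const visiting = new Set(); // ids on the current resolution path

  function startOf(id) {
    if (resolved.has(id)) return resolved.get(id);
    const clip = clips[id];
    if (!clip) throw new Error(`unknown clip id "${id}"`);
    if (visiting.has(id)) throw new Error(`circular data-start reference at "${id}"`);
    visiting.add(id);

    const raw = String(clip.start).trim();
    let start;
    if (raw !== "" && Number.isFinite(Number(raw))) {
      start = Number(raw); // parses as a number: absolute seconds
    } else {
      const match = RELATIVE.exec(raw);
      if (!match) throw new Error(`cannot parse data-start "${raw}" on "${id}"`);
      const [, refId, op, amount] = match;
      const ref = clips[refId];
      if (!ref) throw new Error(`"${id}" references unknown clip "${refId}"`);
      if (ref.duration === undefined) {
        throw new Error(`"${refId}" has no known duration, so "${id}" cannot follow it`);
      }
      const refEnd = startOf(refId) + ref.duration; // "start when that clip ends"
      const offset = op ? Number(amount) * (op === "+" ? 1 : -1) : 0;
      start = refEnd + offset;
    }

    visiting.delete(id);
    resolved.set(id, start);
    return start;
  }

  for (const id of Object.keys(clips)) startOf(id);
  return Object.fromEntries(resolved);
}

const starts = resolveStarts({
  intro: { start: "0", duration: 5 },
  "el-1": { start: "intro + 2", duration: 4 }, // gap: 5 + 2 = 7
  outro: { start: "el-1 - 0.5", duration: 3 }, // crossfade: 11 - 0.5 = 10.5
});
console.log(starts); // { intro: 0, 'el-1': 7, outro: 10.5 }
```

Because outro overlaps el-1 by 0.5 s, it must sit on a different data-track-index. Changing intro’s duration shifts both later clips, which is the ripple effect that relative timing provides.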
HTML Schema Reference
The structural rules that make a composition valid HTML and renderable.
Framework-managed behavior — the runtime reads the data attributes and automatically handles:
- adding primitive clip timeline entries (from data-start/data-duration/data-track-index) to the GSAP timeline;
- media playback (play, pause, seek) for <video> and <audio>;
- clip lifecycle: mounting/unmounting clips based on data-start and data-duration;
- timeline synchronization: keeping media in sync with the master timeline;
- media loading: waiting for all media to load before resolving timing.
Mounting/unmounting controls presence, not appearance — transitions (fade in, slide in) are animated in scripts. The hard rule: do not manually call video.play(), video.pause(), set audio.currentTime, or mount/unmount clips in scripts — the framework owns playback and lifecycle.
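To make “presence, not appearance” concrete, here is an illustrative model of which clips would be mounted at a given playhead time. It is not HyperFrames code, and a composition script should never do this itself, because the framework owns the lifecycle. The half-open `[start, start + duration)` window and the `clipsPresentAt` name are assumptions. A clip that is present can still be invisible (for example, at opacity 0) until its GSAP tween fades it in.

```javascript
// Illustrative model only (not HyperFrames code, and not something a
// composition script should do): which clips are present at playhead time t,
// given resolved numeric start/duration values.
function clipsPresentAt(clips, t) {
  return clips
    // Assumption: presence is the half-open window [start, start + duration).
    .filter((clip) => t >= clip.start && t < clip.start + clip.duration)
    // Higher data-track-index renders in front, so list back-to-front.
    .sort((a, b) => a.track - b.track)
    .map((clip) => clip.id);
}

const timeline = [
  { id: "bg", start: 0, duration: 10, track: 0 },
  { id: "title", start: 1, duration: 3, track: 1 },
  { id: "lower-third", start: 3.5, duration: 4, track: 2 },
];
console.log(clipsPresentAt(timeline, 0.5)); // [ 'bg' ]
console.log(clipsPresentAt(timeline, 3.8)); // [ 'bg', 'title', 'lower-third' ]
console.log(clipsPresentAt(timeline, 4));   // [ 'bg', 'lower-third' ]
```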
Viewport — every composition must include data-width and data-height on the root element.
Per-clip-type rules:
- Video (<video>): id, data-start, and data-track-index are required. data-duration is optional and defaults to the source’s remaining duration; if an explicit data-duration outlasts the source, the clip holds a freeze frame. data-media-start trims the source, and data-volume controls the level ("0" = silent). Do not add class="clip"; the framework manages video visibility. Do not GSAP-animate width/height/top/left directly on a <video>, because it can make Chrome stop rendering frames. Instead, wrap the video in a <div> and animate the wrapper.
- Image (<img>): class="clip" and data-duration are required (there is no source duration to default to). Formats: PNG, JPG, WebP, SVG, GIF. Animated GIFs are prepared as timeline-synced video, and data-loop overrides their loop metadata. Position and size images with CSS.
- Audio (<audio>): invisible, so no class="clip". data-duration is optional and defaults to the source. data-volume (e.g. "0.5" for background music) and data-media-start (trim) work as on video. Multiple audio clips can overlap on different tracks for layered sound. Combined with data-volume="0" on the video and data-has-audio, this is how a muted video and a separate audio bed are composited.
- Composition (nested <div>): uses data-composition-id, data-composition-src, data-start, and data-track-index, with no data-duration (duration = tl.duration()). External comps load from data-composition-src and are wrapped in <template>. Each has its own window.__timelines entry and <script>, and the framework auto-nests sub-timelines.
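The per-clip-type rules are regular enough to check from a plain description of each element. Below is a hypothetical sketch; it is not `npx hyperframes lint`, whose actual checks and messages may differ. The sketch assumes the following:

- Each clip inside a composition is `{ tag, attrs, classes }`, with attribute values as strings.
- The root composition element is not passed in.
- Any element carrying data-composition-id is a nested composition.

```javascript
// Hypothetical checker (not `npx hyperframes lint`): apply the per-clip-type
// rules to one clip described as { tag, attrs, classes }.
function checkClip(el) {
  const problems = [];
  const has = (name) => el.attrs[name] !== undefined;
  const isClip = el.classes.includes("clip");

  // Required on every clip per the attribute reference.
  for (const name of ["id", "data-start", "data-track-index"]) {
    if (!has(name)) problems.push(`missing ${name}`);
  }

  if (has("data-composition-id")) {
    // Nested composition: its duration comes from its GSAP timeline.
    if (has("data-duration")) problems.push("nested composition must not set data-duration");
  } else if (el.tag === "video" || el.tag === "audio") {
    // Framework manages video visibility; audio has nothing to show.
    if (isClip) problems.push(`<${el.tag}> must not have class="clip"`);
  } else {
    // Images, text, and other visible timed elements need the visibility switch.
    if (!isClip) problems.push(`<${el.tag}> needs class="clip"`);
    if (el.tag === "img" && !has("data-duration")) {
      problems.push("<img> needs data-duration (no source duration to default to)");
    }
  }
  return problems;
}

console.log(checkClip({
  tag: "video", classes: ["clip"],
  attrs: { id: "bg", "data-start": "0", "data-track-index": "0" },
})); // → one problem: the video must not carry class="clip"
console.log(checkClip({
  tag: "img", classes: [],
  attrs: { id: "logo", "data-start": "intro", "data-track-index": "2" },
})); // → two problems: missing class="clip" and missing data-duration
```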
Timeline contract (window.__timelines) — the framework initializes window.__timelines = {} before any scripts run. Every composition must register a GSAP timeline at the key matching its data-composition-id:
const tl = gsap.timeline({ paused: true });
tl.to("#title", { opacity: 1, duration: 0.5 }, 0);
tl.to("#title", { opacity: 0, duration: 0.5 }, 4.5);
window.__timelines["<data-composition-id>"] = tl;
Rules:
- Every composition needs a <script> that creates and registers its timeline.
- All timelines must start paused ({ paused: true }).
- The framework auto-nests sub-timelines; do not add them manually.
- Duration comes from tl.duration(), so composition elements carry no data-duration.
- Timelines must be finite (no infinite loops/repeats).
- The timeline ID must exactly match the data-composition-id.

The animation layer is covered in depth in HyperFrames GSAP Animation.
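The registry rules can be verified before rendering with a small check. The check confirms that every data-composition-id has an entry under exactly that key, that each timeline is paused and finite, and that no stray keys exist. Below is a hypothetical sketch, not a HyperFrames API. It only calls the `paused()`, `duration()`, and `repeat()` getters that GSAP timelines expose, so stand-in objects can exercise it without a browser. It inspects the timeline’s own repeat setting and duration, not its child tweens.

```javascript
// Hypothetical pre-render check (not a HyperFrames API) for the
// window.__timelines contract, given the list of data-composition-id values.
function checkTimelineRegistry(compositionIds, timelines) {
  const problems = [];
  for (const id of compositionIds) {
    const tl = timelines[id];
    if (!tl) {
      problems.push(`no timeline registered at window.__timelines["${id}"]`);
      continue;
    }
    if (!tl.paused()) problems.push(`"${id}": timeline must be created with { paused: true }`);
    // Checks the timeline's own repeat and duration; child tweens are not inspected.
    if (tl.repeat() === -1 || !Number.isFinite(tl.duration())) {
      problems.push(`"${id}": timeline must be finite`);
    }
  }
  // A key that matches no composition usually means a mistyped key.
  for (const key of Object.keys(timelines)) {
    if (!compositionIds.includes(key)) {
      problems.push(`key "${key}" matches no data-composition-id`);
    }
  }
  return problems;
}

// Stand-in objects exposing the same getters as a GSAP timeline.
const fakeTimeline = (paused, duration, repeat = 0) => ({
  paused: () => paused,
  duration: () => duration,
  repeat: () => repeat,
});

console.log(checkTimelineRegistry(["root", "intro"], {
  root: fakeTimeline(true, 5),
  Intro: fakeTimeline(true, 2), // wrong case: keys must match exactly
}));
// → reports the missing "intro" key and the unexpected "Intro" key
```

In a browser, once every composition has loaded, the ids could plausibly be collected with `[...document.querySelectorAll("[data-composition-id]")].map((el) => el.dataset.compositionId)` and passed in together with `window.__timelines`.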
Pre-render output checklist (what npx hyperframes lint enforces): data-width/data-height on every composition root; each reusable composition in its own file; external comps loaded via data-composition-src; external files wrapped in <template>; all timelines registered in window.__timelines with the correct ID; timed visible elements carry class="clip"; <video> does not; and all data-start references point to existing clip IDs.
Minimal valid composition
Assembled from the documented primitives — a root composition with one video clip, one visible (text) clip, and a registered paused timeline ^[inferred — composite skeleton built from the documented attributes and the timeline contract; not a single verbatim example from the docs]:
<!-- index.html -->
<div id="root" data-composition-id="root"
data-start="0" data-width="1920" data-height="1080">
<!-- video clip: NO class="clip" — the framework manages video visibility -->
<video id="bg" data-start="0" data-duration="5"
data-track-index="0" data-volume="0" src="./assets/clip.mp4"></video>
<!-- visible timed element: class="clip" REQUIRED -->
<h1 id="title" class="clip"
data-start="0" data-duration="5" data-track-index="1">
Hello World
</h1>
<script>
// Relies on the global gsap object; this skeleton does not show how GSAP is loaded.
const tl = gsap.timeline({ paused: true }); // must be paused
tl.from("#title", { opacity: 0, y: -50, duration: 1 }, 0);
window.__timelines["root"] = tl; // key === data-composition-id
</script>
</div>
This satisfies the output checklist: width/height on the root, class="clip" on the visible heading, the video left without class="clip", and a finite paused timeline registered at the key that matches data-composition-id.
Try It
- Start from the minimal skeleton above, and set data-width/data-height on the root for your aspect ratio (1920×1080 landscape, 1080×1920 portrait).
- Lay clips onto tracks with data-track-index (higher = foreground), and put any clips that must overlap on different tracks.
- Chain scenes with relative timing (data-start="intro + 2") so that trimming one clip ripples through the rest of the timeline automatically.
- Register exactly one paused GSAP timeline per composition at window.__timelines["<data-composition-id>"], and never drive media playback or visibility from your own script.
- Run npx hyperframes lint before every render (the documented pre-render gate), and run npx hyperframes compositions to list the compositions in the project.