Source: Deterministic Rendering, Frame Adapters, HyperFrames vs Remotion — HeyGen HyperFrames docs
HyperFrames’ agent- and CI-friendliness rests on two conceptual choices documented in its concept pages: deterministic rendering (the same composition always renders to the same video) and the frame adapter pattern (any animation runtime that can answer “what should the screen look like at frame N?” can plug in). This deep-dive covers both mechanisms in detail, plus the substantive technical reasons HyperFrames diverges from Remotion. For the framework overview, install flow, and feature catalog, see HeyGen Hyperframes.
Key Takeaways
- Determinism is the load-bearing idea, not a feature. Same composition + same assets → identical MP4, every time. This is the contract that makes version control, CI diffing, and unattended agent rendering trustworthy.
- Rendering is seek-driven, never realtime. Each frame’s time is computed with integer math, `time = floor(frame) / fps`, fully decoupled from the wall clock; the frame adapter is then seeked to that time and Chrome’s `HeadlessExperimental.beginFrame` captures the pixel buffer atomically.
- Five things break determinism: wall-clock calls (`Date.now()`, `requestAnimationFrame`, system timers), unseeded `Math.random()`, render-time network fetches, unfixed `fps`/`width`/`height`, and infinite/unknown duration.
- A frame adapter is a bring-your-own-runtime shim that answers `seekFrame(frame)` by positioning all animation, DOM, and canvas state to that exact frame. The adapter never owns a clock; the host drives it.
- First-party adapters exist for GSAP, Anime.js, CSS keyframes, Lottie/dotLottie, Three.js/WebGL, and the Web Animations API, and one composition can mix runtimes. ^[inferred: the vs-Remotion page lists all of these rendering together in BeginFrame mode; the hub article states compositions can mix them]
- The headline HyperFrames-vs-Remotion difference is concrete and technical: library-clock animations (GSAP et al.) are seekable and frame-accurate in HyperFrames because it pauses the timeline and seeks it to `frame / fps`; in Remotion the same GSAP timeline runs at wall-clock speed and races to its end during render.
Deterministic Rendering
The core guarantee from the determinism page: the same composition always produces the same video. Everything else in HyperFrames’ architecture is downstream of making that true.
The seek-driven pipeline
There is no realtime playback during a render. Every frame is independently seeked and captured, in four steps:
- Frame clock. The engine (`@hyperframes/engine`) computes each frame’s time with integer math: `time = floor(frame) / fps`. There is no wall-clock dependency; rendering is entirely decoupled from real time.
- Seek. The frame adapter receives `seekFrame(frame)` and deterministically positions all animations, DOM state, and canvas content to that exact frame. Its `renderSeek` pauses all GSAP timelines and seeks them to the computed time.
- Capture. Chrome’s `HeadlessExperimental.beginFrame` API captures the pixel buffer for the current frame in a single, atomic operation, so there are no partial paints or race conditions.
- Encode. FFmpeg encodes the captured frames into the final MP4.
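The frame-clock step can be sketched in a few lines of TypeScript. This is a minimal illustration rather than the engine’s source: `frameTime` and `frameTimes` are hypothetical names, and the only thing taken from the docs is the formula `time = floor(frame) / fps`. It shows that each timestamp is derived from the frame index alone, so nothing accumulates and nothing depends on how long the previous frame took to render.

```typescript
// Hypothetical helper names; only the formula time = floor(frame) / fps
// comes from the HyperFrames docs.

/** Time in seconds for a frame index, derived from the index alone. */
function frameTime(frame: number, fps: number): number {
  if (!Number.isFinite(frame) || !Number.isFinite(fps) || fps <= 0) {
    throw new RangeError("frame must be finite and fps must be positive");
  }
  return Math.floor(frame) / fps;
}

/** Timestamps for a list of frame indices, in whatever order they are requested. */
function frameTimes(frames: number[], fps: number): number[] {
  return frames.map((frame) => frameTime(frame, fps));
}

// Frame 90 at 30 fps is exactly 3 seconds in, regardless of render speed.
console.log(frameTime(90, 30)); // 3
// Fractional frame indices are floored before the division.
console.log(frameTime(45.9, 30)); // 1.5

// Seeking out of order does not change the time assigned to a frame:
// both visits to frame 10 map to the same timestamp.
const times = frameTimes([90, 10, 50, 10], 30);
console.log(times[1] === times[3]); // true
```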
The flow, in one line:
```text
Frame clock (t = frame/fps) -> Seek (adapter.seekFrame) -> Capture (beginFrame) -> Encode (FFmpeg) -> MP4
```
What breaks determinism (the banned sources)
These are non-negotiable, and they apply to every frame adapter as much as to the engine:
- No wall-clock dependencies: no `Date.now()`, no `requestAnimationFrame`, no system timers.
- No unseeded randomness: `Math.random()` without a seed breaks determinism. (This is why the hub’s “no `Math.random()`” composition rule exists.)
- No render-time network fetches: all assets must be loaded before rendering starts.
- Fixed output parameters: `fps`, `width`, and `height` are locked before the first frame.
- Finite duration: every composition has a known, finite length.
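One way to satisfy the unseeded-randomness rule is to derive every “random” value from a fixed seed plus the frame index, so the value is a pure function of the frame. The sketch below uses the well-known mulberry32 generator; `mulberry32`, `randomAt`, and the seed-mixing constant are illustrative choices, not HyperFrames APIs.

```typescript
// Illustrative only: these helpers are assumptions, not part of HyperFrames.

/** mulberry32: a small seeded PRNG returning floats in [0, 1). */
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

/**
 * Stateless lookup: the same (seed, frame) pair always yields the same value,
 * so seeking frames in any order cannot change the result.
 */
function randomAt(seed: number, frame: number): number {
  // Mix the frame index into the seed; the multiplier is an arbitrary odd constant.
  const mixed = (seed ^ Math.imul(Math.floor(frame), 0x9e3779b1)) >>> 0;
  return mulberry32(mixed)();
}

// Seek order [90, 10, 50, 10]: both visits to frame 10 see the same value.
const jitter = [90, 10, 50, 10].map((frame) => randomAt(1234, frame));
console.log(jitter[1] === jitter[3]); // true
```

A stateful generator that is advanced once per `seekFrame` call would still be seeded, but its output would depend on call order, which conflicts with the seek-in-any-order requirement; keying the value on the frame avoids that.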
Why it matters operationally
Determinism is what turns “render a video” from an interactive, babysit-it task into a reproducible build artifact. The determinism page names the payoffs directly — “automated pipelines, CI testing, and AI-driven workflows” — which unpack into:
- Version-controlled compositions. The composition is plain HTML (text), so it lives in git like any other source file.
- Re-render from any commit. Because identical input yields identical output, checking out an old commit reproduces that commit’s exact video — no “why does the 0:03 frame look different now?” drift. ^[inferred — direct consequence of the same-input-same-output guarantee; corroborated by the hub article]
- Scheduled / agent rendering. A cron job or a Claude Code subagent can run `npx hyperframes render` unattended and trust the output. No flaky replays, no frame-rate-dependent results.
- CI diffing. Two renders of the same composition are byte-for-byte comparable (in BeginFrame mode, see vs-Remotion below), so any pixel difference in CI is a real content change, not renderer noise.
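The CI-diffing idea reduces to comparing digests of two render outputs. Here is a minimal Node.js sketch, assuming the two MP4s have already been read into memory; `digest` and `sameRender` are hypothetical helper names, and SHA-256 is just one reasonable choice of hash.

```typescript
import { createHash } from "node:crypto";

/** Hex SHA-256 digest of a byte buffer (for example, the contents of an MP4). */
function digest(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

/** True when two renders are byte-for-byte identical. */
function sameRender(a: Uint8Array, b: Uint8Array): boolean {
  return digest(a) === digest(b);
}

// Stand-in bytes; in CI these would come from reading each output file.
const first = new TextEncoder().encode("frame-data");
const second = new TextEncoder().encode("frame-data");
const changed = new TextEncoder().encode("frame-data!");

console.log(sameRender(first, second)); // true
console.log(sameRender(first, changed)); // false
```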
Cross-machine reproducibility: Docker mode
Local rendering without Docker can show slight differences from platform-specific font rendering and Chrome version. For exact reproducibility across machines, render in Docker:
```bash
npx hyperframes render --docker --output output.mp4
```
Docker mode pins an exact Chrome version and font set, guaranteeing the same Chromium rendering engine, the same system fonts (no platform-specific font substitution), and the same FFmpeg encoder version across all platforms.
Preview/render parity (and its one caveat)
The browser preview and the rendered MP4 should match, achieved through: one runtime (the same hyperframe.runtime drives both), producer-canonical seek semantics as the source of truth, and readiness gates (__playerReady / __renderReady) that block capture until the composition is fully loaded.
The caveat the docs are explicit about: parity means visual fidelity, not performance parity. Preview plays in real time and is bound by your hardware’s frame rate; render is seek-driven and frame-at-a-time, so it never drops frames regardless of per-frame cost. A composition can stutter in preview and still render perfectly.
Frame Adapters
The Frame Adapter pattern is how HyperFrames supports multiple animation runtimes. Every adapter answers one question — what should the screen look like at frame N? — and if a runtime can answer that, it can plug into HyperFrames. The Adapter API is currently v0 (experimental): the core contract (seek-by-frame, deterministic output) is stable, but method signatures may change before v1.
The host drives; the adapter responds
The key architectural inversion: the adapter never controls its own clock. The host (the engine or producer) drives rendering by calling adapter methods in a strict sequence, and the adapter only responds to seek commands. The lifecycle:
```text
init(context) -> getDurationFrames() -> [ seekFrame(frame) ] x N -> destroy()
```
Per frame, the host normalizes the frame index first, then asks the adapter to position itself, then captures:
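```typescript
// engine/render-loop.ts
// clamp() is used but not defined in the source; this helper is an assumed stand-in.
const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);

await adapter.init?.({ compositionId, fps, width, height, rootElement });
const durationFrames = adapter.getDurationFrames();
for (let frame = 0; frame <= durationFrames; frame += 1) {
  // Normalize first: integer frame index, clamped to [0, durationFrames].
  const normalizedFrame = clamp(Math.floor(frame), 0, durationFrames);
  await adapter.seekFrame(normalizedFrame);
  // capture pixel buffer for this frame
}
await adapter.destroy?.();
```
The v0 Adapter API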
```typescript
// adapters/types.ts
type FrameAdapterContext = {
  compositionId: string;
  fps: number;
  width: number;
  height: number;
  rootElement?: HTMLElement;
};

type FrameAdapter = {
  id: string;
  init?: (ctx: FrameAdapterContext) => Promise<void> | void;
  getDurationFrames: () => number;
  seekFrame: (frame: number) => Promise<void> | void;
  destroy?: () => Promise<void> | void;
};
```
Required semantics
This is what lets the capture engine seek to and freeze an exact frame:
- `getDurationFrames()` must return a finite integer `>= 0`.
- `seekFrame(frame)` must support arbitrary seek order (forward, backward, random): the engine renders in order, but tooling can jump anywhere.
- `seekFrame(frame)` must be idempotent for the same input frame (seek to frame 50 twice, get the identical screen).
- `seekFrame(frame)` must clamp internal time to the adapter’s range.
- Adapters should be paused/seek-driven, not clock-driven: no side effects that depend on call order, no async that resolves after the frame is “committed.”
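To make these semantics concrete, here is a toy adapter that animates a single number instead of a DOM. It is a sketch under stated assumptions: the `FrameAdapter` shape follows the v0 type from the docs (with `rootElement` left out so the block runs outside a browser), while `createTweenAdapter`, its parameters, and the `value()` accessor are hypothetical.

```typescript
// Shape follows the documented v0 API; rootElement is omitted for this sketch.
type FrameAdapterContext = { compositionId: string; fps: number; width: number; height: number };
type FrameAdapter = {
  id: string;
  init?: (ctx: FrameAdapterContext) => Promise<void> | void;
  getDurationFrames: () => number;
  seekFrame: (frame: number) => Promise<void> | void;
  destroy?: () => Promise<void> | void;
};

/** Hypothetical adapter: linearly tweens a number from `from` to `to`. */
function createTweenAdapter(from: number, to: number, durationFrames: number) {
  let current = from;
  const adapter: FrameAdapter = {
    id: "toy-tween",
    getDurationFrames: () => durationFrames,
    seekFrame: (frame) => {
      // Quantize and clamp so negative or overflow frames stay in range.
      const f = Math.min(Math.max(Math.floor(frame), 0), durationFrames);
      const progress = durationFrames === 0 ? 1 : f / durationFrames;
      // State is recomputed from the frame alone: no dependence on prior calls.
      current = from + (to - from) * progress;
    },
  };
  return { adapter, value: () => current };
}

const { adapter, value } = createTweenAdapter(0, 100, 100);
adapter.seekFrame(50);
console.log(value()); // 50
adapter.seekFrame(-5);
console.log(value()); // 0 (clamped to the start)
adapter.seekFrame(9999);
console.log(value()); // 100 (clamped to the end)
adapter.seekFrame(50);
console.log(value()); // 50 again, after seeking elsewhere in between
```

Because `seekFrame` overwrites the state instead of incrementing it, idempotence and arbitrary seek order follow directly from the structure of the code.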
The determinism contract (adapter edition)
The same banned-sources list from determinism, restated as the adapter’s obligations: canonical clock t = frame / fps; no wall-clock dependencies; no unseeded randomness; no render-time network fetches; fixed fps/width/height; finite duration only; deterministic frame quantization before seek.
Supported runtimes
First-party adapters, each with its own seek method. All of them live in the /hyperframes-animation skill, which also carries the runtime-specific motion rules and transitions:
| Runtime | Seek method |
|---|---|
| GSAP | timeline.totalTime(timeSeconds) or timeline.seek(timeSeconds) |
| Anime.js | instance.seek(timeMs) for animations registered on window.__hfAnime |
| CSS keyframes | Browser Animation.currentTime, with a paused negative-delay fallback |
| Lottie / dotLottie | goToAndStop(timeMs, false), raw-frame setters, or player seek APIs |
| Three.js / WebGL | hf-seek events plus window.__hfThreeTime for deterministic scene rendering |
| Web Animations API | document.getAnimations() and animation.currentTime |
One composition can mix these runtimes ^[inferred — the vs-Remotion page lists all of them rendering together in BeginFrame mode; the hub article states compositions can mix them]. Community adapters are welcome — the docs’ stance is blunt: “if it can seek by frame, it belongs in HyperFrames.”
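As an illustration of how a row in this table maps onto the adapter contract, the sketch below wraps a GSAP-style timeline. It is not the first-party adapter: `createTimelineAdapter` is hypothetical, and `SeekableTimeline` is a structural stand-in that declares only the three timeline methods used here (`pause`, `totalTime`, and `totalDuration`, which GSAP timelines provide), so the block runs without GSAP installed.

```typescript
/** Minimal structural view of a GSAP timeline: only the methods used below. */
interface SeekableTimeline {
  pause(): unknown;
  totalTime(seconds: number): unknown;
  totalDuration(): number;
}

/** Hypothetical adapter factory; not the first-party HyperFrames GSAP adapter. */
function createTimelineAdapter(timeline: SeekableTimeline, fps: number) {
  // Rounding up gives a trailing partial frame its own slot; this is an assumption.
  const durationFrames = Math.ceil(timeline.totalDuration() * fps);
  return {
    id: "timeline-sketch",
    init: () => {
      // Stop the library's own ticker from advancing the timeline.
      timeline.pause();
    },
    getDurationFrames: () => durationFrames,
    seekFrame: (frame: number) => {
      const f = Math.min(Math.max(Math.floor(frame), 0), durationFrames);
      timeline.totalTime(f / fps);
    },
  };
}

// Fake timeline standing in for gsap.timeline(); it only records the calls it receives.
const calls: string[] = [];
const fakeTimeline: SeekableTimeline = {
  pause: () => { calls.push("pause"); },
  totalTime: (seconds) => { calls.push(`totalTime:${seconds}`); },
  totalDuration: () => 4,
};

const gsapLike = createTimelineAdapter(fakeTimeline, 30);
gsapLike.init();
gsapLike.seekFrame(60);
console.log(gsapLike.getDurationFrames()); // 120
console.log(calls.join(" | ")); // pause | totalTime:2
```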
Conformance tests
Every adapter should pass five minimum tests, which double as a checklist for why the pattern produces deterministic output:
- Repeatability: seek the same frame twice, get identical output.
- Random seek: order `[90, 10, 50, 10]` produces deterministic results.
- Bounds: negative and overflow frame values don’t break it.
- Duration: returned duration is a finite integer.
- Cleanup: no leaked timers/listeners after `destroy`.
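These checks translate naturally into a small harness. The sketch below is an assumption about how one might write them, not the project’s actual test suite: `checkConformance`, the `snapshot` callback (a stand-in for capturing pixels), and the two toy adapters are all hypothetical. Cleanup is reduced to calling `destroy`, since detecting leaked timers or listeners is runtime-specific.

```typescript
type SyncAdapter = {
  getDurationFrames: () => number;
  seekFrame: (frame: number) => void;
  destroy?: () => void;
};

/** Runs the minimum checks and returns the names of any that fail. */
function checkConformance(adapter: SyncAdapter, snapshot: () => string): string[] {
  const failures: string[] = [];
  const at = (frame: number): string => {
    adapter.seekFrame(frame);
    return snapshot();
  };

  // Duration: finite integer >= 0.
  const duration = adapter.getDurationFrames();
  if (!Number.isInteger(duration) || duration < 0) failures.push("duration");

  // Repeatability: same frame twice, same output.
  if (at(50) !== at(50)) failures.push("repeatability");

  // Random seek: revisiting frame 10 after other seeks must match the first visit.
  const order = [90, 10, 50, 10].map((frame) => at(frame));
  if (order[1] !== order[3]) failures.push("random-seek");

  // Bounds: out-of-range frames must not throw and must clamp to the ends.
  try {
    if (at(-1) !== at(0) || at(duration + 1000) !== at(duration)) failures.push("bounds");
  } catch {
    failures.push("bounds");
  }

  adapter.destroy?.();
  return failures;
}

// A seek-driven toy adapter passes; a call-count-driven one does not.
let shown = 0;
const good: SyncAdapter = {
  getDurationFrames: () => 100,
  seekFrame: (frame) => { shown = Math.min(Math.max(Math.floor(frame), 0), 100); },
};
let callCount = 0;
const bad: SyncAdapter = {
  getDurationFrames: () => 100,
  seekFrame: () => { callCount += 1; }, // state depends on call order, not on the frame
};

console.log(checkConformance(good, () => String(shown))); // []
console.log(checkConformance(bad, () => String(callCount)));
// [ 'repeatability', 'random-seek', 'bounds' ]
```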
HyperFrames vs Remotion
The comparison page is written by the HeyGen team and is partly marketing — it argues for HyperFrames’ bets. The substantive, verifiable technical differences are extracted below; the persuasion-flavored claims are labeled as such. Note also that the page opens by crediting Remotion generously: HeyGen ran Remotion in production for months, several HyperFrames patterns (Chrome launch flags, port selection, image2pipe streaming into FFmpeg, in-order frame buffering) came from Remotion, and attribution comments are kept in the source.
The one decision everything flows from
Both tools drive headless Chrome, both are deterministic, and both ship agent skills. They differ on a single choice — what the primary author writes:
- Remotion’s bet: React. Compositions are React components (TSX); a build step (webpack/bundler) is required; arbitrary HTML/CSS must be rewritten as JSX.
- HyperFrames’ bet: HTML. Compositions are HTML pages; no build step (`index.html` plays as-is); you can paste in a landing page, a design-system component, or a CodePen demo and animate it.
The load-bearing technical difference: library-clock animations
This is the most important substantive contrast, and it’s a direct payoff of the determinism design. HeyGen gave both renderers the identical 4-second GSAP timeline — 11 letters of “HYPERFRAMES” enter staggered with a back-out ease, hold 1.5s, then rotate and fall out — same code, same easings, same stagger, only the renderer changed:
- HyperFrames: all 4 seconds are used. Letters fly in, the word holds center ~1.5s, then spins and drops away — exactly as authored.
- Remotion: GSAP plays through its entire 4-second animation in roughly the first second of render wall-clock time. By the time Remotion captures later frames, the timeline has completed and every letter has exited; the rest of the render captures an empty stage.
The mechanism (not opinion): GSAP drives its own timeline via performance.now(), which ticks at real-time speed during render. HyperFrames pauses GSAP and seeks it to frame / fps before capturing each frame, so the library runs in lockstep with the output. Remotion has no equivalent primitive, so GSAP’s internal ticker races through the timeline at wall-clock speed while Remotion captures only a handful of entrance frames. The pattern generalizes to any library with its own clock (Anime.js, Motion One); any JS library without its own clock just works in either tool. This is exactly what the frame-adapter seekFrame contract buys you.
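The mechanism can be reproduced with a toy model that involves neither tool. In the sketch, an animation’s progress is a function of elapsed time; one function takes that time from a simulated wall clock that keeps ticking while frames are captured, the other takes it from `frame / fps`. All names and numbers (`progressAt`, the assumed capture rate of 10 frames per real second) are invented for illustration and measure nothing about HyperFrames or Remotion.

```typescript
const FPS = 30;
const ANIMATION_SECONDS = 4;
// Invented figure: pretend the renderer captures 10 frames per real second.
const CAPTURED_FRAMES_PER_REAL_SECOND = 10;

/** Animation progress in [0, 1] for a given elapsed time in seconds. */
function progressAt(seconds: number): number {
  return Math.min(Math.max(seconds / ANIMATION_SECONDS, 0), 1);
}

/** Clock-driven: the animation follows simulated wall time during capture. */
function wallClockProgress(frame: number): number {
  return progressAt(frame / CAPTURED_FRAMES_PER_REAL_SECOND);
}

/** Seek-driven: the animation is positioned at frame / fps before capture. */
function seekProgress(frame: number): number {
  return progressAt(Math.floor(frame) / FPS);
}

const totalFrames = FPS * ANIMATION_SECONDS; // 120

/** First output frame at which the animation already appears finished. */
function firstFinishedFrame(progress: (frame: number) => number): number {
  for (let frame = 0; frame <= totalFrames; frame += 1) {
    if (progress(frame) >= 1) return frame;
  }
  return -1;
}

console.log(firstFinishedFrame(wallClockProgress)); // 40: later frames capture only the end state
console.log(firstFinishedFrame(seekProgress)); // 120: the animation spans the whole output
```

In the clock-driven case the animation is over a third of the way through the output, which is the same shape of failure the GSAP comparison describes.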
Capture modes and auto-fallback
HyperFrames has two capture modes — relevant because only one gives the byte-for-byte guarantee:
- BeginFrame mode (Linux + `chrome-headless-shell`) drives Chrome’s compositor atomically via `HeadlessExperimental.beginFrame`, giving byte-for-byte reproducible frames across machines.
- Screenshot mode (macOS, Windows, and as an automatic fallback) runs Chrome in real time and takes ordinary screenshots, the same approach Remotion uses.
The renderer inspects each composition at compile time and falls back to screenshot mode when it sees primitives BeginFrame can’t handle (inline <video>s, raw requestAnimationFrame loops outside a frame adapter), injecting a virtual-time shim so rAF and iframe content stay frame-driven. In BeginFrame mode you get determinism “for free”: GSAP timelines, CSS @keyframes (via the WAAPI adapter), Lottie, Three.js, and the Web Animations API all render deterministically; raw canvas loops and live-web embeds drop to screenshot mode automatically.
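The virtual-time idea can be illustrated without a browser: callbacks that would normally be scheduled with `requestAnimationFrame` are queued and run only when the host advances a virtual clock to the next frame time. `createVirtualClock`, `requestFrame`, and `advanceTo` are hypothetical names for this sketch; the docs do not describe how the real shim is implemented.

```typescript
type FrameCallback = (timeMs: number) => void;

/** Hypothetical virtual clock: queued callbacks run only when the host advances time. */
function createVirtualClock() {
  let nowMs = 0;
  let queue: FrameCallback[] = [];
  return {
    now: () => nowMs,
    /** Stand-in for requestAnimationFrame: queue for the next host-driven tick. */
    requestFrame: (cb: FrameCallback) => {
      queue.push(cb);
    },
    /** Host sets the time for a frame, then flushes callbacks queued before the tick. */
    advanceTo: (timeMs: number) => {
      nowMs = timeMs;
      const pending = queue;
      queue = []; // callbacks that re-queue themselves run on the next tick
      for (const cb of pending) cb(nowMs);
    },
  };
}

// A raw animation loop that records the timestamps it sees.
const clock = createVirtualClock();
const seen: number[] = [];
const loop: FrameCallback = (timeMs) => {
  seen.push(timeMs);
  clock.requestFrame(loop);
};
clock.requestFrame(loop);

// The host, not the wall clock, decides when each tick happens and what time it reports.
const fps = 4;
for (let frame = 0; frame < 3; frame += 1) {
  clock.advanceTo((frame / fps) * 1000);
}
console.log(seen); // [ 0, 250, 500 ]
```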
Other substantive differences
| Dimension | HyperFrames | Remotion |
|---|---|---|
| Build step | None; index.html plays as-is | Required (webpack/bundler) |
| Arbitrary HTML/CSS | Paste and animate | Rewrite as JSX |
| Distributed rendering | AWS Lambda path (newer): Step Functions + chunk workers, S3 intermediates, lambda render / render-batch | Remotion Lambda — mature, production-tested for years |
| HDR output | Supported (two-pass DOM + native HLG/PQ compositing) | Documented as unsupported (sRGB only) |
| Visual editor over render source | Native — renderer and editor share one DOM (Studio; ships for captions today) | Source is code + build step; round-trip means re-compiling |
| License | Apache 2.0 (OSI-approved; free commercial at any scale, no per-render fees) | Source-available custom license; paid above small-team thresholds, per-render fees apply |
Where Remotion wins (per the page itself)
The doc is fair about this:
- React component reuse is Remotion’s home turf — if your team already ships a React design system, you compose videos from the same typed primitives, with IDE completion and cross-file refactoring. HyperFrames explicitly doesn’t try to match this.
- Distributed rendering maturity — Remotion Lambda has years of production hardening; HyperFrames’ Lambda path is newer.
The opinion-labeled claims
These are HeyGen’s eval-based arguments, not independently verifiable here — treat as the vendor’s position:
- Claim (opinion): LLMs writing Remotion produced “less creative visual outputs” and needed more guardrails/prompting than the same LLMs writing HTML + GSAP, with output converging on a “narrow visual vocabulary.”
- Claim (opinion): agents “express visuals better in HTML than React” because HTML/CSS/JS and 25+ years of web-animation content are the deepest well in model training data, while React is a smaller slice.
The underlying mechanism behind these claims (HTML being the render layer and the editable source of truth, enabling DOM-native visual editors like Paper.Design) is architectural fact; the creativity/eval conclusions drawn from it are HeyGen’s.
Try It
- Prove determinism to yourself. Render any composition twice with `npx hyperframes render --docker` and compare the MP4s (e.g. `shasum`); in Docker/BeginFrame mode they should be byte-identical. That hash equality is the whole agent/CI value proposition in one command.
- Find the nondeterminism in a flaky composition. If renders differ run-to-run, grep the composition for the five banned sources: `Date.now`, `requestAnimationFrame`, `Math.random` (unseeded), render-time `fetch`, and any unset `fps`/`width`/`height`.
- Pick the right capture mode. If you need cross-machine byte-for-byte output (CI), render on Linux in BeginFrame mode or use `--docker`; expect automatic screenshot-mode fallback (and a diagnostic) if your composition has inline `<video>` or raw canvas loops.
- Evaluate the Remotion switch on the real axis. If you’re choosing between the two, the deciding questions are: do you have an existing React design system (favors Remotion), and do you need library-clock animations like GSAP to be frame-accurate or Apache-2.0 licensing (favors HyperFrames)? See Remotion Motion Graphics.
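The grep step can be approximated with a small scanner. This is a rough heuristic sketch: `findBannedSources` and its pattern list are assumptions, regex matching will produce false positives (for example, calls inside a frame adapter that are legitimately seek-driven), and it cannot detect unset `fps`/`width`/`height` or an unknown duration.

```typescript
/** Hypothetical pattern list covering the banned sources that show up as text. */
const BANNED: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: "Date.now", pattern: /\bDate\.now\s*\(/ },
  { name: "requestAnimationFrame", pattern: /\brequestAnimationFrame\s*\(/ },
  { name: "Math.random", pattern: /\bMath\.random\s*\(/ },
  { name: "fetch", pattern: /\bfetch\s*\(/ },
  { name: "setTimeout/setInterval", pattern: /\bset(?:Timeout|Interval)\s*\(/ },
];

type Finding = { name: string; line: number; text: string };

/** Returns one finding per (line, banned source) match, with 1-based line numbers. */
function findBannedSources(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const { name, pattern } of BANNED) {
      if (pattern.test(text)) {
        findings.push({ name, line: index + 1, text: text.trim() });
      }
    }
  });
  return findings;
}

// Made-up composition fragment used only to exercise the scanner.
const composition = [
  "<script>",
  "  const x = Math.random() * 100;",
  "  const started = Date.now();",
  "  tl.to('.title', { x: 200, duration: 1 });",
  "</script>",
].join("\n");

console.log(findBannedSources(composition).map((f) => `${f.line}: ${f.name}`));
// [ '2: Math.random', '3: Date.now' ]
```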