Source: 4K Rendering, HDR Rendering, Performance, Remove Background — HeyGen HyperFrames docs
HyperFrames renders video from HTML by screenshotting Chrome frame-by-frame and encoding with ffmpeg (see hyperframes-rendering for the base pipeline). This page covers the four advanced render knobs on top of that base: high-resolution 4K output (author-native or supersampled), HDR10 color, render/preview performance tuning, and transparent-background video (alpha matting). Each is an independent flag or command — most claims here trace directly to the four HyperFrames guides.
Key Takeaways
- 4K has two routes: scaffold native (
init --resolution 4k→ patchesdata-width="3840"/data-height="2160"etc.) or supersample an existing 1080p composition at render (render --resolution 4k), which sets ChromedeviceScaleFactor=2so each screenshot lands at 3840×2160. ~4× more pixels → ~4× slower per frame. - What supersampling fixes vs. doesn’t: vector/browser-generated content (text, SVG, CSS gradients/shadows) re-rasterizes crisply at 4K; bitmaps with fixed pixel dimensions (
<video>,<canvas>, sub-4K images) stay locked to source — author those at the target resolution. - 4K guards (render exits before work if violated): aspect ratio must match the composition’s orientation, the width ratio must be a positive integer (1080p→4K = exactly 2×; 900p→4K = 2.4× is refused), no downsampling, and 4K + HDR is not yet a supported combination (render in two passes).
- HDR outputs HDR10 (H.265 10-bit
yuv420p10le, BT.2020). Auto-detected from source metadata; only MP4 carries HDR (mov/webm/gif fall back to SDR). Requires BT.2020 PQ/HLG video or 16-bit-PNG-PQ source. Force with--hdr/--sdr. H.264 cannot encode HDR; GPU encode omits the static metadata pro delivery needs. - Performance: preview must hit each frame in <33ms@30fps or it stutters — render is unaffected (frames captured one at a time). Top culprits: stacked
backdrop-filter: blur(), shadows on many animated elements, oversized source images (decoded RGBA =w×h×4bytes; a 7000×5000 image = 140 MB regardless of file size). Diagnose with Chrome DevTools Performance tab; fall back torender --quality draftwhen preview is unavoidably slow. - Transparent video:
remove-backgroundmats a subject locally (no API key / cloud / green screen) usingu²-net_human_seg(MIT, ~168 MB ONNX) → VP9-with-alpha.webm(default), ProRes-4444.mov, or.png. The alpha plane requires-pix_fmt yuva420p+alpha_mode=1metadata or browsers silently drop it.
4K Rendering (--resolution)
Two ways to reach true 4K (3840×2160), both producing a real 4K MP4:
# Author native — composition laid out at 4K (crisp 4K-native type + assets)
npx hyperframes init my-video --resolution 4k
# Supersample at render — keep the 1080p composition, capture at 2x DPR
npx hyperframes render --resolution 4k --output my-video-4k.mp4init --resolution 4k patches every scaffolded HTML file in place: data-width="3840", data-height="2160", data-resolution="landscape-4k", #stage CSS dimensions, and the <meta viewport> tag. render --resolution 4k leaves data-width / data-height unchanged and instead sets Chrome’s deviceScaleFactor; ffmpeg auto-detects the larger screenshot dimensions and encodes at 4K.
Verify the output:
ffprobe -v error -select_streams v:0 -show_entries stream=width,height my-video-4k.mp4
# width=3840
# height=2160Resolution presets
--resolution accepts these on both init and render:
| Preset | Dimensions | Aliases |
|---|---|---|
landscape | 1920×1080 | 1080p, hd |
portrait | 1080×1920 | 1080p-portrait |
landscape-4k | 3840×2160 | 4k, uhd |
portrait-4k | 2160×3840 | 4k-portrait |
npx hyperframes render --resolution 4k # landscape 4K
npx hyperframes render --resolution portrait-4k # vertical 4K (TikTok / Reels)
npx hyperframes render --resolution 1080p # explicit 1080p (no-op on 1080p comps)How supersampling works
The composition stays at its authored dimensions; HyperFrames derives deviceScaleFactor from the output÷composition ratio and hands it to Chrome, which renders each CSS pixel as 2×2 device pixels:
| Composition | --resolution | deviceScaleFactor | Output |
|---|---|---|---|
| 1920×1080 | 4k | 2 | 3840×2160 |
| 1080×1920 | portrait-4k | 2 | 2160×3840 |
| 3840×2160 | 4k | 1 (no-op) | 3840×2160 |
What scales, what doesn’t
| Asset type | Behavior at --resolution 4k |
|---|---|
Text (HTML, SVG <text>, web fonts), SVG/vector, CSS shapes/gradients/borders/shadows | Re-rasterized at 4K — crisp at any scale |
| Images with intrinsic dimensions ≥ 4K | Full benefit |
| Images smaller than 4K | No new detail (browser upscales the bitmap — no worse than external upscaling) |
<video> elements | Locked to source resolution — encode source at target res for 4K throughout |
<canvas> (2D + WebGL) | Locked to canvas intrinsic dimensions — multiply canvas.width/canvas.height by target DPR and ctx.scale(2,2) |
Pre-extracted <video> frames (engine-injected) | Locked to extraction (source) resolution |
Rule of thumb: vector or browser-generated → supersampling helps; fixed-pixel bitmap → author at target resolution instead.
Constraints (render exits before capturing frames)
- Aspect ratio must match the composition’s orientation — a landscape composition with
--resolution portrait-4kerrors out. - Scale must be a positive integer width ratio: 1080p→4K = exactly 2×, 720p→4K = 3× (works), 900p→4K = 2.4× (refused to avoid subpixel-text aliasing).
- Downsampling is unsupported —
--resolutiononly supersamples; render native + downscale with ffmpeg separately. --hdr+ 4K is not yet supported — the HDR layered compositor works at composition dimensions; the combination is rejected. Render two passes (HDR at native res, upscale separately).
Cost
- Per-frame capture ~3–4× slower; encoding ~2–3× slower (H.264 scales sublinearly).
- Frame data-URI cache is byte-budgeted (default 1500 MB/worker,
PRODUCER_FRAME_DATA_URI_CACHE_BYTES_MB). - Output ~3–5× the 1080p file size at default CRF — pass
--video-bitrate 25M+ for predictable sizes. - A 30s 4K render is a few minutes of wall time on a modern laptop; add
--workers 4+ on a render box for parallel capture.
Studio exposes the same path via a resolution dropdown in the Renders panel (Auto, 1080p ↔/↕, 4K ↔/↕); it applies per render, not per project.
HDR Rendering (--hdr / --sdr)
HDR10 output is H.265 10-bit (yuv420p10le) in the BT.2020 color space. HyperFrames auto-detects HDR from source metadata and falls back to SDR when no HDR content is present.
Supported HDR sources:
- HDR video tagged BT.2020 primaries with either PQ (
smpte2084) or HLG (arib-std-b67) transfer functions - 16-bit PNG images with BT.2020 PQ encoding
Output format: only MP4 carries HDR. --format mov, --format webm, --format gif auto-fall-back to SDR.
Pipeline: (1) FFmpeg probes each source’s color-space metadata; (2) transfer function chosen — PQ takes precedence over HLG; (3) H.265 encode to 10-bit with HDR10 metadata + color tagging; (4) native compositing keeps HDR media at full bit depth while SDR overlays convert sRGB → BT.2020.
Control flags:
- (default) auto-detect from sources
--hdr— force HDR regardless of source content--sdr— force SDR even when HDR sources are present
Verify:
ffprobe -v error -show_streams -select_streams v:0 output.mp4 | grep -E 'codec_name|pix_fmt|color_transfer|color_primaries|color_space'
# PQ output includes: color_transfer=smpte2084 pix_fmt=yuv420p10leLimitations:
- GPU encoding omits the HDR10 static metadata (
master-display,max-cll) needed for professional delivery. - HDR image support is 16-bit PNG only.
- H.264 cannot encode HDR (the system rejects this configuration).
- Docker rendering uses CPU-based WebGL → slower DOM capture.
Performance — keeping preview smooth & diagnosing cost
Preview vs. render: render captures frames one at a time and stitches them, so slow frames only lengthen the render — you never see pauses. Preview does the same work in real time, so a 200ms frame is a 200ms freeze. “Render looks fine, preview stutters” is expected for paint-heavy compositions, not a bug.
Expensive CSS patterns (the usual sub-30fps causes)
backdrop-filter: blur()— cost scales with blurred area × radius; stacked layers multiply. Eight layers at radii 1→128px can hit ~200ms/frame over 1920×1080 on mid-tier GPUs. Keep stacks to 2–3 with tuned radii; avoidblur(64px)/blur(128px)over large areas; bake a static blur into a PNG overlay.filter: blur()/filter: drop-shadow()— same story applied to the element itself; fine small, expensive large.- Shadows on many elements —
box-shadow/text-shadowon dozens of animated elements re-rasterizes each shadowed layer every frame. - Large gradients +
mask-image— combined withbackdrop-filterforces extra compositor passes; drop one if you don’t need both.
Image sizing
Chrome decodes JPEG/PNG to raw RGBA before display, so source resolution matters more than file size:
bitmap_bytes = width × height × 4
# a 7000×5000 source = 140 MB decoded, regardless of a 2 MB or 5 MB file on disk
Resize sources to at most 2× the canvas dimensions (for a 1920×1080 canvas, 3840×2160 is already overkill):
# ImageMagick — downsize a directory of images
mogrify -path resized -resize 3840x3840\> *.jpgMeasuring with Chrome DevTools
Open the preview (npx hyperframes preview), Cmd+Option+I (mac) / Ctrl+Shift+I (Linux/Win) → Performance tab → record, play 3–5s through the jank-prone scene, stop. Expand the tallest red-flagged bars:
- Composite Layers / Paint (large) = compositor cost (backdrop-filter, shadows, large textures)
- Decode Image = first-paint image decode (rare in Chrome 131+, off-thread by default)
- Layout / Recalculate Style = layout thrashing from script
- Script = JS work (rare; check author scripts)
A composition that’s 60fps in isolation but stutters only in specific scenes is usually a composite-cost problem — check which layers become visible there.
Fallback + WebM encode speed
When preview is unavoidably slow, render and watch the output — render is still accurate:
npx hyperframes render --quality draft --output preview.mp4Transparent WebM uses CPU-heavy libvpx-vp9, so short overlay renders can spend most wall time encoding. HyperFrames defaults to -cpu-used 4; on an 8s 1280×720 15fps VP9-alpha test, explicit -cpu-used 4 cut encode from 6.3s → 2.6s vs. libvpx’s default (SSIM 0.9986 / PSNR 50.5dB). Higher values trade compression efficiency for speed:
# per-deployment env var
PRODUCER_VP9_CPU_USED=2 npx hyperframes render --format webm --output overlay.webm
# one-off local flag
npx hyperframes render --format webm --vp9-cpu-used 2 --output overlay.webmValid values are integers -8 to 8; out-of-range values are clamped before reaching FFmpeg.
Transparent Video (remove-background)
Background removal (VFX “matting”) separates a foreground subject from its background, outputting a video with an alpha channel — transparent where the background was, opaque on the subject. The built-in remove-background command runs locally — no API keys, no cloud upload, no green screen.
# verify ffmpeg/ffprobe first (npx hyperframes doctor should show both green)
npx hyperframes remove-background subject.mp4 -o transparent.webm
# first run downloads ~168 MB model weights to ~/.cache/hyperframes/background-removal/models/Then drop the output into a composition as a normal <video> — Chrome decodes the alpha plane natively:
<div class="scene">
<img src="city.jpg" class="bg" />
<video src="transparent.webm" autoplay muted loop playsinline></video>
</div>Pipeline & model
Four local stages: ffmpeg decode (raw RGB) → u²-net_human_seg inference (320×320 mask, upsampled) → alpha composite → ffmpeg encode (VP9-alpha). The model is u²-net_human_seg (MIT, ~168 MB ONNX) run via onnxruntime-node with the best available execution provider: CoreML (Apple Silicon), CUDA (NVIDIA), CPU otherwise. Output carries the exact flags browsers need for alpha — -pix_fmt yuva420p + alpha_mode=1 metadata; get them wrong and the alpha plane is silently discarded.
Output formats
| Extension | Codec | When to use | Size (4s @ 1080p) |
|---|---|---|---|
.webm (default) | VP9 with alpha | HTML5-native transparent <video> playback | ~1 MB |
.mov | ProRes 4444 with alpha | Editing round-trip (Premiere / Resolve / Final Cut) | ~50 MB |
.png | PNG with alpha | Single-image cutout (only when input is also a single image) | varies |
Quality presets (CRF) — matters most when overlaying the cutout on its own source
--quality | CRF | Size (12s @ 1080p) | When |
|---|---|---|---|
fast | 30 | ~2 MB | Cutout over an unrelated background, size-sensitive |
balanced (default) | 18 | ~6 MB | Text-behind-subject / any overlay on the source |
best | 12 | ~12 MB | Hero shots, masters, re-encode-downstream |
The encoder also writes BT.709 + limited-range color metadata so Chrome’s YUV→RGB matches the source MP4 (without it, even lossless cutouts show a red/skin shift vs. the underlying clip).
Layer separation (--background-output / -b)
Writes a second transparent video in the same inference pass — same RGB, inverse alpha (opaque where the surroundings were, transparent over the subject):
npx hyperframes remove-background subject.mp4 \
-o subject.webm \
--background-output plate.webmsubject.webm (subject opaque) is the foreground/top layer; plate.webm (255 − mask, subject region transparent) is the background layer — put anything you want occluded by the subject’s silhouette between them. Encode cost roughly doubles; segmentation cost is unchanged. This is a hole-cut plate, not an inpainted clean plate — the subject region is fully transparent, so you must composite something opaque under it. For an actual filled background where the subject was, use a video inpainter (LaMa, ProPainter, RunwayML Inpaint) — outside this CLI. The flag requires video input and .webm/.mov for both outputs (no image inputs, no .png plate).
Text-behind-subject compositing — two non-obvious rules
The recommended layout puts a headline behind a presenter so their silhouette occludes it: z=1 base mp4 (plays the whole scene), z=2 headline, z=3 cutout webm (same source, alpha around the presenter, hidden until the cut). Two rules that bite:
- Wrap the cutout
<video>in a non-timed<div>and animate the wrapper, not the video. The framework forcesopacity: 1on any element withdata-start/data-durationwhile it’s active, so a CSS/GSAP opacity tween on the<video>itself is silently overwritten. The wrapper has nodata-*attributes, so your CSS/GSAP owns it. - Both videos start at
data-start="0"and decode in sync from t=0. Don’t “late-mount” the cutout to match the cut — Chrome does a seek + decoder warm-up at mount that can land one frame off. Mounted from t=0 with the wrapper opacity-animated, both decoders stay frame-accurate.
Compositing patterns (what goes behind the cutout)
- Cutout over a different scene (most common) — clean, no doubling; any
--quality. - Cutout over its own source mp4 (text-behind-subject, talking-head overlays) — two RGB sources of the same person; doubling is barely visible at
balanced(crf 18), shows a color shift/soft edge atfast(crf 30); usebest(crf 12) for hero shots. - Cutout over different footage of the same subject — looks like two overlapping people; avoid.
Device selection
--device auto is the default and right for almost everyone. Force CPU on a GPU box (--device cpu) to keep the GPU free or debug an EP issue; opt into CUDA with HYPERFRAMES_CUDA=1 ... --device cuda plus a GPU-enabled onnxruntime-node build (the bundled build is CPU + CoreML only). npx hyperframes remove-background --info shows detected providers and what auto would pick. On CoreML-bind failure the pipeline auto-falls-back to CPU with a warning.
Performance (one-time offline preprocessing)
| Platform | Provider | ms/frame | 30s clip |
|---|---|---|---|
| Apple Silicon (M2 Pro / M3 / M4) | CoreML | ~263 | ~2 min |
| NVIDIA GPU (T4, A10, RTX) | CUDA | ~80–150 | ~30–60 s |
| Linux x86 | CPU | ~1100 | ~16 min |
| macOS Intel | CPU | ~900 | ~13 min |
Matting runs once per asset and the output is reused — slow CPU-only always works; run once on a faster machine and check the transparent file into the project if reused often.
Model fit, alternatives, troubleshooting
u²-net_human_seg is purpose-built for person/portrait matting (stable framing, contrasting background). It struggles with non-human subjects (returns a near-empty mask), very fine hair on busy backgrounds (320×320 inference softens hair tips), frame-to-frame temporal consistency (per-frame processing → possible edge flicker on moving subjects), and live/real-time capture (batch-only). For those, the docs point to free OSS tools — rembg (isnet-general-use for objects/animals/products), BiRefNet (top hair fidelity, ~4 GB RAM), Robust Video Matting (built-in temporal consistency, but GPL-3.0), Backgroundremover, ComfyUI — re-encoded back to a HyperFrames-compatible WebM with libvpx-vp9 -pix_fmt yuva420p -metadata:s:v:0 alpha_mode=1 -auto-alt-ref 0. Common fixes: a fully-opaque-looking WebM means the yuva420p + alpha_mode=1 flags were dropped on re-encode; “ffmpeg/ffprobe required” → install ffmpeg + re-run doctor; jagged alpha edges → re-frame for contrast or try birefnet-portrait.
Try It
- Supersample an existing 1080p project to 4K and verify:
npx hyperframes render --resolution 4k --output 4k.mp4 && ffprobe -v error -select_streams v:0 -show_entries stream=width,height 4k.mp4(expect 3840×2160). - If a
<video>or<canvas>looks soft at 4K, fix the asset (encode source video at target res; multiply canvas dimensions by DPR +ctx.scale) — supersampling won’t sharpen fixed-pixel bitmaps. - Mat a talking-head clip locally and drop it over a background:
npx hyperframes remove-background subject.mp4 -o transparent.webm, then add<video src="transparent.webm" autoplay muted loop playsinline>over an<img>background. - If preview stutters, open Chrome DevTools → Performance, record the jank-prone scene, and check whether “Composite Layers / Paint” dominates (cut backdrop-filter stacks / oversized images); otherwise render
--quality draftand watch the MP4. - For HDR sources, render MP4 and confirm with
ffprobe ... | grep color_transfer(expectsmpte2084for PQ) — and remember HDR + 4K must be two separate passes.
Open Questions
- The CLI reference for the exact
--video-bitrate/--crfdefaults and the fullrenderflag list lives at/packages/cli#render(not fetched here) — see hyperframes-packages / hyperframes-quickstart-cli for the CLI surface. - The 4K guide says supersampling needs an integer width ratio but doesn’t state behavior when the ratio is integer in width but the composition isn’t a standard preset size — assumed to follow the same orientation/integer guards. ^[inferred]