Source: 4K Rendering, HDR Rendering, Performance, Remove Background — HeyGen HyperFrames docs

HyperFrames renders video from HTML by screenshotting Chrome frame-by-frame and encoding with ffmpeg (see hyperframes-rendering for the base pipeline). This page covers the four advanced render knobs on top of that base: high-resolution 4K output (author-native or supersampled), HDR10 color, render/preview performance tuning, and transparent-background video (alpha matting). Each is an independent flag or command — most claims here trace directly to the four HyperFrames guides.

Key Takeaways

  • 4K has two routes: scaffold native (init --resolution 4k → patches data-width="3840" / data-height="2160" etc.) or supersample an existing 1080p composition at render (render --resolution 4k), which sets Chrome deviceScaleFactor=2 so each screenshot lands at 3840×2160. ~4× more pixels → ~4× slower per frame.
  • What supersampling fixes vs. doesn’t: vector/browser-generated content (text, SVG, CSS gradients/shadows) re-rasterizes crisply at 4K; bitmaps with fixed pixel dimensions (<video>, <canvas>, sub-4K images) stay locked to source — author those at the target resolution.
  • 4K guards (render exits before work if violated): aspect ratio must match the composition’s orientation, the width ratio must be a positive integer (1080p→4K = exactly 2×; 900p→4K = 2.4× is refused), no downsampling, and 4K + HDR is not yet a supported combination (render in two passes).
  • HDR outputs HDR10 (H.265 10-bit yuv420p10le, BT.2020). Auto-detected from source metadata; only MP4 carries HDR (mov/webm/gif fall back to SDR). Requires BT.2020 PQ/HLG video or 16-bit-PNG-PQ source. Force with --hdr / --sdr. H.264 cannot encode HDR; GPU encode omits the static metadata pro delivery needs.
  • Performance: preview must hit each frame in <33ms@30fps or it stutters — render is unaffected (frames captured one at a time). Top culprits: stacked backdrop-filter: blur(), shadows on many animated elements, oversized source images (decoded RGBA = w×h×4 bytes; a 7000×5000 image = 140 MB regardless of file size). Diagnose with Chrome DevTools Performance tab; fall back to render --quality draft when preview is unavoidably slow.
  • Transparent video: remove-background mats a subject locally (no API key / cloud / green screen) using u²-net_human_seg (MIT, ~168 MB ONNX) → VP9-with-alpha .webm (default), ProRes-4444 .mov, or .png. The alpha plane requires -pix_fmt yuva420p + alpha_mode=1 metadata or browsers silently drop it.

4K Rendering (--resolution)

Two ways to reach true 4K (3840×2160), both producing a real 4K MP4:

# Author native — composition laid out at 4K (crisp 4K-native type + assets)
npx hyperframes init my-video --resolution 4k
 
# Supersample at render — keep the 1080p composition, capture at 2x DPR
npx hyperframes render --resolution 4k --output my-video-4k.mp4

init --resolution 4k patches every scaffolded HTML file in place: data-width="3840", data-height="2160", data-resolution="landscape-4k", #stage CSS dimensions, and the <meta viewport> tag. render --resolution 4k leaves data-width / data-height unchanged and instead sets Chrome’s deviceScaleFactor; ffmpeg auto-detects the larger screenshot dimensions and encodes at 4K.

Verify the output:

ffprobe -v error -select_streams v:0 -show_entries stream=width,height my-video-4k.mp4
# width=3840
# height=2160

Resolution presets

--resolution accepts these on both init and render:

PresetDimensionsAliases
landscape1920×10801080p, hd
portrait1080×19201080p-portrait
landscape-4k3840×21604k, uhd
portrait-4k2160×38404k-portrait
npx hyperframes render --resolution 4k          # landscape 4K
npx hyperframes render --resolution portrait-4k # vertical 4K (TikTok / Reels)
npx hyperframes render --resolution 1080p       # explicit 1080p (no-op on 1080p comps)

How supersampling works

The composition stays at its authored dimensions; HyperFrames derives deviceScaleFactor from the output÷composition ratio and hands it to Chrome, which renders each CSS pixel as 2×2 device pixels:

Composition--resolutiondeviceScaleFactorOutput
1920×10804k23840×2160
1080×1920portrait-4k22160×3840
3840×21604k1 (no-op)3840×2160

What scales, what doesn’t

Asset typeBehavior at --resolution 4k
Text (HTML, SVG <text>, web fonts), SVG/vector, CSS shapes/gradients/borders/shadowsRe-rasterized at 4K — crisp at any scale
Images with intrinsic dimensions ≥ 4KFull benefit
Images smaller than 4KNo new detail (browser upscales the bitmap — no worse than external upscaling)
<video> elementsLocked to source resolution — encode source at target res for 4K throughout
<canvas> (2D + WebGL)Locked to canvas intrinsic dimensions — multiply canvas.width/canvas.height by target DPR and ctx.scale(2,2)
Pre-extracted <video> frames (engine-injected)Locked to extraction (source) resolution

Rule of thumb: vector or browser-generated → supersampling helps; fixed-pixel bitmap → author at target resolution instead.

Constraints (render exits before capturing frames)

  • Aspect ratio must match the composition’s orientation — a landscape composition with --resolution portrait-4k errors out.
  • Scale must be a positive integer width ratio: 1080p→4K = exactly 2×, 720p→4K = 3× (works), 900p→4K = 2.4× (refused to avoid subpixel-text aliasing).
  • Downsampling is unsupported--resolution only supersamples; render native + downscale with ffmpeg separately.
  • --hdr + 4K is not yet supported — the HDR layered compositor works at composition dimensions; the combination is rejected. Render two passes (HDR at native res, upscale separately).

Cost

  • Per-frame capture ~3–4× slower; encoding ~2–3× slower (H.264 scales sublinearly).
  • Frame data-URI cache is byte-budgeted (default 1500 MB/worker, PRODUCER_FRAME_DATA_URI_CACHE_BYTES_MB).
  • Output ~3–5× the 1080p file size at default CRF — pass --video-bitrate 25M+ for predictable sizes.
  • A 30s 4K render is a few minutes of wall time on a modern laptop; add --workers 4+ on a render box for parallel capture.

Studio exposes the same path via a resolution dropdown in the Renders panel (Auto, 1080p ↔/↕, 4K ↔/↕); it applies per render, not per project.


HDR Rendering (--hdr / --sdr)

HDR10 output is H.265 10-bit (yuv420p10le) in the BT.2020 color space. HyperFrames auto-detects HDR from source metadata and falls back to SDR when no HDR content is present.

Supported HDR sources:

  • HDR video tagged BT.2020 primaries with either PQ (smpte2084) or HLG (arib-std-b67) transfer functions
  • 16-bit PNG images with BT.2020 PQ encoding

Output format: only MP4 carries HDR. --format mov, --format webm, --format gif auto-fall-back to SDR.

Pipeline: (1) FFmpeg probes each source’s color-space metadata; (2) transfer function chosen — PQ takes precedence over HLG; (3) H.265 encode to 10-bit with HDR10 metadata + color tagging; (4) native compositing keeps HDR media at full bit depth while SDR overlays convert sRGB → BT.2020.

Control flags:

  • (default) auto-detect from sources
  • --hdr — force HDR regardless of source content
  • --sdr — force SDR even when HDR sources are present

Verify:

ffprobe -v error -show_streams -select_streams v:0 output.mp4 | grep -E 'codec_name|pix_fmt|color_transfer|color_primaries|color_space'
# PQ output includes: color_transfer=smpte2084  pix_fmt=yuv420p10le

Limitations:

  • GPU encoding omits the HDR10 static metadata (master-display, max-cll) needed for professional delivery.
  • HDR image support is 16-bit PNG only.
  • H.264 cannot encode HDR (the system rejects this configuration).
  • Docker rendering uses CPU-based WebGL → slower DOM capture.

Performance — keeping preview smooth & diagnosing cost

Preview vs. render: render captures frames one at a time and stitches them, so slow frames only lengthen the render — you never see pauses. Preview does the same work in real time, so a 200ms frame is a 200ms freeze. “Render looks fine, preview stutters” is expected for paint-heavy compositions, not a bug.

Expensive CSS patterns (the usual sub-30fps causes)

  • backdrop-filter: blur() — cost scales with blurred area × radius; stacked layers multiply. Eight layers at radii 1→128px can hit ~200ms/frame over 1920×1080 on mid-tier GPUs. Keep stacks to 2–3 with tuned radii; avoid blur(64px)/blur(128px) over large areas; bake a static blur into a PNG overlay.
  • filter: blur() / filter: drop-shadow() — same story applied to the element itself; fine small, expensive large.
  • Shadows on many elementsbox-shadow/text-shadow on dozens of animated elements re-rasterizes each shadowed layer every frame.
  • Large gradients + mask-image — combined with backdrop-filter forces extra compositor passes; drop one if you don’t need both.

Image sizing

Chrome decodes JPEG/PNG to raw RGBA before display, so source resolution matters more than file size:

bitmap_bytes = width × height × 4
# a 7000×5000 source = 140 MB decoded, regardless of a 2 MB or 5 MB file on disk

Resize sources to at most 2× the canvas dimensions (for a 1920×1080 canvas, 3840×2160 is already overkill):

# ImageMagick — downsize a directory of images
mogrify -path resized -resize 3840x3840\> *.jpg

Measuring with Chrome DevTools

Open the preview (npx hyperframes preview), Cmd+Option+I (mac) / Ctrl+Shift+I (Linux/Win) → Performance tab → record, play 3–5s through the jank-prone scene, stop. Expand the tallest red-flagged bars:

  • Composite Layers / Paint (large) = compositor cost (backdrop-filter, shadows, large textures)
  • Decode Image = first-paint image decode (rare in Chrome 131+, off-thread by default)
  • Layout / Recalculate Style = layout thrashing from script
  • Script = JS work (rare; check author scripts)

A composition that’s 60fps in isolation but stutters only in specific scenes is usually a composite-cost problem — check which layers become visible there.

Fallback + WebM encode speed

When preview is unavoidably slow, render and watch the output — render is still accurate:

npx hyperframes render --quality draft --output preview.mp4

Transparent WebM uses CPU-heavy libvpx-vp9, so short overlay renders can spend most wall time encoding. HyperFrames defaults to -cpu-used 4; on an 8s 1280×720 15fps VP9-alpha test, explicit -cpu-used 4 cut encode from 6.3s → 2.6s vs. libvpx’s default (SSIM 0.9986 / PSNR 50.5dB). Higher values trade compression efficiency for speed:

# per-deployment env var
PRODUCER_VP9_CPU_USED=2 npx hyperframes render --format webm --output overlay.webm
 
# one-off local flag
npx hyperframes render --format webm --vp9-cpu-used 2 --output overlay.webm

Valid values are integers -8 to 8; out-of-range values are clamped before reaching FFmpeg.


Transparent Video (remove-background)

Background removal (VFX “matting”) separates a foreground subject from its background, outputting a video with an alpha channel — transparent where the background was, opaque on the subject. The built-in remove-background command runs locally — no API keys, no cloud upload, no green screen.

# verify ffmpeg/ffprobe first (npx hyperframes doctor should show both green)
npx hyperframes remove-background subject.mp4 -o transparent.webm
# first run downloads ~168 MB model weights to ~/.cache/hyperframes/background-removal/models/

Then drop the output into a composition as a normal <video> — Chrome decodes the alpha plane natively:

<div class="scene">
  <img src="city.jpg" class="bg" />
  <video src="transparent.webm" autoplay muted loop playsinline></video>
</div>

Pipeline & model

Four local stages: ffmpeg decode (raw RGB)u²-net_human_seg inference (320×320 mask, upsampled)alpha compositeffmpeg encode (VP9-alpha). The model is u²-net_human_seg (MIT, ~168 MB ONNX) run via onnxruntime-node with the best available execution provider: CoreML (Apple Silicon), CUDA (NVIDIA), CPU otherwise. Output carries the exact flags browsers need for alpha — -pix_fmt yuva420p + alpha_mode=1 metadata; get them wrong and the alpha plane is silently discarded.

Output formats

ExtensionCodecWhen to useSize (4s @ 1080p)
.webm (default)VP9 with alphaHTML5-native transparent <video> playback~1 MB
.movProRes 4444 with alphaEditing round-trip (Premiere / Resolve / Final Cut)~50 MB
.pngPNG with alphaSingle-image cutout (only when input is also a single image)varies

Quality presets (CRF) — matters most when overlaying the cutout on its own source

--qualityCRFSize (12s @ 1080p)When
fast30~2 MBCutout over an unrelated background, size-sensitive
balanced (default)18~6 MBText-behind-subject / any overlay on the source
best12~12 MBHero shots, masters, re-encode-downstream

The encoder also writes BT.709 + limited-range color metadata so Chrome’s YUV→RGB matches the source MP4 (without it, even lossless cutouts show a red/skin shift vs. the underlying clip).

Layer separation (--background-output / -b)

Writes a second transparent video in the same inference pass — same RGB, inverse alpha (opaque where the surroundings were, transparent over the subject):

npx hyperframes remove-background subject.mp4 \
  -o subject.webm \
  --background-output plate.webm

subject.webm (subject opaque) is the foreground/top layer; plate.webm (255 − mask, subject region transparent) is the background layer — put anything you want occluded by the subject’s silhouette between them. Encode cost roughly doubles; segmentation cost is unchanged. This is a hole-cut plate, not an inpainted clean plate — the subject region is fully transparent, so you must composite something opaque under it. For an actual filled background where the subject was, use a video inpainter (LaMa, ProPainter, RunwayML Inpaint) — outside this CLI. The flag requires video input and .webm/.mov for both outputs (no image inputs, no .png plate).

Text-behind-subject compositing — two non-obvious rules

The recommended layout puts a headline behind a presenter so their silhouette occludes it: z=1 base mp4 (plays the whole scene), z=2 headline, z=3 cutout webm (same source, alpha around the presenter, hidden until the cut). Two rules that bite:

  1. Wrap the cutout <video> in a non-timed <div> and animate the wrapper, not the video. The framework forces opacity: 1 on any element with data-start/data-duration while it’s active, so a CSS/GSAP opacity tween on the <video> itself is silently overwritten. The wrapper has no data-* attributes, so your CSS/GSAP owns it.
  2. Both videos start at data-start="0" and decode in sync from t=0. Don’t “late-mount” the cutout to match the cut — Chrome does a seek + decoder warm-up at mount that can land one frame off. Mounted from t=0 with the wrapper opacity-animated, both decoders stay frame-accurate.

Compositing patterns (what goes behind the cutout)

  • Cutout over a different scene (most common) — clean, no doubling; any --quality.
  • Cutout over its own source mp4 (text-behind-subject, talking-head overlays) — two RGB sources of the same person; doubling is barely visible at balanced (crf 18), shows a color shift/soft edge at fast (crf 30); use best (crf 12) for hero shots.
  • Cutout over different footage of the same subject — looks like two overlapping people; avoid.

Device selection

--device auto is the default and right for almost everyone. Force CPU on a GPU box (--device cpu) to keep the GPU free or debug an EP issue; opt into CUDA with HYPERFRAMES_CUDA=1 ... --device cuda plus a GPU-enabled onnxruntime-node build (the bundled build is CPU + CoreML only). npx hyperframes remove-background --info shows detected providers and what auto would pick. On CoreML-bind failure the pipeline auto-falls-back to CPU with a warning.

Performance (one-time offline preprocessing)

PlatformProviderms/frame30s clip
Apple Silicon (M2 Pro / M3 / M4)CoreML~263~2 min
NVIDIA GPU (T4, A10, RTX)CUDA~80–150~30–60 s
Linux x86CPU~1100~16 min
macOS IntelCPU~900~13 min

Matting runs once per asset and the output is reused — slow CPU-only always works; run once on a faster machine and check the transparent file into the project if reused often.

Model fit, alternatives, troubleshooting

u²-net_human_seg is purpose-built for person/portrait matting (stable framing, contrasting background). It struggles with non-human subjects (returns a near-empty mask), very fine hair on busy backgrounds (320×320 inference softens hair tips), frame-to-frame temporal consistency (per-frame processing → possible edge flicker on moving subjects), and live/real-time capture (batch-only). For those, the docs point to free OSS tools — rembg (isnet-general-use for objects/animals/products), BiRefNet (top hair fidelity, ~4 GB RAM), Robust Video Matting (built-in temporal consistency, but GPL-3.0), Backgroundremover, ComfyUI — re-encoded back to a HyperFrames-compatible WebM with libvpx-vp9 -pix_fmt yuva420p -metadata:s:v:0 alpha_mode=1 -auto-alt-ref 0. Common fixes: a fully-opaque-looking WebM means the yuva420p + alpha_mode=1 flags were dropped on re-encode; “ffmpeg/ffprobe required” → install ffmpeg + re-run doctor; jagged alpha edges → re-frame for contrast or try birefnet-portrait.


Try It

  1. Supersample an existing 1080p project to 4K and verify: npx hyperframes render --resolution 4k --output 4k.mp4 && ffprobe -v error -select_streams v:0 -show_entries stream=width,height 4k.mp4 (expect 3840×2160).
  2. If a <video> or <canvas> looks soft at 4K, fix the asset (encode source video at target res; multiply canvas dimensions by DPR + ctx.scale) — supersampling won’t sharpen fixed-pixel bitmaps.
  3. Mat a talking-head clip locally and drop it over a background: npx hyperframes remove-background subject.mp4 -o transparent.webm, then add <video src="transparent.webm" autoplay muted loop playsinline> over an <img> background.
  4. If preview stutters, open Chrome DevTools → Performance, record the jank-prone scene, and check whether “Composite Layers / Paint” dominates (cut backdrop-filter stacks / oversized images); otherwise render --quality draft and watch the MP4.
  5. For HDR sources, render MP4 and confirm with ffprobe ... | grep color_transfer (expect smpte2084 for PQ) — and remember HDR + 4K must be two separate passes.

Open Questions

  • The CLI reference for the exact --video-bitrate / --crf defaults and the full render flag list lives at /packages/cli#render (not fetched here) — see hyperframes-packages / hyperframes-quickstart-cli for the CLI surface.
  • The 4K guide says supersampling needs an integer width ratio but doesn’t state behavior when the ratio is integer in width but the composition isn’t a standard preset size — assumed to follow the same orientation/integer guards. ^[inferred]