Source: bradautomates/claude-video, 2026-04-29

bradautomates/claude-video is a Claude skill that closes the gap between Claude’s reading/synthesis capabilities and video input. It registers a /watch command that accepts a video URL or local file path, downloads via yt-dlp (300+ sources — YouTube, TikTok, Vimeo, Instagram, X), extracts duration-aware frame samples with ffmpeg, and transcribes using either native captions (free) or a Whisper API fallback (Groq preferred, OpenAI as alternate). MIT, 117 stars / 35 forks, latest release v0.1.2 (April 24, 2026). Multi-runtime: ships as a Claude Code plugin, a .skill upload for claude.ai, a Codex skill, and a manual clone for any agent that reads ~/.claude/skills/.

Key Takeaways

  • Solves the “Claude can’t watch videos” problem. Without this skill, Claude has no native ability to ingest video. With it installed, /watch <url-or-path> returns timestamped frames + transcript ready for multimodal reasoning.
  • Adaptive frame budget. ≤30 s videos get ~30 frames; 1–3 min get ~60; 3–10 min get ~80; >10 min triggers a sparse-scan warning recommending focused re-runs over a time range. Hard caps at 2 fps minimum and 100 frames maximum keep token spend predictable.
  • Dual transcription path. Pulls native captions first (free, requires no API key). Falls back to Whisper via Groq (preferred) or OpenAI when no captions exist. --no-whisper disables audio entirely; --whisper groq|openai forces backend.
  • Multi-runtime install. The same repo ships four ways: /plugin marketplace add bradautomates/claude-video for Claude Code, a watch.skill download for claude.ai web, a git clone to ~/.codex/skills/watch for Codex, and a manual clone to ~/.claude/skills/watch for anything else. All four use the same commands/ + SKILL.md layout referenced in the project’s skill bundle build notes.
  • Stdlib-only API clients. whisper.py talks to Groq and OpenAI without third-party SDKs — no pip install openai etc. required beyond yt-dlp + ffmpeg.
  • 300+ sources via yt-dlp. YouTube is the headline case but TikTok, Vimeo, Instagram, X, and the rest of the yt-dlp supported-sites list work the same way. No private/authenticated-platform support.
  • Time-range slicing. --start MM:SS --end MM:SS clips the analysis window — the canonical workaround for >10-minute videos and the way to keep token budgets manageable on long-form content.
  • Optimal under 10 minutes. Beyond that, the skill warns that frame coverage gets sparse. Practical pattern: identify the section of interest with a transcript-only pass, then --start/--end for visual analysis.
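
The duration tiers above map onto a budget function of roughly this shape. This is a hypothetical reconstruction of the behavior described in the takeaways, not the project's source:

```python
def frame_budget(duration_s: float) -> int:
    """Approximate the duration-aware frame budget described above.

    Tiers: <=30 s -> ~30 frames, 1-3 min -> ~60, 3-10 min -> ~80,
    with a hard 100-frame ceiling. (Sketch only; the real skill also
    applies an fps bound and emits a sparse-scan warning past 10 min.)
    """
    MAX_FRAMES = 100
    if duration_s <= 30:
        budget = 30
    elif duration_s <= 180:
        budget = 60
    elif duration_s <= 600:
        budget = 80
    else:
        # >10 min: coverage gets sparse; the skill recommends a
        # focused --start/--end re-run instead of one long pass.
        budget = MAX_FRAMES
    return min(budget, MAX_FRAMES)
```

The step function is why a 9-minute video and a 90-minute video cost about the same in tokens: only the spacing between frames changes, not the count.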

Implementation

Tool/Service: bradautomates/claude-video (MIT)

Setup (Claude Code):

/plugin marketplace add bradautomates/claude-video
/plugin install watch@claude-video

Setup (claude.ai web): Download watch.skill from the GitHub releases page, then in claude.ai go to Settings → Capabilities → Skills and upload it.

Setup (Codex CLI):

git clone https://github.com/bradautomates/claude-video ~/.codex/skills/watch

Setup (manual / any agent that reads ~/.claude/skills/):

git clone https://github.com/bradautomates/claude-video ~/.claude/skills/watch

Required dependencies: ffmpeg, yt-dlp. The skill includes a setup.py preflight that auto-installs on macOS via brew; on Linux it prints apt / dnf / pipx guidance; on Windows it prints winget / pip guidance.
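
A preflight of the same shape can be sketched in a few lines of stdlib Python. This is hypothetical, assuming only that ffmpeg and yt-dlp must be discoverable on PATH and that the per-OS hints mirror the ones described above:

```python
import platform
import shutil

def preflight(tools=("ffmpeg", "yt-dlp")) -> list:
    """Return the required tools missing from PATH."""
    return [t for t in tools if shutil.which(t) is None]

def install_hint(tool: str) -> str:
    """Per-platform install guidance (brew on macOS, apt/dnf/pipx
    on Linux, winget/pip on Windows), as the setup.py preflight
    reportedly prints."""
    system = platform.system()
    if system == "Darwin":
        return f"brew install {tool}"
    if system == "Linux":
        return f"sudo apt install {tool}  # or: dnf install / pipx install"
    return f"winget install {tool}  # or: pip install {tool}"

if __name__ == "__main__":
    for tool in preflight():
        print(f"missing: {tool} -> try: {install_hint(tool)}")
```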

Optional dependencies: Groq API key (preferred for Whisper) or OpenAI API key. Without either, the skill uses native captions only — most YouTube videos have them, so this is often fine.
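
The stdlib-only approach called out in the takeaways boils down to hand-rolled multipart/form-data over urllib. A sketch, assuming Groq's OpenAI-compatible transcription endpoint and the whisper-large-v3 model name; neither detail is confirmed from the project's source:

```python
import json
import os
import urllib.request
import uuid

def multipart_body(fields: dict, file_name: str, file_bytes: bytes,
                   boundary: str) -> bytes:
    """Hand-rolled multipart/form-data encoding: the trick that lets a
    whisper client avoid third-party SDKs entirely."""
    out = bytearray()
    for name, value in fields.items():
        out += (f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="{name}"\r\n\r\n{value}\r\n').encode()
    out += (f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="file"; filename="{file_name}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n').encode()
    out += file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return bytes(out)

def transcribe(audio_path: str) -> str:
    """POST audio to a Whisper endpoint using only the stdlib.
    Endpoint URL and model name are assumptions in this sketch."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        body = multipart_body({"model": "whisper-large-v3"},
                              os.path.basename(audio_path), f.read(),
                              boundary)
    req = urllib.request.Request(
        "https://api.groq.com/openai/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```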

Cost: Free (MIT). Whisper transcription billed per Groq/OpenAI pricing; native captions free.

Configuration flags:

  • --max-frames N — cap the frame budget for token control
  • --resolution W — bump to 1024px for text-heavy videos (slides, screen recordings)
  • --fps F — override auto-fps (2 fps minimum)
  • --whisper groq|openai — pick transcription backend
  • --no-whisper — captions only, skip audio transcription
  • --out-dir DIR — control where downloads + frames land
  • --start MM:SS --end MM:SS — time-range slicing for long content
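
For illustration, the MM:SS values and the rest of the flag surface could be parsed like this. This is a hypothetical argparse sketch, not the skill's actual parser:

```python
import argparse

def mmss(value: str) -> int:
    """Parse an MM:SS string (as used by --start/--end) into seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)

parser = argparse.ArgumentParser(prog="watch")
parser.add_argument("--max-frames", type=int, help="cap the frame budget")
parser.add_argument("--resolution", type=int, help="frame width in px")
parser.add_argument("--fps", type=float, help="override auto-fps")
parser.add_argument("--whisper", choices=["groq", "openai"])
parser.add_argument("--no-whisper", action="store_true")
parser.add_argument("--out-dir", help="downloads + frames directory")
parser.add_argument("--start", type=mmss, help="slice start, MM:SS")
parser.add_argument("--end", type=mmss, help="slice end, MM:SS")

args = parser.parse_args(["--start", "02:30", "--end", "05:00",
                          "--max-frames", "40"])
print(args.start, args.end, args.max_frames)  # 150 300 40
```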

Integration notes:

  • Claude Code’s auto-triggered slash command surface picks up /watch once the plugin is installed. Pair with scheduled tasks to auto-summarize incoming videos on a cron.
  • For video-side research workflows (AI Video & Content Production), this is the input-side counterpart to production-side tools like HeyGen Studio Automation and HeyGen Hyperframes — claude-video reads source material in, those write polished output back out.
  • For competitor research workflows, running /watch <competitor-video> identify hooks and creative patterns extracts structure from ad creative and content pieces. It pairs naturally with the last30days-skill research-aggregator surface.
  • Whisper has a 25 MB upload limit (~50 minutes mono 16 kHz audio). For long talks, transcribe in chunks via --start/--end or rely on native captions.
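
Given the 25 MB ceiling, --start/--end chunk boundaries can be computed with simple arithmetic. Illustrative only: the 8.7 KB/s rate is back-derived from the ~50 min / 25 MB figure above (roughly 70 kbps compressed mono), not a measured value:

```python
def whisper_chunks(duration_s: int, bytes_per_s: int = 8_700,
                   limit_bytes: int = 25 * 1024 * 1024) -> list:
    """Split a recording into (start, end) second ranges that each
    stay under the Whisper upload limit."""
    chunk_s = limit_bytes // bytes_per_s
    return [(start, min(start + chunk_s, duration_s))
            for start in range(0, duration_s, chunk_s)]

def as_mmss(seconds: int) -> str:
    """Format seconds as the MM:SS strings --start/--end expect."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

# A 2-hour talk splits into three --start/--end passes:
for start, end in whisper_chunks(2 * 3600):
    print(f"--start {as_mmss(start)} --end {as_mmss(end)}")
```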

Try It

  • Install via /plugin marketplace add bradautomates/claude-video then /plugin install watch@claude-video. Verify the preflight check passes (ffmpeg + yt-dlp on $PATH).
  • Run /watch https://youtu.be/<id> what is the hook in the first 10 seconds? against a recent ad to learn the structure analysis pattern.
  • For a screen recording bug report: /watch ~/screen-recording.mp4 when does the UI break and what's on screen at that moment? — the timestamped frames + transcript narrow down the failure.
  • For long-form content: start with a transcript-only pass — either native captions alone, or Whisper with the frame budget minimized (--max-frames 0 if supported, otherwise the lowest accepted value) — identify the segment of interest, then re-run with --start MM:SS --end MM:SS for visual analysis.
  • Pair with yt-dlp’s nightly channel if you analyze TikTok/Instagram regularly — those platforms change frequently and the stable yt-dlp release can lag.
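
Stripped of the skill machinery, the /watch pipeline reduces to roughly these yt-dlp + ffmpeg calls. This fetch_and_sample helper is hypothetical, written as a sketch of the download/probe/sample steps described above, not the project's own code:

```python
import os
import subprocess

def sample_fps(duration_s: float, max_frames: int) -> float:
    """fps that yields roughly max_frames evenly spaced frames."""
    return max_frames / duration_s

def fetch_and_sample(url: str, out_dir: str = "watch-out",
                     max_frames: int = 80) -> None:
    """Download a video, probe its duration, and extract an evenly
    spaced frame sample with ffmpeg."""
    os.makedirs(out_dir, exist_ok=True)
    video = os.path.join(out_dir, "video.mp4")
    subprocess.run(["yt-dlp", "-o", video, url], check=True)

    # ffprobe prints the container duration in seconds.
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", video],
        capture_output=True, text=True, check=True)
    duration = float(probe.stdout.strip())

    subprocess.run(
        ["ffmpeg", "-i", video,
         "-vf", f"fps={sample_fps(duration, max_frames)}",
         os.path.join(out_dir, "frame_%04d.jpg")],
        check=True)
```

Running this by hand is a reasonable fallback when you want the frames and transcript inputs without any agent in the loop.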