Source: bradautomates/claude-video, 2026-04-29

bradautomates/claude-video is a Claude skill that closes the gap between Claude’s reading/synthesis capabilities and video input. It registers a /watch command that accepts a video URL or local file path, downloads via yt-dlp (300+ sources — YouTube, TikTok, Vimeo, Instagram, X), extracts duration-aware frame samples with ffmpeg, and transcribes using either native captions (free) or a Whisper API fallback (Groq preferred, OpenAI as alternate). MIT, 117 stars / 35 forks, latest release v0.1.2 (April 24, 2026). Multi-runtime: ships as a Claude Code plugin, a .skill upload for claude.ai, a Codex skill, and a manual clone for any agent that reads ~/.claude/skills/.

Key Takeaways

  • Solves the “Claude can’t watch videos” problem. Without this skill, Claude has no native ability to ingest video. With it installed, /watch <url-or-path> returns timestamped frames + transcript ready for multimodal reasoning.
  • Adaptive frame budget. ≤30 s videos get ~30 frames; 1–3 min get ~60; 3–10 min get ~80; >10 min triggers a sparse-scan warning recommending focused re-runs over a time range. Hard caps at 2 fps minimum and 100 frames maximum keep token spend predictable.
  • Dual transcription path. Pulls native captions first (free, requires no API key). Falls back to Whisper via Groq (preferred) or OpenAI when no captions exist. --no-whisper disables audio entirely; --whisper groq|openai forces backend.
  • Multi-runtime install. The same repo ships four ways: /plugin marketplace add bradautomates/claude-video for Claude Code, a watch.skill download for claude.ai web, a git clone to ~/.codex/skills/watch for Codex, and a manual clone to ~/.claude/skills/watch for anything else. All four use the same commands/ + SKILL.md layout referenced in the project’s skill bundle build notes.
  • Stdlib-only API clients. whisper.py talks to Groq and OpenAI without third-party SDKs — no pip install openai etc. required beyond yt-dlp + ffmpeg.
  • 300+ sources via yt-dlp. YouTube is the headline case but TikTok, Vimeo, Instagram, X, and the rest of the yt-dlp supported-sites list work the same way. No private/authenticated-platform support.
  • Time-range slicing. --start MM:SS --end MM:SS clips the analysis window — the canonical workaround for >10-minute videos and the way to keep token budgets manageable on long-form content.
  • Optimal under 10 minutes. Beyond that, the skill warns that frame coverage gets sparse. Practical pattern: identify the section of interest with a transcript-only pass, then --start/--end for visual analysis.
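
The duration tiers above map onto a budget function of roughly this shape. This is a hypothetical reconstruction of the behavior described in the takeaways, not the project's source:

```python
def frame_budget(duration_s: float) -> int:
    """Approximate the duration-aware frame budget described above.

    Tiers: <=30 s -> ~30 frames, 1-3 min -> ~60, 3-10 min -> ~80,
    with a hard 100-frame ceiling. (Sketch only; the real skill also
    applies an fps bound and emits a sparse-scan warning past 10 min.)
    """
    MAX_FRAMES = 100
    if duration_s <= 30:
        budget = 30
    elif duration_s <= 180:
        budget = 60
    elif duration_s <= 600:
        budget = 80
    else:
        # >10 min: coverage gets sparse; the skill recommends a
        # focused --start/--end re-run instead of one long pass.
        budget = MAX_FRAMES
    return min(budget, MAX_FRAMES)
```

The step function is why a 9-minute video and a 90-minute video cost about the same in tokens: only the spacing between frames changes, not the count.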

Implementation

Tool/Service: bradautomates/claude-video (MIT)

Setup (Claude Code):

/plugin marketplace add bradautomates/claude-video
/plugin install watch@claude-video

Setup (claude.ai web): Download watch.skill from the GitHub releases page, then in claude.ai go to Settings → Capabilities → Skills and upload it.

Setup (Codex CLI):

git clone https://github.com/bradautomates/claude-video ~/.codex/skills/watch

Setup (manual / any agent that reads ~/.claude/skills/):

git clone https://github.com/bradautomates/claude-video ~/.claude/skills/watch

Required dependencies: ffmpeg, yt-dlp. The skill includes a setup.py preflight that auto-installs on macOS via brew; on Linux it prints apt / dnf / pipx guidance; on Windows it prints winget / pip guidance.
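
A preflight of the same shape can be sketched in a few lines of stdlib Python. This is hypothetical, assuming only that ffmpeg and yt-dlp must be discoverable on PATH and that the per-OS hints mirror the ones described above:

```python
import platform
import shutil

def preflight(tools=("ffmpeg", "yt-dlp")) -> list:
    """Return the required tools missing from PATH."""
    return [t for t in tools if shutil.which(t) is None]

def install_hint(tool: str) -> str:
    """Per-platform install guidance (brew on macOS, apt/dnf/pipx
    on Linux, winget/pip on Windows), as the setup.py preflight
    reportedly prints."""
    system = platform.system()
    if system == "Darwin":
        return f"brew install {tool}"
    if system == "Linux":
        return f"sudo apt install {tool}  # or: dnf install / pipx install"
    return f"winget install {tool}  # or: pip install {tool}"

if __name__ == "__main__":
    for tool in preflight():
        print(f"missing: {tool} -> try: {install_hint(tool)}")
```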

Optional dependencies: Groq API key (preferred for Whisper) or OpenAI API key. Without either, the skill uses native captions only — most YouTube videos have them, so this is often fine.
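
The stdlib-only approach called out in the takeaways boils down to hand-rolled multipart/form-data over urllib. A sketch, assuming Groq's OpenAI-compatible transcription endpoint and the whisper-large-v3 model name; neither detail is confirmed from the project's source:

```python
import json
import os
import urllib.request
import uuid

def multipart_body(fields: dict, file_name: str, file_bytes: bytes,
                   boundary: str) -> bytes:
    """Hand-rolled multipart/form-data encoding: the trick that lets a
    whisper client avoid third-party SDKs entirely."""
    out = bytearray()
    for name, value in fields.items():
        out += (f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="{name}"\r\n\r\n{value}\r\n').encode()
    out += (f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="file"; filename="{file_name}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n').encode()
    out += file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return bytes(out)

def transcribe(audio_path: str) -> str:
    """POST audio to a Whisper endpoint using only the stdlib.
    Endpoint URL and model name are assumptions in this sketch."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        body = multipart_body({"model": "whisper-large-v3"},
                              os.path.basename(audio_path), f.read(),
                              boundary)
    req = urllib.request.Request(
        "https://api.groq.com/openai/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```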

Cost: Free (MIT). Whisper transcription billed per Groq/OpenAI pricing; native captions free.

Configuration flags:

  • --max-frames N — cap the frame budget for token control
  • --resolution W — bump to 1024px for text-heavy videos (slides, screen recordings)
  • --fps F — override auto-fps (2 fps minimum)
  • --whisper groq|openai — pick transcription backend
  • --no-whisper — captions only, skip audio transcription
  • --out-dir DIR — control where downloads + frames land
  • --start MM:SS --end MM:SS — time-range slicing for long content
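
For illustration, the MM:SS values and the rest of the flag surface could be parsed like this. This is a hypothetical argparse sketch, not the skill's actual parser:

```python
import argparse

def mmss(value: str) -> int:
    """Parse an MM:SS string (as used by --start/--end) into seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)

parser = argparse.ArgumentParser(prog="watch")
parser.add_argument("--max-frames", type=int, help="cap the frame budget")
parser.add_argument("--resolution", type=int, help="frame width in px")
parser.add_argument("--fps", type=float, help="override auto-fps")
parser.add_argument("--whisper", choices=["groq", "openai"])
parser.add_argument("--no-whisper", action="store_true")
parser.add_argument("--out-dir", help="downloads + frames directory")
parser.add_argument("--start", type=mmss, help="slice start, MM:SS")
parser.add_argument("--end", type=mmss, help="slice end, MM:SS")

args = parser.parse_args(["--start", "02:30", "--end", "05:00",
                          "--max-frames", "40"])
print(args.start, args.end, args.max_frames)  # 150 300 40
```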

Integration notes:

  • Claude Code’s auto-triggered slash command surface picks up /watch once the plugin is installed. Pair with scheduled tasks to auto-summarize incoming videos on a cron.
  • For video-side research workflows (AI Video & Content Production), this is the input-side counterpart to production-side tools like HeyGen Studio Automation and HeyGen Hyperframes — claude-video reads source material in, those write polished output back out.
  • For competitor research workflows, running /watch <competitor-video> identify hooks and creative patterns extracts structure from ad creative and content pieces. It pairs naturally with the last30days-skill research-aggregator surface.
  • Whisper has a 25 MB upload limit (~50 minutes mono 16 kHz audio). For long talks, transcribe in chunks via --start/--end or rely on native captions.
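
Given the 25 MB ceiling, --start/--end chunk boundaries can be computed with simple arithmetic. Illustrative only: the 8.7 KB/s rate is back-derived from the ~50 min / 25 MB figure above (roughly 70 kbps compressed mono), not a measured value:

```python
def whisper_chunks(duration_s: int, bytes_per_s: int = 8_700,
                   limit_bytes: int = 25 * 1024 * 1024) -> list:
    """Split a recording into (start, end) second ranges that each
    stay under the Whisper upload limit."""
    chunk_s = limit_bytes // bytes_per_s
    return [(start, min(start + chunk_s, duration_s))
            for start in range(0, duration_s, chunk_s)]

def as_mmss(seconds: int) -> str:
    """Format seconds as the MM:SS strings --start/--end expect."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

# A 2-hour talk splits into three --start/--end passes:
for start, end in whisper_chunks(2 * 3600):
    print(f"--start {as_mmss(start)} --end {as_mmss(end)}")
```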

Try It

  • Install via /plugin marketplace add bradautomates/claude-video then /plugin install watch@claude-video. Verify the preflight check passes (ffmpeg + yt-dlp on $PATH).
  • Run /watch https://youtu.be/<id> what is the hook in the first 10 seconds? against a recent ad to learn the structure analysis pattern.
  • For a screen recording bug report: /watch ~/screen-recording.mp4 when does the UI break and what's on screen at that moment? — the timestamped frames + transcript narrow down the failure.
  • For long-form content: start with a transcript-only pass — either native captions alone, or Whisper with the frame budget minimized (--max-frames 0 if supported, otherwise the lowest accepted value) — identify the segment of interest, then re-run with --start MM:SS --end MM:SS for visual analysis.
  • Pair with yt-dlp’s nightly channel if you analyze TikTok/Instagram regularly — those platforms change frequently and the stable yt-dlp release can lag.
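
Stripped of the skill machinery, the /watch pipeline reduces to roughly these yt-dlp + ffmpeg calls. This fetch_and_sample helper is hypothetical, written as a sketch of the download/probe/sample steps described above, not the project's own code:

```python
import os
import subprocess

def sample_fps(duration_s: float, max_frames: int) -> float:
    """fps that yields roughly max_frames evenly spaced frames."""
    return max_frames / duration_s

def fetch_and_sample(url: str, out_dir: str = "watch-out",
                     max_frames: int = 80) -> None:
    """Download a video, probe its duration, and extract an evenly
    spaced frame sample with ffmpeg."""
    os.makedirs(out_dir, exist_ok=True)
    video = os.path.join(out_dir, "video.mp4")
    subprocess.run(["yt-dlp", "-o", video, url], check=True)

    # ffprobe prints the container duration in seconds.
    probe = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "csv=p=0", video],
        capture_output=True, text=True, check=True)
    duration = float(probe.stdout.strip())

    subprocess.run(
        ["ffmpeg", "-i", video,
         "-vf", f"fps={sample_fps(duration, max_frames)}",
         os.path.join(out_dir, "frame_%04d.jpg")],
        check=True)
```

Running this by hand is a reasonable fallback when you want the frames and transcript inputs without any agent in the loop.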