claude-watch (/watch) — Give Claude the Ability to Watch Any Video

Source: raw/gh-star-taoufik123-collab-claude-watch.md — Repo: https://github.com/taoufik123-collab/claude-watch (58★, MIT, Python). README read from the repo 2026-06-17.

claude-watch is a Claude skill (/watch) that lets Claude watch a video — it downloads a URL or reads a local file, extracts frames, pulls a timestamped transcript, and Reads every frame as an image, so Claude answers grounded in what was actually on screen and in the audio rather than guessing from the title. It extends the original claude-video project with two distinctive additions aimed at content/marketing analysis: scene-change frame extraction (token-efficient) and a 0–10s “hook microscope.” It also offers optional Obsidian auto-save — a watched video becomes a connected vault entry — which maps directly onto this wiki’s own Karpathy-pattern workflow.

Key Takeaways

Watches video, doesn’t just summarize. Paste a URL (anything yt-dlp supports — YouTube, Loom, TikTok, X, Instagram) or a local .mp4/.mov/.mkv/.webm plus a question; Claude downloads, extracts frames, transcribes, and reads the frames as images before answering.
Scene-change frame extraction (the token-efficiency lever). scripts/frames.py grabs one frame per detected cut via ffmpeg select=gt(scene,…) instead of a uniform every-N-seconds tick, so frame count (and thus image-token cost) is bounded by the number of shots, not the duration.
0–10s hook microscope. scripts/hook.py runs a denser 2 fps pass plus a word-level Whisper transcript on the opening 10 seconds, reporting what was on screen as each word landed — built for studying why a viral video’s hook works. This is the standout feature for short-form / ad-creative analysis.
Auto-scaled frame budget. Duration-aware: ≤30s ~30 frames, 30–60s ~40, 1–3min ~60, 3–10min ~80, >10min 100 sparse (with a “re-run focused” warning). Hard ceilings 2 fps / 100 frames; --start/--end gives a denser focused window.
Structured report.md with fill markers. scripts/report.py emits a fixed schema (TL;DR, key moments, hook breakdown, editorial profile, quotable moments, entities, concepts, transcript) where narrative sections are explicit  markers — Claude gets a job-list, not a blank doc.
Optional Obsidian auto-save. Stages the report into $VAULT_DIR/raw/watched/<slug>/, opens it via the obsidian:// URL scheme, and offers vault ingest. Skips cleanly when no vault is set; resolves $WATCH_VAULT_DIR or auto-detects ~/Second brain/, ~/Documents/Obsidian/, ~/Obsidian/.
Free by default, multi-surface. Native captions cover most public videos at no cost; a Whisper key (Groq whisper-large-v3 preferred, or OpenAI whisper-1) is only needed when a video lacks captions. Installs as a Claude Code plugin, a claude.ai .skill, or a Codex/generic skill. yt-dlp + ffmpeg auto-install on first run.

Positioning

claude-watch is video-input/understanding, distinct from the production and editing tools in this topic:

vs Claude Code Video Toolkit — that toolkit produces video (Remotion/LTX-2/SadTalker); claude-watch consumes it for analysis and summarization.
vs video-use — video-use edits-by-conversation; claude-watch answers questions about a video’s content and structure.
vs SentrySearch — SentrySearch does semantic search across a video archive; claude-watch does deep single-video frame+audio analysis.

video-use (browser-use) — the editing-side counterpart; both make Claude the operator over video.
Claude Code Video Toolkit — the production-side workspace; claude-watch is the input/analysis side of the same “Claude owns video” thesis.
SentrySearch — archive-scale semantic video search vs single-video deep read.
Karpathy Pattern — the Obsidian auto-save step turns a watched video into a vault entry, the same clip→compile workflow this wiki runs.
AI Marketing — the hook-microscope is a competitor-creative / ad-hook analysis tool.
AI Video & Content Production

Implementation

Tool/Service: claude-watch (/watch skill) — taoufik123-collab/claude-watch
Setup:
- Claude Code: /plugin marketplace add taoufik123-collab/claude-watch then /plugin install watch@claude-watch
- claude.ai (web): download watch.skill from the repo’s latest release → Settings → Capabilities → Skills
- Codex / generic: git clone https://github.com/taoufik123-collab/claude-watch.git ~/.codex/skills/watch
Cost: Free (MIT). Captions are free; Whisper API (Groq/OpenAI) only when a video has no captions. Main cost driver is image tokens — hence the scene-change extraction.
Integration notes: yt-dlp + ffmpeg auto-install on first run (brew on macOS; exact commands printed on Linux/Windows). Set $WATCH_VAULT_DIR to enable Obsidian auto-save. Use --start/--end for focused passes on long videos; --resolution 1024 when Claude needs to read on-screen text.

Try It

Summarize a long video: /watch https://youtu.be/<long-thing> summarize this — pulls structure + key moments faster than 2× playback.
Analyze a viral hook: /watch https://youtu.be/<viral-video> what hook did they open with? — the 0–10s microscope reports on-screen + spoken content second-by-second.
Debug from a screen recording: /watch bug-repro.mov what's going wrong? — Claude finds the frame where the issue appears.
Wire it into this vault: set $WATCH_VAULT_DIR to the karpathy vault and let a watched video stage itself into raw/watched/ for a later compile pass.

Open Questions

README claims 58★ at ingest; it’s a young single-maintainer skill — recheck maintenance before relying on it in production.
Word-level Whisper on the hook requires an API key when captions are absent — quantify the per-video cost for caption-less short-form clips.

Jonathon's AI Wiki

Explorer

claude-watch (/watch) — Give Claude the Ability to Watch Any Video

Key Takeaways

Positioning

Implementation

Try It

Open Questions

Graph View

Table of Contents

Backlinks

Jonathon's AI Wiki

Explorer

claude-watch (/watch) — Give Claude the Ability to Watch Any Video

Key Takeaways

Positioning

Related

Implementation

Try It

Open Questions

Graph View

Table of Contents

Backlinks