Source: raw/gh-star-taoufik123-collab-claude-watch.md — Repo: https://github.com/taoufik123-collab/claude-watch (58★, MIT, Python). README read from the repo 2026-06-17.
claude-watch is a Claude skill (/watch) that lets Claude watch a video — it downloads a URL or reads a local file, extracts frames, pulls a timestamped transcript, and Reads every frame as an image, so Claude answers grounded in what was actually on screen and in the audio rather than guessing from the title. It extends the original claude-video project with two distinctive additions aimed at content/marketing analysis: scene-change frame extraction (token-efficient) and a 0–10s “hook microscope.” It also offers optional Obsidian auto-save — a watched video becomes a connected vault entry — which maps directly onto this wiki’s own Karpathy-pattern workflow.
Key Takeaways
- Watches video, doesn’t just summarize. Paste a URL (anything yt-dlp supports: YouTube, Loom, TikTok, X, Instagram) or a local `.mp4`/`.mov`/`.mkv`/`.webm` plus a question; Claude downloads, extracts frames, transcribes, and reads the frames as images before answering.
- Scene-change frame extraction (the token-efficiency lever). `scripts/frames.py` grabs one frame per detected cut via ffmpeg `select=gt(scene,…)` instead of a uniform every-N-seconds tick, so frame count (and thus image-token cost) is bounded by the number of shots, not the duration.
- 0–10s hook microscope. `scripts/hook.py` runs a denser 2 fps pass plus a word-level Whisper transcript on the opening 10 seconds, reporting what was on screen as each word landed. It is built for studying why a viral video’s hook works, and it is the standout feature for short-form / ad-creative analysis.
- Auto-scaled frame budget. Duration-aware: ≤30s ~30 frames, 30–60s ~40, 1–3min ~60, 3–10min ~80, >10min 100 sparse (with a “re-run focused” warning). Hard ceilings 2 fps / 100 frames; `--start`/`--end` gives a denser focused window.
- Structured `report.md` with fill markers. `scripts/report.py` emits a fixed schema (TL;DR, key moments, hook breakdown, editorial profile, quotable moments, entities, concepts, transcript) where narrative sections are explicit `<!-- pending Claude fill -->` markers, so Claude gets a job-list, not a blank doc.
- Optional Obsidian auto-save. Stages the report into `$VAULT_DIR/raw/watched/<slug>/`, opens it via the `obsidian://` URL scheme, and offers vault ingest. Skips cleanly when no vault is set; resolves `$WATCH_VAULT_DIR` or auto-detects `~/Second brain/`, `~/Documents/Obsidian/`, `~/Obsidian/`.
- Free by default, multi-surface. Native captions cover most public videos at no cost; a Whisper key (Groq `whisper-large-v3` preferred, or OpenAI `whisper-1`) is only needed when a video lacks captions. Installs as a Claude Code plugin, a claude.ai `.skill`, or a Codex/generic skill. `yt-dlp` + `ffmpeg` auto-install on first run.
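The frame-budget tiers and the scene-change filter described above can be sketched as follows. This is a minimal illustration, not the code in `scripts/frames.py`: the function names, the treatment of tier boundaries (upper bound inclusive), and the way the 2 fps ceiling is applied are assumptions. The scene threshold is elided in the source, so it is left as a required parameter rather than given a default.

```python
import math


def frame_budget(duration_s: float) -> int:
    """Approximate frame budget for a clip, following the tiers in the list above.

    Assumption: each tier's upper bound is inclusive (a 60 s clip gets ~40 frames).
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s <= 30:
        tier = 30
    elif duration_s <= 60:
        tier = 40
    elif duration_s <= 180:
        tier = 60
    elif duration_s <= 600:
        tier = 80
    else:
        tier = 100
    # Hard ceilings from the list: never more than 2 fps, never more than 100 frames.
    fps_cap = max(1, math.floor(2 * duration_s))
    return min(tier, fps_cap, 100)


def scene_select_args(src: str, out_pattern: str, threshold: float) -> list[str]:
    """Build an ffmpeg argv that keeps only frames whose scene score exceeds threshold.

    `threshold` is a hypothetical parameter; the value the skill uses is not stated.
    `-vsync vfr` drops the unselected frames instead of duplicating neighbours
    (newer ffmpeg builds spell this `-fps_mode vfr`).
    """
    return [
        "ffmpeg", "-i", src,
        "-vf", f"select='gt(scene,{threshold})'",
        "-vsync", "vfr",
        out_pattern,
    ]
```

A caller could pass the resulting list to `subprocess.run` and then trim the extracted frames down to `frame_budget(duration)` if a cut-heavy video produces more shots than the budget allows.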
Positioning
claude-watch is video-input/understanding, distinct from the production and editing tools in this topic:
- vs Claude Code Video Toolkit: that toolkit produces video (Remotion/LTX-2/SadTalker); `claude-watch` consumes it for analysis and summarization.
- vs video-use: video-use edits-by-conversation; `claude-watch` answers questions about a video’s content and structure.
- vs SentrySearch: SentrySearch does semantic search across a video archive; `claude-watch` does deep single-video frame+audio analysis.
Related
- video-use (browser-use): the editing-side counterpart; both make Claude the operator over video.
- Claude Code Video Toolkit: the production-side workspace; `claude-watch` is the input/analysis side of the same “Claude owns video” thesis.
- SentrySearch: archive-scale semantic video search vs single-video deep read.
- Karpathy Pattern: the Obsidian auto-save step turns a watched video into a vault entry, the same clip→compile workflow this wiki runs.
- AI Marketing: the hook-microscope is a competitor-creative / ad-hook analysis tool.
- AI Video & Content Production
Implementation
- Tool/Service: `claude-watch` (`/watch` skill), taoufik123-collab/claude-watch
- Setup:
  - Claude Code: `/plugin marketplace add taoufik123-collab/claude-watch` then `/plugin install watch@claude-watch`
  - claude.ai (web): download `watch.skill` from the repo’s latest release → Settings → Capabilities → Skills
  - Codex / generic: `git clone https://github.com/taoufik123-collab/claude-watch.git ~/.codex/skills/watch`
- Cost: Free (MIT). Captions are free; Whisper API (Groq/OpenAI) only when a video has no captions. Main cost driver is image tokens, hence the scene-change extraction.
- Integration notes: `yt-dlp` + `ffmpeg` auto-install on first run (brew on macOS; exact commands printed on Linux/Windows). Set `$WATCH_VAULT_DIR` to enable Obsidian auto-save. Use `--start`/`--end` for focused passes on long videos; `--resolution 1024` when Claude needs to read on-screen text.
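The vault resolution behind the Obsidian auto-save can be sketched like this. It is an illustration under assumptions, not the skill's code: the function names are hypothetical, the precedence (explicit `$WATCH_VAULT_DIR` first, then the three auto-detect folders in the order listed) is inferred from the README summary, and the `obsidian://open?path=…` form is one documented Obsidian URI shape that the skill may or may not use.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional
from urllib.parse import quote

# Auto-detect locations named in the notes above, relative to the home directory.
CANDIDATES = ("Second brain", "Documents/Obsidian", "Obsidian")


def resolve_vault(
    env: Mapping[str, str],
    home: Path,
    is_dir: Callable[[Path], bool] = Path.is_dir,
) -> Optional[Path]:
    """Return the vault directory, or None so the caller can skip auto-save."""
    explicit = env.get("WATCH_VAULT_DIR")
    if explicit:
        return Path(explicit).expanduser()
    for rel in CANDIDATES:
        candidate = home / rel
        if is_dir(candidate):
            return candidate
    return None


def staging_dir(vault: Path, slug: str) -> Path:
    """Directory a watched video's report is staged into: <vault>/raw/watched/<slug>/."""
    return vault / "raw" / "watched" / slug


def obsidian_open_uri(note: Path) -> str:
    """Build an obsidian:// URI that opens a note by absolute path (assumed URI form)."""
    return "obsidian://open?path=" + quote(str(note), safe="")
```

In practice a caller would pass `os.environ` and `Path.home()`; the injectable `is_dir` exists only so the lookup can be exercised without touching the real filesystem.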
Try It
- Summarize a long video: `/watch https://youtu.be/<long-thing> summarize this` pulls structure + key moments faster than 2× playback.
- Analyze a viral hook: `/watch https://youtu.be/<viral-video> what hook did they open with?` The 0–10s microscope reports on-screen + spoken content second-by-second.
- Debug from a screen recording: `/watch bug-repro.mov what's going wrong?` Claude finds the frame where the issue appears.
- Wire it into this vault: set `$WATCH_VAULT_DIR` to the karpathy vault and let a watched video stage itself into `raw/watched/` for a later compile pass.
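For the later compile pass, it helps to know which sections of a staged `report.md` still carry the `<!-- pending Claude fill -->` marker. The sketch below lists them. It assumes the report uses markdown `#`-style headings, which the source does not confirm, and `pending_sections` is a hypothetical helper rather than part of the skill.

```python
import re

PENDING = "<!-- pending Claude fill -->"
# Assumption: sections are introduced by ATX headings such as "## TL;DR".
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def pending_sections(report_md: str) -> list[str]:
    """Return the titles of sections that still contain the pending marker."""
    pending: list[str] = []
    current = None
    for line in report_md.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
        elif PENDING in line and current is not None and current not in pending:
            pending.append(current)
    return pending


sample = (
    "# Demo video\n\n"
    "## TL;DR\n<!-- pending Claude fill -->\n\n"
    "## Transcript\n[00:00] hello\n\n"
    "## Hook breakdown\n<!-- pending Claude fill -->\n"
)
print(pending_sections(sample))  # ['TL;DR', 'Hook breakdown']
```

An empty result would mean every narrative section has been filled and the entry is ready to compile.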
Open Questions
- The repo showed 58★ at ingest; it’s a young single-maintainer skill, so recheck maintenance before relying on it in production.
- Word-level Whisper on the hook requires an API key when captions are absent — quantify the per-video cost for caption-less short-form clips.
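For context on what the word-level transcript buys, the pairing the hook microscope reports (which frame was on screen as each word landed) can be sketched as below. This is an assumption-laden illustration, not `scripts/hook.py`: the function name and tuple layout are hypothetical, and it assumes frames are sampled at exactly 0.0 s, 0.5 s, 1.0 s, and so on across the 10-second window.

```python
def align_hook_words(
    words: list[tuple[str, float]],
    window_s: float = 10.0,
    fps: float = 2.0,
) -> list[tuple[str, float, int, float]]:
    """Pair each word with the most recent sampled frame at or before its start time.

    `words` holds (text, start_seconds) pairs from a word-level transcript.
    Returns (text, start_seconds, frame_index, frame_time) for words inside the window.
    """
    n_frames = int(window_s * fps)  # 20 frames for a 10 s window at 2 fps
    aligned = []
    for text, start in words:
        if not 0 <= start < window_s:
            continue  # outside the hook window
        index = min(int(start * fps), n_frames - 1)
        aligned.append((text, start, index, index / fps))
    return aligned


demo = [("Stop", 0.0), ("scrolling", 0.4), ("now", 1.3), ("later", 12.0)]
for row in align_hook_words(demo):
    print(row)
```

With the demo input, "now" at 1.3 s maps to frame 2 (sampled at 1.0 s), and "later" at 12.0 s is dropped because it falls outside the window.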