Source: raw/gh-star-taoufik123-collab-claude-watch.md — Repo: https://github.com/taoufik123-collab/claude-watch (58★, MIT, Python). README read from the repo 2026-06-17.

claude-watch is a Claude skill (/watch) that lets Claude watch a video: it downloads a URL or reads a local file, extracts frames, pulls a timestamped transcript, and reads every frame as an image, so answers are grounded in what was actually on screen and in the audio rather than guessed from the title. It extends the original claude-video project with two additions aimed at content/marketing analysis: scene-change frame extraction (token-efficient) and a 0–10s “hook microscope.” It also offers optional Obsidian auto-save, which turns a watched video into a connected vault entry and maps directly onto this wiki’s own Karpathy-pattern workflow.

Key Takeaways

  • Watches video, doesn’t just summarize. Paste a URL (anything yt-dlp supports — YouTube, Loom, TikTok, X, Instagram) or a local .mp4/.mov/.mkv/.webm plus a question; Claude downloads, extracts frames, transcribes, and reads the frames as images before answering.
  • Scene-change frame extraction (the token-efficiency lever). scripts/frames.py grabs one frame per detected cut via ffmpeg select=gt(scene,…) instead of a uniform every-N-seconds tick, so frame count (and thus image-token cost) is bounded by the number of shots, not the duration.
  • 0–10s hook microscope. scripts/hook.py runs a denser 2 fps pass plus a word-level Whisper transcript on the opening 10 seconds, reporting what was on screen as each word landed — built for studying why a viral video’s hook works. This is the standout feature for short-form / ad-creative analysis.
  • Auto-scaled frame budget. The budget scales with duration: ≤30s ~30 frames, 30–60s ~40, 1–3min ~60, 3–10min ~80, >10min 100 sparsely sampled frames (with a “re-run focused” warning). Hard ceilings are 2 fps and 100 frames; --start/--end gives a denser pass over a focused window.
  • Structured report.md with fill markers. scripts/report.py emits a fixed schema (TL;DR, key moments, hook breakdown, editorial profile, quotable moments, entities, concepts, transcript) where narrative sections are explicit <!-- pending Claude fill --> markers — Claude gets a job-list, not a blank doc.
  • Optional Obsidian auto-save. Stages the report into $VAULT_DIR/raw/watched/<slug>/, opens it via the obsidian:// URL scheme, and offers vault ingest. Skips cleanly when no vault is set; resolves $WATCH_VAULT_DIR or auto-detects ~/Second brain/, ~/Documents/Obsidian/, ~/Obsidian/.
  • Free by default, multi-surface. Native captions cover most public videos at no cost; a Whisper key (Groq whisper-large-v3 preferred, or OpenAI whisper-1) is only needed when a video lacks captions. Installs as a Claude Code plugin, a claude.ai .skill, or a Codex/generic skill. yt-dlp + ffmpeg auto-install on first run.
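
The frame-budget tiers and the scene-change filter described above can be sketched in a few lines of Python. This is an illustration built only from the numbers quoted in this note, not the code in scripts/frames.py: the function names, the way the 2 fps ceiling is applied to short clips, the output filename pattern, and the threshold value in the usage line are all assumptions. The README excerpt elides the actual scene threshold, so it is left as a required parameter.

```python
import shlex

MAX_FRAMES = 100  # hard ceiling quoted in the README
MAX_FPS = 2.0     # hard ceiling quoted in the README


def frame_budget(duration_s: float) -> int:
    """Approximate frame budget for a clip of the given length in seconds."""
    if duration_s <= 30:
        budget = 30
    elif duration_s <= 60:
        budget = 40
    elif duration_s <= 180:
        budget = 60
    elif duration_s <= 600:
        budget = 80
    else:
        budget = 100  # sparse sampling; the tool warns to re-run focused
    # Assumption: the 2 fps ceiling also caps very short clips.
    fps_cap = max(1, int(duration_s * MAX_FPS))
    return min(budget, fps_cap, MAX_FRAMES)


def scene_frames_cmd(video: str, out_dir: str, threshold: float, max_frames: int) -> list:
    """Build an ffmpeg command that keeps one frame per detected scene change."""
    # The quotes protect the comma inside gt(...) from ffmpeg's filter parser.
    vf = f"select='gt(scene,{threshold})'"
    return [
        "ffmpeg", "-i", video,
        "-vf", vf,
        "-vsync", "vfr",               # newer ffmpeg builds prefer -fps_mode vfr
        "-frames:v", str(max_frames),  # stop once the budget is reached
        f"{out_dir}/frame_%04d.jpg",   # hypothetical naming pattern
    ]


# Usage: 0.3 is an arbitrary illustrative threshold, not the repo's setting.
cmd = scene_frames_cmd("clip.mp4", "frames", threshold=0.3, max_frames=frame_budget(45))
print(shlex.join(cmd))
```

Capping the scene-filter output with the duration-based budget is one plausible way the two mechanisms could interact; the README excerpt does not say how the real script combines them.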

Positioning

claude-watch is video-input/understanding, distinct from the production and editing tools in this topic:

  • vs Claude Code Video Toolkit — that toolkit produces video (Remotion/LTX-2/SadTalker); claude-watch consumes it for analysis and summarization.
  • vs video-use — video-use edits-by-conversation; claude-watch answers questions about a video’s content and structure.
  • vs SentrySearch — SentrySearch does semantic search across a video archive; claude-watch does deep single-video frame+audio analysis.

Implementation

  • Tool/Service: claude-watch (/watch skill) — taoufik123-collab/claude-watch
  • Setup:
    • Claude Code: /plugin marketplace add taoufik123-collab/claude-watch then /plugin install watch@claude-watch
    • claude.ai (web): download watch.skill from the repo’s latest release → Settings → Capabilities → Skills
    • Codex / generic: git clone https://github.com/taoufik123-collab/claude-watch.git ~/.codex/skills/watch
  • Cost: Free (MIT). Captions are free; Whisper API (Groq/OpenAI) only when a video has no captions. Main cost driver is image tokens — hence the scene-change extraction.
  • Integration notes: yt-dlp + ffmpeg auto-install on first run (brew on macOS; exact commands printed on Linux/Windows). Set $WATCH_VAULT_DIR to enable Obsidian auto-save. Use --start/--end for focused passes on long videos; --resolution 1024 when Claude needs to read on-screen text.
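
The vault lookup behind the Obsidian auto-save can be sketched as follows. This is a minimal illustration assuming that $WATCH_VAULT_DIR takes precedence over the auto-detected folders and that those folders are tried in the order listed in this note; resolve_vault_dir and staging_dir are hypothetical names, not functions from the repo.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Auto-detect candidates named in the README, relative to the home directory.
CANDIDATES = ("Second brain", "Documents/Obsidian", "Obsidian")


def resolve_vault_dir(env: Optional[Mapping[str, str]] = None,
                      home: Optional[Path] = None) -> Optional[Path]:
    """Return the vault directory, or None so the caller can skip auto-save."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    explicit = env.get("WATCH_VAULT_DIR")
    if explicit:  # assumption: an explicit setting wins over auto-detection
        return Path(explicit).expanduser()
    for rel in CANDIDATES:
        candidate = home / rel
        if candidate.is_dir():
            return candidate
    return None


def staging_dir(vault: Path, slug: str) -> Path:
    """Where a watched video's report is staged: <vault>/raw/watched/<slug>/."""
    return vault / "raw" / "watched" / slug


vault = resolve_vault_dir()
if vault is None:
    print("no vault configured; skipping auto-save")
else:
    print(staging_dir(vault, "example-video"))
```

Returning None rather than raising mirrors the "skips cleanly when no vault is set" behavior described above.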

Try It

  1. Summarize a long video: /watch https://youtu.be/<long-thing> summarize this — pulls structure + key moments faster than 2× playback.
  2. Analyze a viral hook: /watch https://youtu.be/<viral-video> what hook did they open with? — the 0–10s microscope reports on-screen + spoken content second-by-second.
  3. Debug from a screen recording: /watch bug-repro.mov what's going wrong? — Claude finds the frame where the issue appears.
  4. Wire it into this vault: set $WATCH_VAULT_DIR to the karpathy vault and let a watched video stage itself into raw/watched/ for a later compile pass.
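
To make the hook analysis in item 2 concrete, here is a minimal sketch of pairing word-level timestamps with a 2 fps frame grid over the first 10 seconds. It is not the logic of scripts/hook.py: the Word type, the function name, and the sample words are invented for illustration, and frame i is assumed to have been captured at i / fps seconds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video


def align_words_to_frames(words: List[Word], fps: float = 2.0,
                          window_s: float = 10.0) -> List[Tuple[str, float, int, float]]:
    """For each word in the hook window, find the frame on screen when it began."""
    n_frames = int(window_s * fps)  # 20 frames for 10 s at 2 fps
    rows = []
    for w in words:
        if not 0 <= w.start < window_s:
            continue  # outside the 0-10 s hook window
        idx = min(int(w.start * fps), n_frames - 1)
        rows.append((w.text, w.start, idx, idx / fps))
    return rows


# Made-up transcript fragment, purely for illustration.
sample = [Word("Stop", 0.12), Word("scrolling", 0.48), Word("now", 1.05), Word("later", 12.0)]
for text, start, idx, frame_t in align_words_to_frames(sample):
    print(f"{start:5.2f}s  {text:<10} frame {idx:02d} (t={frame_t:.1f}s)")
```

Each output row ties a spoken word to the most recent frame captured at or before it, the kind of pairing needed to report what was on screen as each word landed.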

Open Questions

  • The repo showed 58★ at ingest; it is a young, single-maintainer skill, so recheck maintenance before relying on it in production.
  • Word-level Whisper on the hook requires an API key when captions are absent — quantify the per-video cost for caption-less short-form clips.