Brad Bonanno — Claude Code Can Watch Any Video Instantly

The Problem

Claude is text-only — it can’t watch videos natively. Transcripts alone miss visual context (expressions, on-screen text, UI demos, body language). Frames alone miss timestamps and sequence.

The Solution: claude-video /watch

Brad’s open-source tool bridges the gap:

Paste URL → Download → Extract frames → Transcribe → Hand all to Claude

How It Works

  1. Accepts any video URL — YouTube, local file, or any downloadable link
  2. Extracts frames at an auto-scaled rate (adapts to video length)
  3. Pulls timestamped transcript — uses free captions when available, Whisper API as fallback
  4. Delivers to Claude — frames + transcript combined so Claude can analyze both visual and audio content

What Claude Can “See”

  • Visual frames — expressions, UI screens, objects, text in video
  • Timestamped transcript — what was said, when, and in what context
  • Combined context — “at 2:34, the speaker shows X while saying Y”

Use Cases

Use CaseExample
Video Q&AAsk questions about any video content
Tutorial reviewClaude watches a coding tutorial and extracts every command
Meeting analysisUpload a recording, get minutes + key decisions
Content repurposingWatch a video → generate blog post, Twitter thread, LinkedIn post
Fact-checkingVerify claims made in videos against frames and transcript
AccessibilityGenerate summaries for long-form video content

Why It’s Different

ApproachLimitation
Transcript onlyMisses all visual information
Screenshots onlyNo audio, no sequence context
ManualTime-consuming, doesn’t scale
claude-videoBoth frames + transcript, automated

Installation

git clone https://github.com/bradautomates/claude-video
cd claude-video
# Follow setup instructions in the README

Then use within Claude Code:

/watch https://youtube.com/watch?v=...