AI Podcasts

Hands-on tests, walkthroughs, and operator-perspective takes from AI/Claude-relevant podcasts and YouTube creators. The bar here is looser than the rest of the wiki — entries can mix opinion with observation. Concrete claims (test prompts, pricing, benchmarks, specific failure modes) get tagged and surfaced; pure-opinion segments stay quoted and attributed but don’t drive routing decisions in other topics.

This topic is the home for content from the curated yt-podcast feeds (`~/.birdclaw/yt-podcast-feeds.json`) where the source is a video review, field test, or discussion rather than a primary product announcement or technical writeup. When a podcast surfaces a substantive first-party claim, the article cross-links into the relevant topic (claude-ai for model claims, prompt-engineering for technique claims, etc.) rather than duplicating content.

What belongs here

Hands-on model comparisons by individual creators (MattVidPro, Matt Wolfe, etc.) — operator-perspective rather than benchmark-derived
Long-form podcast episodes covering AI strategy / product / industry (Hard Fork, Intelligent Machines, Last Week in AI, The AI Show)
Field tests with reproducible prompts (those prompts get hoisted into Try-It sections)
Industry takes that pair with first-party announcements as a second-observation source
Opinion segments worth quoting but not fact-checking, kept attributed and time-stamped

What doesn’t belong here

Primary product announcements (those go in the relevant product topic)
Pure tutorials with no podcast/host framing (those go in the relevant technique topic)
Reviews of consumer apps the wiki doesn’t otherwise cover

Articles

Gemini 3.5 Flash Field Test vs GPT 5.5 / Opus 4.7 / Spark / Antigravity 2.0 / Codex (MattVidPro) — Three reproducible test prompts (a cinematic scene, a voxel village market with fruit NPCs, and a rotatable 3D globe with water physics and growable lemon trees) run side by side across six surfaces. GPT 5.5 won all three; Opus 4.7 kept producing errors and felt “sluggish… trading blows and often losing to 5.5 thinking” (operator opinion; pairs with the Margin Lab degradation signal as a second observation). Gemini 3.5 Flash is positioned as “3.1 Pro level intelligence at 3× previous Flash pricing.” Includes a note on the “smoke tests” differentiator between Antigravity 2.0 and Codex.
MattVidPro AI — ElevenLabs Music V2 Hands-On Review — Headline take: V2 is “pretty impressive” and represents a redemption arc from V1 (which couldn’t compete with Suno at August 2025 launch), narrowing the gap to Suno without closing it. The load-bearing critique: every AI music generator “gives off this vibe that it’s made on the fly because it is. It doesn’t feel like it was reasoned through, written and composed with each separate piece kind of thought through in relation to one another” — the believability ceiling is architectural / compositional, not acoustic. MattVidPro hypothesizes this is solvable with a reasoning-over-structure step before generation (same direction as GPT Image 2’s “thinking-level” framing applied to music). Multilingual mid-track switching is V2’s distinctive new capability vs Suno — the one place ElevenLabs’ speech-model heritage pays off. Operator implication: AI music is production-ready for background music in genres that hide the architectural giveaway (electronic / lo-fi / instrumental), not yet for narrative songwriting.
The AI Paradox — Dan Shipper (Every) on Why More Automation Means More Humans (Lenny’s Podcast) — The CEO of Every argues more automation has meant more humans and more work (Every doubled headcount despite being maximally AI-forward), because every agent still needs a human who cares about it. Concrete, transferable claims: the super-agent over personal-agents reversal (one company agent maintained by a forward-deployed engineer — the Shopify River / Ramp pattern); work bifurcating into a Slack-based async delegation agent + Codex/Cowork as the knowledge-work OS; the bring-your-own-tokens “SaaS apocalypse is dumb” thesis (agents increase SaaS usage); “CLIs are over” once a real GUI exists; and a homemade senior-engineer benchmark (~30/100 for most models, GPT-5.5 ~62, humans high-80s) showing models fix named issues but won’t reframe the problem — so saturating evals ≠ replacing engineers.
The New Shape of Product Work — Andrew Ambrosino (OpenAI Codex), Lenny’s Podcast — Codex product/eng lead on how AI reshapes product work (Codex: ~100% of OpenAI uses it weekly, 5M+ WAU, 6× since January). Throughline: implementation is cheap, taste/curation is the bottleneck — ~90 uncoordinated prototypes of every feature, so curation wins. “PRDs are dead” is wrong (pick the medium; a polished prototype over-anchors); AI is bad at design because it’s hard to grade and not in the AI-research flywheel; “your role is the average of what you spend time on” (calibrated role-collapse); build for the next model (the Feb Codex app would’ve failed in Nov — only the models changed); Codex as an OpenClaw-like agent OS (daily Slack-brief automation, computer use, a self-built Premiere extension); “loops are so last week → harness engineering.” Pairs with Dan Shipper’s AI Paradox (same show).
Claude Shopping Assistant (Nicole Ruiz) — Operator recipe for turning Claude into a personal shopping assistant: a claude.ai Project that vets purchases for genuine quality (durability, materials, repairability) over hype, plus a Cowork / Dispatch + Gmail automation that handles returns end-to-end (find the order email, locate the return window, draft/file the request). A concrete consumer-side use case for the Project + connector + scheduled-agent stack the wiki documents elsewhere.
How Gusto’s CTO Uses Claude Code to Ship Like a Startup — Eddie Kim (Gusto CTO & co-founder) on How I AI (Claire Vo): a team of 5 built and launched Gusto Co-founder (an AI agent product) in 10 weeks from zero code with no meetings, tech specs, Figmas, or Jira — just a 24/7 “perma-zoom.” Minimal stack: a Cloudflare Worker for the agent loop + the Vercel AI SDK (“no while loop” — the SDK owns the loop, model-switching, tools, file access); memory = a tool writing to a memory DB column; PR-as-spec behind feature flags; faked-frontend-first; eval-driven Claude Code fixes (prompt it to write a failing eval → fix → prove it passes → open a PR). Operating-model signals: designers shipping production code (94th-percentile throughput on the DX tool), 9-minute median PR reviews, the “trash-can method.” Pairs with Ambrosino’s New Shape of Product Work.
I Let Codex Control My Browser — Browser vs. Computer Use (How I AI, Claire Vo) — The distinction the wiki lacked: Codex’s @browser / @chrome / @computer trichotomy, and when each is the right reach. @chrome drives your logged-in browser (so it inherits your sessions and your blast radius), @browser runs an isolated instance, @computer leaves the browser entirely. Carries two reproducible prompt shapes — agentic QA (walk a flow, report what breaks) and persona research (browse as a specific user type and report friction). Pairs with Agent Guardrails for the permission question driving-your-real-browser raises.
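The @browser / @chrome / @computer choice in the last entry can be read as a small decision rule. The sketch below is illustrative only: the `Surface` enum, the `pick_surface` function, and its two boolean inputs are hypothetical names, and the "default to the isolated instance unless existing sessions are needed" ordering is an assumption drawn from the blast-radius point, not something the episode prescribes.

```python
from enum import Enum


class Surface(Enum):
    """The three Codex surfaces named in the episode."""
    CHROME = "@chrome"      # drives the operator's logged-in browser
    BROWSER = "@browser"    # runs an isolated browser instance
    COMPUTER = "@computer"  # leaves the browser entirely


def pick_surface(needs_browser: bool, needs_logged_in_session: bool) -> Surface:
    """Hypothetical rule: prefer the smallest blast radius that can do the task."""
    if not needs_browser:
        # The task lives outside the browser (desktop apps, files, etc.).
        return Surface.COMPUTER
    if needs_logged_in_session:
        # Inherits existing sessions, and with them the wider blast radius.
        return Surface.CHROME
    # Assumed default: an isolated instance with no inherited sessions.
    return Surface.BROWSER
```

Under these assumptions, an agentic QA pass over a public signup flow would resolve to `Surface.BROWSER`, while persona research inside an account-gated dashboard would resolve to `Surface.CHROME` and should be paired with the permission checks covered in Agent Guardrails.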

Tracked feeds (from `~/.birdclaw/yt-podcast-feeds.json`, 2026-05-27)

Intelligent Machines (TWiT) — long-form weekly AI discussion
Hard Fork (NYT) — Kevin Roose + Casey Newton, weekly AI/tech
Last Week in AI — recap-heavy news roundup
The AI Show (Paul Roetzer + Mike Kaput) — practitioner-focused
MattVidPro AI — hands-on field tests + creator commentary
Matt Wolfe — AI news + tool walkthroughs

Auto-pull lands new uploads in raw/ as yt-podcast source-type with triage: pending. Compile applies the strict-but-not-skip-by-default bar (read full content before applying any skip reason — don’t skip curated-feed items on title alone).
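The "read full content before applying any skip reason" rule can be sketched as a guard in the compile step. This is a minimal illustration, not the wiki's actual pipeline: `RawItem`, `compile_item`, the `content` field, and the `compiled` / `skipped: ...` status strings are hypothetical; only the `yt-podcast` source type and the `pending` triage value come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RawItem:
    """Hypothetical shape of an auto-pulled upload sitting in raw/."""
    title: str
    source_type: str = "yt-podcast"
    triage: str = "pending"
    content: Optional[str] = None  # full transcript or notes; None until read


def compile_item(
    item: RawItem,
    skip_reason_for: Callable[[str], Optional[str]],
) -> RawItem:
    """Apply the bar: a skip reason may only be derived from full content."""
    if item.content is None:
        # Nothing has been read yet, so the item cannot be skipped.
        # The title is deliberately never passed to the skip check.
        return item
    reason = skip_reason_for(item.content)
    item.triage = f"skipped: {reason}" if reason else "compiled"
    return item
```

The design point is that `skip_reason_for` only ever receives `item.content`, so a title-only skip is impossible by construction; an unread item simply stays `pending`.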
