Source: raw/Stanford_s_Method_Turns_Claude_Into_a_PHD_Level_Research_Team.md | URL: https://www.youtube.com/watch?v=Tj3018n5MVg | Platform: YouTube
A YouTube creator repackages Stanford’s STORM research method into a free Claude Code skill that turns one topic into a verified, self-contained HTML research briefing. The skill simulates five expert “lenses” (practitioner, academic, skeptic, economist, historian), maps where they contradict each other, synthesizes a single report, then adversarially peer-reviews its own output and verifies every citation against its primary source before delivering. The creator’s pitch: a single research prompt has blind spots, so borrowing several expert perspectives produces more holistic research than one angle can.
Key Takeaways
- STORM is a Stanford research method. The creator says it “has actually been shown in peer-reviewed testing to produce articles 25% more organized than the next best method” — this is a claim stated in the video, not independently verified here.
- The core idea is multi-perspective research. Firing off one prompt to Claude leaves “a bunch of blind spots in that research plan.” STORM instead runs several angles so “each angle finds a hole that the other angles miss.”
- Five expert lenses. Practitioner, academic, skeptic, economist, historian — each subagent role-plays its own background and area of expertise.
- The skill’s own one-line description (from its skill.md): it “turns one topic into a verified multi-perspective HTML briefing… simulates five expert lenses on the topic, maps where they contradict each other, synthesizes everything into a single self-contained HTML report, then adversarially peer reviews its own outputs, and verifies every citation against its primary source before delivering.”
- Output is a consistent HTML briefing built from the five perspectives, with a verification ledger at the bottom showing sources confirmed, corrected, or demoted (the “V2” verified pass).
- Two free resources: the skill.md and a report-template.html (referenced by the skill for consistent formatting). The creator distributes both via a free Skool community; see Open Questions on the exact link.
- Theory over tool. The takeaway the creator stresses: “if you don’t have subject matter expertise, see if you can borrow it.” Use agents to create a council of little experts that kill your blind spots.
The Five Perspectives (STORM Lenses)
- Practitioner, academic, skeptic, economist, historian — run in parallel as subagents.
- Each receives its own prompt from the main session and does its own web research with tools.
- The point is disagreement: the pipeline surfaces where the lenses contradict each other, then resolves it.
- The creator notes a “missing sixth lens” the skill flagged in one run: all five lenses “look at the firm from the owner’s chair” (adoption rates, productivity, ROI) — none sat in the seat of the customer or the frontline employee. You can then spin up that sixth lens and rerun a V3.
- Lenses are editable. The creator suggests adding your own (e.g. “a beginner in AI” or “a content creator”) to fit your workflow.
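One way to picture the editable lens set is as plain data that the main session turns into one prompt per subagent. The sketch below is illustrative only: the five lens names come from the video, but the `Lens` class, the focus descriptions, `build_prompt`, and `with_extra_lens` are hypothetical and are not taken from the skill file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    name: str
    focus: str  # hypothetical one-line brief; the skill's real wording is not in the source


# Lens names are from the video; every focus string here is an assumption.
DEFAULT_LENSES = [
    Lens("practitioner", "what works in day-to-day use"),
    Lens("academic", "what the published research says"),
    Lens("skeptic", "where the claims are weak or overstated"),
    Lens("economist", "costs, incentives, and ROI"),
    Lens("historian", "precedents and how similar shifts played out"),
]


def build_prompt(lens: Lens, topic: str) -> str:
    """Build the per-lens prompt the main session would hand to one subagent."""
    return f"Research '{topic}' as a {lens.name}. Focus on {lens.focus}."


def with_extra_lens(lenses: list[Lens], extra: Lens) -> list[Lens]:
    """Return a new list with one custom lens appended, leaving the defaults untouched."""
    return [*lenses, extra]


if __name__ == "__main__":
    # Example: add a customer lens, like the missing sixth lens flagged in the demo run.
    lenses = with_extra_lens(
        DEFAULT_LENSES, Lens("customer", "the experience on the receiving end")
    )
    for lens in lenses:
        print(build_prompt(lens, "voice AI agents"))
```

Keeping the lenses as data rather than hard-coded prompts is what makes adding a custom lens a one-line change.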
How the Skill Runs (Pipeline)
The skill packages what was originally four chained prompts (spin up the five angles → contradiction map → synthesis → peer review) into one invocation:
- Phase 0 — Scope the topic. If the topic isn’t specific enough, it asks a few clarifying questions before kicking off (in the demo it inferred the reader as “an AI educator deciding whether voice AI agents are worth a video or just hype”).
- Spin up five expert lenses in parallel (subagents browsing the web).
- Map the contradictions (“contradiction map”): where the perspectives disagree and which side has strong vs. weak evidence. The lenses analyze each other’s outputs.
- Synthesize everything into one HTML report (it reads the report-template.html for consistent layout).
- Adversarial peer review + verification. The creator says it then “run[s] six more agents, which are going to verify all those facts” and check the citations, yielding ~12 agents total (5 lenses + ~6 verifiers). The verified result is labeled V2.
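The phases above can be sketched as a small orchestration loop. This is a minimal illustration of the control flow only, not the skill's implementation: `run_subagent` is a hypothetical stub rather than a Claude Code API, the default of six verifiers simply mirrors the figure quoted in the video, Phase 0 scoping is omitted, and the contradiction map is simplified to a single call.

```python
import asyncio

LENSES = ["practitioner", "academic", "skeptic", "economist", "historian"]


async def run_subagent(role: str, task: str) -> str:
    """Hypothetical stand-in for one subagent; a real run would do web research here."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{role}] output for: {task[:40]}"


async def storm(topic: str, verifier_count: int = 6) -> dict:
    # The five lenses run in parallel; each one only reports back to the main session.
    drafts = await asyncio.gather(*(run_subagent(lens, topic) for lens in LENSES))
    by_lens = dict(zip(LENSES, drafts))

    # Contradiction map: compare the lens outputs (simplified to one call here).
    contradictions = await run_subagent("contradiction-mapper", " | ".join(drafts))

    # Synthesis: a first-pass report built from the map.
    report = await run_subagent("synthesizer", contradictions)

    # Adversarial review and citation checks, again in parallel.
    checks = await asyncio.gather(
        *(run_subagent(f"verifier-{i + 1}", report) for i in range(verifier_count))
    )
    return {
        "lenses": by_lens,
        "contradictions": contradictions,
        "report": report,
        "checks": list(checks),
        "version": "V2",  # label the video gives the verified pass
    }


if __name__ == "__main__":
    result = asyncio.run(storm("Are voice AI agents worth a video?"))
    print(result["version"], len(result["lenses"]), len(result["checks"]))
```

`asyncio.gather` is used for the two fan-out steps because the lenses, and later the verifiers, do not depend on each other's results.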
The HTML Briefing Output
- 60-second summary at the top.
- Key findings ranked by reliability (e.g. “reliability high, nine out of 10”), each annotated with which lenses supported it and which challenged it (e.g. “supported by the academic and the skeptic… challenged by the practitioner and the economist”).
- The core assumption the briefing rests on, called out explicitly.
- The missing sixth lens, called out so you can extend the analysis.
- Practical takeaways tailored to the reader.
- Verification ledger at the very bottom: every source marked confirmed, corrected, or demoted. The creator’s framing: the first pass “would have had information in here that just wasn’t correct,” but because verification is built in, the V2 output earns “a lot more faith.”
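The ranked findings and the verification ledger described above map naturally onto two small record types. The sketch below assumes a 1 to 10 reliability score (suggested by the “nine out of 10” example) and uses the three ledger statuses named in the video; the class names, field names, and helper functions are hypothetical, and the report template's actual structure is not shown in the source.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class SourceStatus(Enum):
    CONFIRMED = "confirmed"
    CORRECTED = "corrected"
    DEMOTED = "demoted"


@dataclass
class Finding:
    claim: str
    reliability: int  # assumed 1-10 scale
    supported_by: list[str] = field(default_factory=list)   # lens names
    challenged_by: list[str] = field(default_factory=list)  # lens names


@dataclass
class LedgerEntry:
    source: str
    status: SourceStatus


def rank_findings(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least reliable, as the briefing does."""
    return sorted(findings, key=lambda f: f.reliability, reverse=True)


def ledger_summary(ledger: list[LedgerEntry]) -> dict[str, int]:
    """Count sources per status, including statuses with zero entries."""
    counts = Counter(entry.status for entry in ledger)
    return {status.value: counts.get(status, 0) for status in SourceStatus}
```

A usage example with invented placeholder data: `ledger_summary([LedgerEntry("source A", SourceStatus.CONFIRMED), LedgerEntry("source B", SourceStatus.DEMOTED)])` returns one confirmed, zero corrected, and one demoted.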
STORM vs Claude Code’s Native Deep Research
The video contrasts the skill with Claude Code’s built-in deep-research feature that shipped with dynamic workflows:
- Native deep research spins up a dynamic workflow that “kick[s] off hundreds of agents” (103 in his example). It didn’t surface a report directly — he had to ask “where’s the report” and got a markdown file he calls “decent, but… really not that thorough,” with few sources (two confirmed, a few unconfirmed, plus open questions).
- Same prompt into the STORM skill produced the verified HTML briefing instead.
- Cross-model judging (attributed): the creator pasted both outputs into Codex (a different model) and asked which was better; Codex rated the STORM HTML briefing stronger on evidence quality, source diversity, thesis strength, actionability, risk control, and fit for video/content.
- Cost/speed (attributed, hedged): he says STORM was faster and “100% cheaper” because it ran ~12 agents vs 100+ for deep research — while admitting “I don’t know the exact metrics here on cost,” and noting the deep-research run got hit by API rate limits (a risk of spinning up that many agents at once). STORM “is always going to be your five personas.”
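“100% cheaper” cannot be literal, since the STORM run still consumed tokens. A rough way to bound the claim is to compare agent counts alone. The arithmetic below uses the ~12 and 103 figures from the video and assumes every agent costs about the same, which the video does not establish, so the result is a ceiling on intuition rather than a measured saving.

```python
def agent_reduction(storm_agents: int, deep_research_agents: int) -> float:
    """Fractional drop in agent count when moving from deep research to STORM."""
    if deep_research_agents <= 0:
        raise ValueError("deep_research_agents must be positive")
    return (deep_research_agents - storm_agents) / deep_research_agents


# Figures quoted in the video: ~12 agents for STORM, 103 for the deep-research run.
reduction = agent_reduction(12, 103)
print(f"{reduction:.1%} fewer agents")  # prints "88.3% fewer agents"
```

Actual cost depends on tokens per agent and on which model each agent runs, neither of which the creator reports.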
Subagents vs Agent Teams (Distinction the Video Draws)
- Subagents (what STORM uses): one main session; the subagents work for it. The main session talks to the five lenses, but the five cannot talk to each other.
- Agent teams: agents can talk to the main session and to each other — they can debate until they reach consensus. The creator calls this a “council” of agents. Agent teams are “much more expensive than subagents.”
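The difference between the two setups comes down to who is allowed to message whom. The sketch below models that as a set of communication links. `allowed_links` and the `"main"` label are hypothetical names used for illustration, not part of any Claude Code API.

```python
def allowed_links(workers: list[str], mode: str) -> set[tuple[str, str]]:
    """Return the undirected communication links for 'subagents' or 'team' mode."""
    if mode not in {"subagents", "team"}:
        raise ValueError(f"unknown mode: {mode}")

    # In both modes, the main session talks to every worker.
    links = {("main", worker) for worker in workers}

    if mode == "team":
        # Agent teams add a link between every pair of workers.
        links |= {
            (a, b) for i, a in enumerate(workers) for b in workers[i + 1:]
        }
    return links


if __name__ == "__main__":
    lenses = ["practitioner", "academic", "skeptic", "economist", "historian"]
    print(len(allowed_links(lenses, "subagents")))  # 5 links: main to each lens
    print(len(allowed_links(lenses, "team")))       # 15 links: 5 plus 10 lens pairs
```

With five lenses, the team topology has three times as many links. Whether that is what drives the higher cost the creator mentions is not stated in the video.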
Implementation
- Tool/Service: A free “storm research” Claude skill plus a report-template.html, authored by the video’s creator. Built for Claude Code / Claude Desktop; portable to other coding agents.
- Setup: Give Claude the two files and say “this is a skill called storm research. Put this in the .claude folder.” Then invoke it conversationally, e.g. “please run a storm research for me on [topic]”. No slash command is required; it still invokes the skill, reads the whole thing, and runs hands-off.
- Cost: Free to obtain. Runtime cost is the token cost of ~12 subagents (5 lenses + ~6 verifiers). The demo’s subagents ran on Opus 4.8 by default; the creator notes you can switch them to Haiku or Sonnet to cut cost.
- Integration notes: Works in Codex or other agents too. In Claude it must live in .claude/; for Codex use a .codex or .agents folder. Tailor the skill with your business context (“here’s what I’m doing… make it tailored towards us”) so every report ends with what you should do differently. A skill, the creator explains, “is basically just a prompt… a master prompt” the agent reads and runs on invocation.
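In the video the creator simply asks Claude to place the files itself. For anyone who prefers to do it by hand, the sketch below copies the two files into a project. The `skills/storm-research` subfolder is an assumption based on Claude Code's usual per-skill layout; the video only says to put the files in the .claude folder, and `install_skill` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def install_skill(
    project_root: Path, files: list[Path], skill_name: str = "storm-research"
) -> Path:
    """Copy the skill files into <project_root>/.claude/skills/<skill_name>/.

    The subfolder layout and the default skill_name are assumptions,
    not something stated in the video.
    """
    target = project_root / ".claude" / "skills" / skill_name
    target.mkdir(parents=True, exist_ok=True)
    for file in files:
        shutil.copy2(file, target / file.name)  # keep original file names
    return target


if __name__ == "__main__":
    downloads = Path.home() / "Downloads"  # assumed download location
    installed = install_skill(
        Path.cwd(),
        [downloads / "skill.md", downloads / "report-template.html"],
    )
    print(f"Installed to {installed}")
```

For Codex or another agent, the same copy applies with the top-level folder swapped for the .codex or .agents folder mentioned above.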
Try It
- Grab the skill.md and report-template.html from the creator’s free Skool community (joined via the link in the video description; see Open Questions).
- Drop both into your project’s .claude/ folder and confirm an invocation like “run a storm research for me on [topic]” triggers the skill.
- Run it first on a topic you already know well and that matters to your business, then read critically for gaps to tune.
- Add a custom lens relevant to you (e.g. beginner, content creator) and rerun for a V3.
- Run the same prompt through Claude Code’s native deep research and compare thoroughness, source count, and verification.
Related
- Subagents — the primitive STORM is built on (one main session, isolated parallel workers that don’t talk to each other).
- Claude Agent Hierarchy — the subagents vs agent-teams vs managed-agents distinction the video draws.
- Council — Multi-Model Blind Deliberation — the “council of agents that debate” pattern the creator contrasts with subagents.
- Dynamic Workflows (Claude Code) — the mechanism behind the native deep-research feature STORM is benchmarked against.
- Building Agents with Skills — how Claude skills work as packaged, reusable prompts.
- The Verification Frontier — why the built-in citation-verification / confirmed-corrected-demoted step is the load-bearing part.
- AutoResearch (Thu Vu Walkthrough) — adjacent multi-agent, verify-driven research loop.
- AutoResearch Cold Outbound — applied multi-agent research with a scoring/verification loop.
Open Questions
- Skill download link not in the transcript. The creator only says “link in the description” and points to a free Skool community (Classroom → “All YouTube Resources”); no explicit URL is stated, so none is recorded here. The creator’s name and community URL are likewise not given in the transcript.
- STORM provenance unstated. The transcript doesn’t define the STORM acronym or cite the specific peer-reviewed paper behind the “25% more organized than the next best method” claim — treat that figure as an unverified creator claim.
- Exact cost unknown. The creator explicitly says “I don’t know the exact metrics here on cost”; the ~12-vs-100+ agent comparison is his estimate, and the deep-research run was confounded by API rate limits.
- Verifier count. Whether the “six verification agents” is fixed or varies per run is not specified.