Source: raw/Higgsfield_MCP_+Claude_Code=_AI_Ad_Agency_Full_Tutorial.md — Mike Futia / SCALE AI YouTube tutorial, 2026-05-01 (https://www.youtube.com/watch?v=1dga9Qxx_co), ~18 min
End-to-end demo of running a DTC ad agency campaign in a single Claude Code conversation, using the Higgsfield MCP for image + video generation and the Firecrawl MCP for brand research. One product URL → brand brief → hero static ad → copy overlay → animated hero clip → UGC creator → two UGC video clips, with Claude grading every output against the brand brief. The point isn’t any single generation — it’s that Claude is the orchestration layer holding the campaign together.
Key Takeaways
- Single conversation, full campaign. The whole workflow — research, image generation, video generation, evaluation — lives in one Claude Code session. No tab-hopping between Firecrawl, Higgsfield’s web app, and the file system.
- Firecrawl + Higgsfield is the magic combo. Firecrawl MCP scrapes the brand’s site into a brand brief; Higgsfield MCP turns that brief into image and video assets. Claude is the connective tissue.
- Claude as critic, not just generator. Every generation step ends with Claude scoring outputs against the brand brief and recommending the strongest variation, with reasoning. This is the real “Claude as creative director” pattern.
- GPT Image 2.0 used for stills, Seedance 2.0 for video. Both reached through the Higgsfield MCP — same connector, different models — accessible by name in the prompt (“use GPT Image 2.0”, “use Seedance 2.0”).
- Reference image + brand brief is the prompt template. Every generation prompt cites both: a reference image in the project folder and the brand brief Claude wrote earlier in the conversation.
- Outputs land in your project folder automatically. Higgsfield MCP downloads completed assets into the working directory, so the project folder doubles as the asset library — Claude reads them back to evaluate.
- Costs are visible. The creator is on the Higgsfield Creator plan; image generations cost ~4 credits each, Seedance 2.0 1080p clips cost ~45 credits each. Multi-variation prompts are credit multipliers.
- Two assists not bundled with Higgsfield. Firecrawl MCP for brand scraping and Mike’s own [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill]] (Gemini-backed) for video evaluation. Both are separate setups the tutorial assumes are already wired.
- Visible artifacts: project folder + Higgsfield assets tab. Every generation also lands in the user’s Higgsfield account under “all assets” with the original prompt preserved — the conversational layer doesn’t hide history.
The seven-step workflow
The tutorial walks through one conversation in this order. Each step is a single Claude prompt; Claude handles the MCP calls, file I/O, and grading.
1. Wire the MCP
Settings → Connectors → Add custom connector → name “Higgsfield” → paste connector URL → click Connect → sign in to Higgsfield → connection appears under Connectors. Identical to the setup described in the Higgsfield MCP overview — no API keys.
2. Build the brand brief (Firecrawl MCP)
“Hey, can you research [brand].com, pull the brand voice, hero products, visual style, and target customer, and then build a brand brief that I can reference for the rest of this campaign.”
Claude calls Firecrawl, scrapes the site, and writes a structured brief — brand snapshot, voice, hero products table, pricing posture, visual style (logo, color palette), target customer, differentiators, campaign-ready hooks. The brief becomes the reference document every later prompt cites.
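As a rough sketch, the brief’s shape looks something like the structure below. Field names are paraphrased from the tutorial; Claude writes the brief as markdown prose, not as a fixed schema or data object.

```python
# Illustrative shape of the brand brief (field names paraphrased from the
# tutorial's description; this is not a schema the MCP or Claude enforces)
brand_brief = {
    "brand_snapshot": "two-sentence positioning summary",
    "voice": "warm, soft-spoken, self-care coded",
    "hero_products": [{"name": "Hoodie", "price": "...", "role": "flagship"}],
    "pricing_posture": "premium DTC",
    "visual_style": {"logo": "...", "palette": ["pastel"], "aesthetic": "clean lifestyle"},
    "target_customer": "persona description",
    "differentiators": ["..."],
    "campaign_ready_hooks": ["..."],
}
```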
3. Hero static ad — 4 variations (GPT Image 2.0)
“Generate a hero static ad for the [Hoodie] using GPT Image 2.0 with the Higgsfield MCP. Reference image: source-assets/images/hoodie.jpg. Pull the brand brief for visual style, mood, and creative direction. Match the brand’s clean lifestyle aesthetic, feature the hoodie as the clear hero, use 9:16, no on-screen text. Generate four variations so we can pick the strongest. After generation, evaluate each one against the brand brief and recommend the strongest variation with reasoning.”
Higgsfield returns four images (model in different settings — bedroom, armchair, outdoors, studio editorial). Claude downloads all four, grades each against the brief, and recommends V3 (the outdoor shot) with explicit “for a paid hero static destined for reels or stories” reasoning.
What "evaluate against the brand brief" actually means
Claude reads each image back from disk after download, then writes a scorecard — fidelity to brief, aesthetic match, hero clarity, mood — and picks a winner with one paragraph of justification. The grading is part of the same prompt; no second turn needed.
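A hypothetical example of one scorecard entry, reconstructed from the dimensions named above. The tutorial’s actual grading is prose, not structured data; this just makes the rubric concrete.

```python
# Hypothetical scorecard entry for the winning variation; the dimensions
# (brief fidelity, aesthetic match, hero clarity, mood) come from the
# tutorial, the structured form is illustrative
scorecard_v3 = {
    "variation": "V3 (outdoor shot)",
    "brief_fidelity": "high",
    "aesthetic_match": "matches the brand's clean lifestyle aesthetic",
    "hero_clarity": "hoodie is the unambiguous hero of the frame",
    "mood": "on-brief",
    "verdict": "strongest for a paid hero static destined for reels or stories",
}
```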
4. Copy overlay (GPT Image 2.0, second pass)
“Take V3, the one you recommend, use GPT Image 2.0 to add a copy overlay. Same image, just with text layered in. [Specify headline + subhead + placement.]”
Claude generates two text-overlay variants. The tutorial flags that on-image typography is still rough — small fonts, top-line text degrades — and suggests iterating. Honest finding: GPT Image 2.0 + Higgsfield is good at scenes, weaker at typographic precision.
5. Animate the hero (Seedance 2.0)
“Animate the hero static using Seedance 2.0 with Higgsfield. Reference image: V3. Generate a 5-second cinematic motion shot — the opening hero clip of a premium DTC reel ad. Slow, subtle camera push in. Gentle pastel sky drift in the background. Hoodie fabric catches natural light as the camera moves. Model holds the pose with minimal motion. Save the output to the folder. Once complete, evaluate the clip against the brand brief.”
Seedance 2.0 returns a 5-second 1080p clip with auto-added background music^[the tutorial notes Seedance added music without being asked]. Claude downloads the file and evaluates it via the creator’s own [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill]] (a Gemini-API-backed Claude Code skill, MIT-licensed) — Claude can grade the still-to-motion fidelity but not the audio.
6. UGC creator generation (GPT Image 2.0)
“Generate a UGC creator who’s wearing the [hoodie] using GPT Image 2.0. Pull the brand brief to come up with the primary persona. It should feel authentically native to TikTok and Reels. [Add creative direction.] Generate two variations and review the output to pick the strongest one.”
Two outputs: brunette on a couch, blonde in a kitchen. Claude picks creator B (the blonde) for tighter persona match and “the perfect before-state for the hoodie reveal.” The tutorial emphasizes how realistic both creators look — visual quality is no longer the bottleneck.
7. UGC video clips (Seedance 2.0, two clips)
“Lock in creator B. Generate two UGC video clips using Seedance 2.0 with Higgsfield. Reference images: [creator B] + [hoodie]. Use the brand brief — voice should be warm, soft-spoken, self-care coded. Clip 1: native testimonial, sitting on couch, wearing hoodie. Clip 2: product showcase, pulling the hoodie over current outfit.”
Both clips render. Clip 1 (sitting testimonial) lands cleanly with synthesized voiceover (“This hoodie feels like a weighted hug…”). Clip 2 (pulling hoodie over outfit) shows a Seedance limitation — the hoodie “appears” rather than properly pulling on, an obvious artifact. Character consistency between the two clips is strong; background is preserved. Claude evaluates both clips visually but cannot grade the synthesized voice quality.
Pattern: Claude as orchestration layer
The repeating shape across all seven steps:
- Claude pulls context — brand brief from earlier in the conversation, reference image from disk.
- Claude calls the MCP — Higgsfield handles the actual generation; Claude picks model + parameters.
- Claude downloads outputs — into the project folder so the next step can read them.
- Claude evaluates — scores against the brand brief, recommends a winner with reasoning.
- Claude hands the next prompt back to the human — who picks up from the recommendation.
The creator’s framing: “Claude as the orchestration layer who’s not only sending off all of the prompts, but reviewing the output against the brand guidelines.” This is the pattern that scales — the marketer is doing creative direction, not file management.
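A minimal pseudocode sketch of that loop, with stand-in function names. None of these are real Higgsfield MCP tool names; they stand in for the tool calls Claude makes inside the conversation.

```python
# Illustrative sketch of the per-step orchestration loop Claude runs.
# The three helpers are stand-ins for MCP tool calls and file reads,
# not real Higgsfield MCP tools.
def generate_via_mcp(model, prompt, reference, n):
    """Stand-in for the Higgsfield MCP generation call."""
    return [f"{model}-output-{i}.png" for i in range(n)]

def download(asset, out_dir):
    """Stand-in for the MCP saving a finished asset into the project folder."""
    return f"{out_dir}/{asset}"

def grade_against_brief(path, brief):
    """Stand-in for Claude reading the file back and scoring it."""
    return {"path": path, "score": 0, "reasoning": "one paragraph of justification"}

def run_step(brief, reference, prompt, model, n_variations, out_dir):
    assets = generate_via_mcp(model, prompt, reference, n_variations)  # generate
    paths = [download(a, out_dir) for a in assets]                     # download
    graded = [grade_against_brief(p, brief) for p in paths]           # evaluate
    return max(graded, key=lambda g: g["score"])                       # recommend winner
```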
Costs (creator’s plan as data point)
- Higgsfield Creator plan (subscription, credit-based).
- GPT Image 2.0 generations: ~4 credits per image (so the 4-variation hero step ≈ 16 credits).
- Seedance 2.0 1080p video: ~45 credits per clip (3 video clips in this campaign ≈ 135 credits).
- Approximate campaign credit budget: ~170 credits for this end-to-end run, not counting iterations or copy-overlay re-rolls.
The tutorial notes Seedance 2.0 “is not a cheap model.” Plan accordingly when estimating per-campaign costs.
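The campaign total is easy to sanity-check from the per-asset rates; a back-of-envelope sketch:

```python
# Back-of-envelope credit math for the demo campaign, using the rates above
IMAGE_CREDITS = 4    # GPT Image 2.0, per image
VIDEO_CREDITS = 45   # Seedance 2.0 1080p, per clip

images = 4 + 2 + 2   # hero variations + copy overlays + UGC creator shots
videos = 1 + 2       # animated hero + two UGC clips
total = images * IMAGE_CREDITS + videos * VIDEO_CREDITS
print(total)         # 167, i.e. the "~170 credits" figure, before any re-rolls
```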
Companion tooling not bundled in Higgsfield
The tutorial calls out two separate setups that make the workflow possible:
- Firecrawl MCP — web scraping tool with an MCP. Free credits available. Used for the brand-brief step (Step 2). Without it, the brand brief would be hand-typed.
- [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill]] — Mike’s own MIT-licensed Claude Code skill (mikefutia/claude-vision, 21 stars). Routes local video files through Google’s Gemini API for native video understanding and returns a structured markdown report with anti-hallucination guardrails on the audio section. This is the “Gemini Vision API hooked up to my Claude account” the tutorial flags as separate-video territory. Used for video evaluation in Steps 5 and 7. Without it, Claude can grade stills but not motion.
Both are presented as “topic for another video” — the tutorial doesn’t walk through their setup, just notes they’re prerequisites for the deeper steps. The claude-vision skill is now documented in its own wiki article.
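For a rough sense of what the skill does under the hood, here is a minimal sketch of Gemini-backed video evaluation using the `google-genai` SDK it installs. The model name and prompt are illustrative assumptions, not the skill’s actual internals; its real prompts and guardrails live in the repo.

```python
# Minimal sketch of Gemini-backed video evaluation, assuming the
# google-genai SDK the skill installs (pip install google-genai).
# Model name and prompt are illustrative, not the skill's own.
import time
from google import genai

client = genai.Client()  # reads the Gemini API key from the environment

video = client.files.upload(file="hero-clip.mp4")
while video.state.name == "PROCESSING":      # wait for server-side processing
    time.sleep(2)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[video, "Evaluate this clip against the brand brief: motion "
                     "quality, brand fit, hero clarity. Do not guess about "
                     "audio you cannot verify."],
)
print(response.text)
```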
Honest limitations the tutorial surfaces
- Typographic overlays are rough. Small fonts, top-line text degrades. Multiple iterations needed.
- Seedance can fake “putting on” motion. Asked to show a creator pulling a hoodie over her outfit, Seedance had the hoodie “appear” instead. Cinematic camera moves work; complex object-on-body manipulation does not.
- Claude can’t grade audio in synthesized clips. The Gemini Vision integration handles visual evaluation; voice quality stays a human-judgement step.
- Outdoor “photoshopped” feel on some variants. Claude recommended V3 (outdoor) for the hero — the creator notes it “kind of looks photoshopped” but Claude’s reasoning still made the case. Worth re-prompting when Claude’s pick disagrees with your eye.
Implementation
- Tool/Service: Higgsfield MCP (https://mcp.higgsfield.ai/mcp) + Firecrawl MCP + Claude Code + [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill]] (Mike’s own Gemini-backed skill).
- Setup: Higgsfield MCP via Settings → Connectors (no API keys). Firecrawl MCP per its own docs. claude-vision: `git clone github.com/mikefutia/claude-vision && mv claude-vision ~/.claude/skills/video-analyzer`, free Gemini API key from Google AI Studio, `pip install google-genai`. See the skill’s article for the full install walkthrough.
- Cost: Higgsfield credits (Creator plan in the demo). Firecrawl free credits + paid plans. claude-vision is MIT-licensed/free; Gemini API has a generous free tier on Google AI Studio.
- Integration notes:
- Run from a dedicated project folder — Claude saves all generated assets here. The creator opens the folder before starting and “trusts the workspace” so Claude can write outputs.
- Reference images go in `source-assets/images/` in the project folder. Pointing Claude at a relative path is cleaner than uploading per-prompt.
- Brand brief is reusable across the whole campaign — write it once at the start, every later prompt says “pull the brand brief for visual style/voice/etc.”
- Higgsfield assets tab is a free safety net. Every generation is also stored in the user’s Higgsfield account at `higgsfield.ai/assets` with the original prompt preserved — useful when you lose the local file or want to revisit prompt phrasing.
- Seedance pacing tuning. The creator’s UGC testimonial clip ran slightly slow; re-prompting for “speed up by 1.2x” or specifying a target word-rate in the brief helps.
Related
- Higgsfield MCP — the connector that powers Steps 3-7 (capabilities, setup, vs SDK comparison)
- Higgsfield Overview — REST API basics, async lifecycle (the layer underneath the MCP)
- Higgsfield Image-to-Video — Seedance 2.0 + alternatives, motion-prompt template referenced in Step 5
- Higgsfield SDK (Python) — SDK path for engineering surfaces; this workflow uses the conversational MCP path instead
- HeyGen Studio Automation with Claude Code — sibling pattern: Claude Code orchestrating a different stack (ElevenLabs + HeyGen + Remotion) for talking-head video
- video-use — Claude Code editing skill; complements this generation workflow when stitching clips into longer cuts
- [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill (Mike Futia)]] — the Gemini-backed Claude Code skill that powers Steps 5 and 7 grading
- [[claude-ai/claude-video|claude-video/watch skill (Brad Brown)]] — sibling video-analysis skill (yt-dlp + ffmpeg + Whisper architecture; supports URLs)
- AI Video Tools — topic index
- Essential MCP Servers — broader MCP ecosystem (Firecrawl is one of these)
- AI Marketing — applied marketing context; this workflow is the canonical “Claude as creative director” demo
Open Questions
- Firecrawl MCP setup. The tutorial defers to “another video” — the wiki could use a Firecrawl ingest article. Currently uncovered.
- Gemini Vision API + Claude integration. Same — referenced as a separate setup, not detailed. How is it wired? MCP? Custom tool? Worth a follow-up article. Resolved 2026-05-06 — Mike Futia open-sourced his integration as the [[claude-ai/claude-vision-video-analyzer|claude-vision/video-analyzer skill]] (mikefutia/claude-vision, MIT, 21 stars). It’s a scoped Claude Code skill that routes local videos through Gemini’s API. See the skill article for full setup, comparison with claude-video, and Mike’s own usage notes.
- Iteration economics. No data on how many re-rolls a typical campaign needs. The tutorial showed first-pass outputs; production runs likely need 2-3x the credit budget.
- Multi-product campaigns. The demo is one product (the hoodie). Does the brand brief survive across 5+ SKUs in one conversation, or does context drift?
- Seedance 2.0 vs Kling/Veo for UGC. The tutorial only used Seedance; the Higgsfield MCP also exposes Kling, Veo, and Minimax Hailuo. Which model does best at on-body product manipulation (the Step 7 weakness)?
- GPT Image 2.0 vs Soul / Nano Banana Pro / Flux / Seedream for these jobs. The tutorial picks GPT Image 2.0 for everything; the MCP exposes 16+ image models. Bench data would help inform default-model choice.
Try It
- Replicate the smallest meaningful slice first. Pick a real product page, drop it into Claude with the Higgsfield MCP connected, and run only Steps 1-3 (connector setup → hand-typed brand brief, skipping Firecrawl → hero static with 4 variations, graded by Claude). Total credit cost: ~16. Total time: ~15 min. This validates the orchestration pattern before you wire Firecrawl.
- Add Firecrawl second. Once the orchestration pattern feels natural, set up Firecrawl MCP and re-run Step 2 from a real URL. The brand brief Claude writes is the reusable artifact across every later step in the campaign.
- Save the brand brief as a project file. After Step 2 completes, ask Claude to save the brief to `project/brand-brief.md`. Future conversations can reference it with `@brand-brief.md` instead of re-running Firecrawl every session.
- For a Smile Springs–style dental client: the same pattern works for service businesses. Replace the hoodie with a procedure (e.g., the new-patient welcome experience) and the creator with a “tired-mom-needs-Saturday-appointment” persona. Brand brief carries; Higgsfield model choices stay similar.
- Defer the Gemini Vision step until you actually need it. Visual evaluation by Claude (no audio grade) is the typical 80% case — start there and add Gemini Vision when you’re consistently producing video and want automated motion grading.
- Treat Higgsfield credits as the budget unit. A 7-step campaign on the demo’s plan ≈ 170 credits before iteration. Multiply by 2-3x for realistic production. The Creator plan covers the demo cleanly; heavier campaigns probably want a higher tier.