I Tried Every Popular Claude Skills System — Best Is the One You Build Yourself

Source: I_Tried_Every_Popular_Claude_Skills_System_Here_is_the_Best
Creator: Code4AI
URL: https://www.youtube.com/watch?v=VoxL_YmHR-I
Duration: ~10-12 minutes
Platform: YouTube

A contrarian take after two years of working with agents and reviewing every major skill library (Garry Tan’s gstack, Affaan Mustafa’s Everything Claude Code, Matt Pocock, Addy Osmani, BMAD, GSD, OpenSpec, Superpowers). The thesis: the best skill system is the one you build yourself. Start with natural-language prompting plus native plan mode, add a skill only when the agent demonstrably fails at something, and keep each skill as short as possible. Most popular skill libraries replicate the same 5-step software development lifecycle (research → prototype → plan → build → test, plus an optional polish pass) that engineers have used for 30+ years. Skills are documentation: they rot, they need maintenance, and they bloat context. This is the minority position, a counterweight to the install-N-starter-packs pattern that dominates the wider operator community.

Key Takeaways

The universal 5-step pattern under every popular skill library: research / discuss-spec → prototype (front end only, with dummy JSON) → plan (markdown breakdown into phases plus verification steps) → build (one slice at a time) → test (lint + build + Playwright/browser + human smoke test), plus an optional polish step (run a different model over the codebase to simplify it and catch issues). Every library reviewed maps onto this spine.
What each library specifically contributes (in order of complexity):
- Addy Osmani (Google): a clean set of 6-7 prompts covering spec → plan → build incrementally → test → review → simplify. The textbook implementation of the universal pattern.
- Matt Pocock (mattpocock/skills): simplicity-first. Notable adds: diagnose (a sharper specify), grill-me (grounds Claude in the domain model before coding so context isn’t re-spent every session), prototype (front-end-first design mode), TDD-by-default, and a to-issues breakdown for vertical-slice work.
- Garry Tan (gstack): opinionated, “a bit over-engineered” per the reviewer. The standout is office-hours, a 6-forcing-questions skill modeled on YC office hours that interrogates startup ideas before any code gets written. Worth lifting individually even if you don’t adopt gstack wholesale.
- Affaan Mustafa (Everything Claude Code, 183k stars): the biggest. Memory, continuous learning, verification loops, sub-agent orchestration, security focus. The deep-dive reference if you want to see what a maximalist harness looks like.
- BMAD: enterprise-level “Party Mode” (BA + PM + senior architect personas). Recommended only at enterprise scale.
- Superpowers: lightweight. The reviewer’s stated favorite for results when forced to pick one library.
- OpenSpec / spec-based bundles: heavy overlap with the rest; useful for teams already aligned around spec-driven development.
Skills are just natural-language prompts in a special file. YAML front matter (name + description) is always loaded by the model so it knows what skills exist. Scripts, reference material, and assets can be bundled inside the skill folder. There’s no magic — they’re documentation Claude reads when triggered.
Most modern agents already do most of the work. Plan mode in Claude Code, Codex, and Cursor natively shards projects into phases, generates verification steps, and tracks to-dos — replicating what spec libraries used to provide via custom skills. The reviewer’s point: you don’t need a skill for what the harness already does well.
TDD is opt-in, not skill-resident. Every library advocates for test-driven development, but the practical recommendation is to say “include test-driven development” during planning and the agent starts writing tests alongside code. No dedicated TDD skill needed.
Frontend-prototype-first is the load-bearing discipline. A consistent recommendation across libraries: tell the agent “we’re in prototyping mode, develop front end only, use dummy JSON for back-end data, link components for navigation, don’t connect backend logic.” Reason: stops the agent from creating complicated back-end scaffolding it has to support, which slows everything down. Passing the prototype to the agent later as the back-end spec is much more productive than trying to design both at once.
Agents cheat at testing. The automated test step often catches less than expected, so the test phase still ends in a human-in-the-loop smoke test: click through the actual UI to verify the agent built what you wanted.
Skills rot like documentation. “Skills in some ways are essentially just documentation. And we’ve all experienced the scenario where comments go out of date, documents go out of date. You’re going to have to spend as much time updating these skills and keeping them current.” — direct quote. Implication: every skill added is a maintenance liability. Be selective.
The bespoke-skill rule. Only build a skill when (a) the agent has demonstrably messed up in the same way repeatedly, or (b) you want to encode bespoke information about your codebase / process. Write them short — a few short paragraphs is often enough. Long skill files bloat the context window and confuse the model.
The Agentic Development Lifecycle (ADLC) emerges. Traditional software has the Software Development Lifecycle. The reviewer proposes the agentic equivalent: managing your harness + skill management + cross-developer skill organization is a new engineering discipline. Vercel’s skills.sh is highlighted as a working primitive for storing skills in private repos, updating them, and sharing across an organization.
The bottom line. “Inevitably the best skill system and the best harness is going to be the one you develop over time for you. I think this is going to be how you really differentiate as a software developer — your agent harness, your set of skills that has been built up over time working with a particular codebase.”
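
To make the "skills are just prompts in a special file" point concrete, here is a minimal sketch of how a skill file splits into the part that is always loaded (name and description) and the body that is read only when the skill triggers. The skill name `migration-check`, its text, and the `parse_skill` helper are hypothetical, and the parser assumes flat `key: value` front matter between two `---` lines; it is not the loader any harness actually uses.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into front matter fields and the prompt body.

    Assumption: the file opens with a '---' line, holds flat 'key: value'
    pairs, and closes the front matter with another '---' line. Richer YAML
    would need a real YAML library instead of this string splitting.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with '---' front matter")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("front matter is never closed") from None

    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta.get("name", ""), meta.get("description", ""), body)


# Hypothetical bespoke skill: two short lines aimed at one repeated failure.
EXAMPLE = """\
---
name: migration-check
description: Use before editing database migrations in this repo.
---
Never edit a migration that has already shipped. Add a new one instead.
Run the migration test suite before proposing the change.
"""

skill = parse_skill(EXAMPLE)
always_loaded = f"{skill.name}: {skill.description}"
print("always in context:", len(always_loaded.split()), "words")
print("loaded on trigger:", len(skill.body.split()), "words")
```

The word counts are only a rough proxy for context cost, but they show why both halves should stay short: the description is paid for in every session, and the body is paid for every time the skill fires.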

How the Universal Pattern Maps to Existing Wiki Coverage

The reviewer’s 5-step ADLC matches what almost every popular skill library does. Wiki articles for each library (existing coverage):

| Library | Wiki article | Distinguishing add |
| --- | --- | --- |
| Everything Claude Code (Affaan Mustafa) | everything-claude-code-affaan-mustafa | Maximalist: 48 agents, 182 skills, AgentShield security, cross-harness |
| Garry Tan / gstack | garrytan-gstack | `office-hours` 6-forcing-questions skill |
| Matt Pocock | mattpocock-skills | `caveman` token-saver, `grill-me`, `diagnose` |
| GSD (Get-Shit-Done) | gsd-build-get-shit-done | 6-step phase loop, 5-artifact persistent layer |
| BMAD-METHOD | bmad-method-agentic-dev | 12+ persona Party Mode, enterprise SDLC |
| OpenSpec | openspec-spec-driven-vibe-coding | proposal.md + design.md + tasks.md primitives |
| Superpowers | superpowers-skills-framework | Closed end-to-end software dev methodology |

Code4AI’s contribution to this cluster isn’t another library — it’s the meta-thesis: all of these collapse to the same SDLC pattern, which the harness now provides natively, so skills should be additive and bespoke rather than wholesale.

The Minority Position Worth Pinning

Almost every other operator-curated article on this wiki advocates for installing a bundled skill system:

- Six Best Claude Code Skills for Business: install these 6
- Seven Claude Skills That Run My Business: keep these 7
- Nine Plugins to Build 10× Faster: stack three columns
- Garry Tan’s gstack: clone this 23-tool setup
- Everything Claude Code: install the 182-skill suite
- Anthropic engineers’ four skill rules: prompt skills not Claude; build composable; update every session

Code4AI is the counter: don’t install pre-built libraries at all; rely on native plan mode + minimal bespoke skills written for your specific failure modes. This is the position worth knowing exists before reaching for a starter pack.

Practical Decision Framework

Synthesized from the video’s argument:

| Situation | Recommendation |
| --- | --- |
| New to Claude Code, exploring | Native plan mode + natural-language prompting. Add nothing. |
| Hitting a specific failure mode repeatedly | Write a short bespoke skill (a few paragraphs) for that failure |
| Need domain grounding so context isn’t re-spent | Pocock’s `grill-me` pattern or a custom `domain-context.md` |
| Codebase-specific patterns the agent keeps missing | Bespoke skill embedded in `.claude/skills/` |
| Enterprise team, need governance | BMAD’s Party Mode or skills.sh-style centralized repo |
| Want to ship a polished MVP fast solo | Stay lightweight, follow the universal 5-step pattern manually |
| Heavy security focus | Lift specific patterns from Everything Claude Code without installing the whole thing |

Try It

- Audit your installed skills. Anything you haven’t invoked in 30 days is a maintenance liability: uninstall it.
- Run your next project with zero skills installed. Use only Claude Code’s native plan mode plus a clear prototype-first prompt. Notice where the agent specifically fails.
- Write your first bespoke skill only for one of those specific failure points, and keep it under 10 lines of body text.
- Try Pocock’s grill-me pattern before starting any new feature: get Claude to interview you about the domain so it doesn’t re-load context every session.
- Adopt the universal 5-step manually: research / discuss → prototype-front-end-only → markdown plan with phases → build slice-by-slice → test (lint + build + smoke) → optional polish pass with a different model. No skill bundle required.
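
The audit and the 10-line budget above can be checked mechanically. The sketch below is one hedged way to do it: `audit_skills`, the skill names, and the `last_used` log are all hypothetical, and it assumes you record invocation dates yourself, since nothing in the video says the harness exposes such a log.

```python
from datetime import date, timedelta


def audit_skills(skills, last_used, today, max_idle_days=30, body_line_limit=10):
    """Return (name, reason) pairs for skills that look like liabilities.

    skills:    dict of skill name -> body text (front matter already removed)
    last_used: dict of skill name -> date of last invocation; a missing name
               is treated as never used. Hypothetical log, kept by you.
    """
    findings = []
    cutoff = today - timedelta(days=max_idle_days)
    for name, body in sorted(skills.items()):
        used = last_used.get(name)
        if used is None or used < cutoff:
            findings.append((name, f"not invoked in the last {max_idle_days} days"))
        # Count only non-blank lines so spacing does not inflate the total.
        body_lines = [line for line in body.splitlines() if line.strip()]
        if len(body_lines) >= body_line_limit:
            findings.append(
                (name, f"body has {len(body_lines)} lines, budget is under {body_line_limit}")
            )
    return findings


# Illustrative data only: one short, recently used skill and one bloated, idle one.
skills = {
    "migration-check": "Never edit a shipped migration.\nAdd a new one instead.",
    "mega-sdlc": "\n".join(f"Step {i}: ..." for i in range(1, 41)),
}
last_used = {"migration-check": date(2025, 1, 28)}

for name, reason in audit_skills(skills, last_used, today=date(2025, 2, 1)):
    print(f"{name}: {reason}")
```

Both thresholds are parameters because the 30-day and 10-line figures are rules of thumb from this article, not limits enforced by any tool.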

Related

- Anthropic Engineers’ Four Skill Rules: the opposite-but-compatible position (yes, build skills, but make them composable and always update them)
- Skill Systems: Orchestrator + Child Pattern: what happens when you DO need to compose multiple skills
- Everything Claude Code: the maximalist counter to this contrarian thesis
- Garry Tan’s gstack: the opinionated 23-tool starter
- Matt Pocock’s Skills (caveman, grill-me, diagnose): the patterns the reviewer specifically endorses
- BMAD-METHOD: enterprise Party Mode
- GSD: 6-phase spec-driven loop
- OpenSpec: proposal/design/tasks spec primitives
- Superpowers: reviewer’s stated favorite of the libraries
- Agent Skills Overview: the formal spec of what skills actually are
- Six Best Claude Code Skills for Business: install-this-stack position
- Seven Skills That Run My Business: sister install-this-stack position
- Anthropic’s Official Best Practices for Claude Code: primary source on native plan mode + 8 context tools

Open Questions

Vercel’s skills.sh maturity. The video name-drops skills.sh as the organizational primitive for skill management at team scale, but doesn’t show it. How well does it actually handle the rot/maintenance problem the reviewer flags?
Empirical comparison missing. The thesis (“best is bespoke”) is asserted from experience, not benchmarked. Is there a measured comparison of bespoke-only vs starter-pack on real engineering tasks?
When does an org cross the threshold from bespoke-skills to needing a real library? The video leaves this fuzzy — somewhere between solo dev and BMAD-level enterprise, there’s a transition point worth nailing down.
