Source: raw/Build_An_AI_Second_Brain_Knowledge_Base_Step-By-Step.md — YouTube tutorial (yke4fLQUsh4) by Matt Wolfe (creator identifiable from the sponsor discount code “Matt Wolfe” and his stated Friday AI-news-breakdown format). Walks through building a “second brain” on top of Karpathy’s LLM-wiki pattern, then extending it with two net-new layers — a grounded journal and a CRM.
A step-by-step build of a chat-with-your-knowledge “second brain” that starts from Karpathy’s LLM-wiki pattern and adds two layers on top: a journal whose AI responses are grounded in the saved wiki, and a CRM for remembering people and conversations. Capture is one-click via the Obsidian Web Clipper (which auto-pulls YouTube transcripts); the agent runtime is Codex (Claude Code / Cowork noted as equivalents); and a Codex hourly automation makes ingestion run on autopilot. The creator explicitly credits Karpathy for the wiki idea and the Obsidian-as-front-end choice, framing the journal + CRM as his own extrapolation.
Key Takeaways
- Three pillars around one knowledge base. The wiki/knowledge base sits at the center; the journal and CRM connect into it. The creator names the wiki and journal as the most broadly useful; the CRM is presented as swappable for whatever else suits you (workouts, recipes, classroom notes, research papers, sales calls).
- The fix for “second brain = where info goes to die.” Most second-brain systems are passive dumping grounds you never revisit. The grounding loop (journal + chat both pull from the wiki) is what makes the stored knowledge actually resurface and get used.
- Built on Karpathy’s pattern, credited explicitly. The wiki scaffold comes from prompting the agent with Karpathy’s LLM-wiki GitHub page; the journal and CRM are the creator’s additions.
- Capture is one click. The Obsidian Web Clipper saves any web page as a markdown note into raw/ and auto-extracts the full transcript from YouTube videos. It is the primary ingestion path for articles, videos, podcasts, and tweets.
- agents.md (AGENTS.md) is the whole control surface. Every behavior (how to ingest, how to handle a journal entry, how to update the CRM) is a prompt block in agents.md. Tweaking the system means editing that file, or just telling Codex to edit it. “This is all just prompts at the end of the day.”
- The wiki self-builds from queries too. Asking a question doesn’t just read the wiki: the agent writes the reusable answer back as a new wiki page, updates index.md, and appends to log.md. Saving content and asking questions both grow the vault.
- Grounded journaling is the payoff. A journal entry returns a ChatGPT-style response, but grounded in your own saved notes plus prior journal entries plus the CRM. For example, it can surface a video you saved three days ago that addresses exactly what you’re stuck on, and detect recurring patterns across entries.
- Obsidian is the visibility layer. Obsidian renders the markdown vault, the index.md catalogs, and the graph view that visibly densifies as the vault grows. The actual intelligence is the agent + agents.md; Obsidian is how you see and navigate it.
How it’s built
The two tools you need: Obsidian (free markdown organizer/reader, obsidian.md) and Codex (the creator’s current IDE/agent of choice; free tier with usage caps on the free ChatGPT plan). Claude Code or Cowork are called out as drop-in alternatives for the agent runtime.
1. Scaffold the wiki from Karpathy’s pattern.
- Create a fresh Obsidian vault (creator names it “second brain”), delete the welcome note, note the folder path on disk.
- In Codex, add a new project pointed at that exact vault folder (“use an existing folder”).
- Prompt: build out architecture based on Karpathy’s LLM wiki [link to the GitHub page]; the current second brain folder is the folder Obsidian is connected to, it is currently empty so we’re building from scratch.
- First run over-built (51 files). Re-prompt: please remove all the extra crap and just build what’s explicitly called for in Karpathy’s game plan. The result is the minimal scaffold: raw/ (immutable source material) + raw/assets/ (optional Obsidian attachments) + wiki/ (AI-generated markdown) + agents.md (operating contract) + index.md (catalog) + log.md (change log).
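For reference, the minimal scaffold can be reproduced by hand with a few lines of Python. This is a sketch, not what Codex generated: the function name scaffold_vault and the one-line placeholder contents of agents.md, index.md and log.md are assumptions. In the video the agent wrote these files itself from Karpathy’s page.

```python
from pathlib import Path
from typing import List

# Folders and files named in the minimal scaffold; file contents are placeholders (assumption).
SCAFFOLD_DIRS = ["raw", "raw/assets", "wiki"]
SCAFFOLD_FILES = {
    "agents.md": "# Operating contract\n",
    "index.md": "# Index\n",
    "log.md": "# Log\n",
}


def scaffold_vault(vault: Path) -> List[str]:
    """Create the minimal scaffold inside an existing vault folder, never overwriting files."""
    created = []
    for d in SCAFFOLD_DIRS:
        path = vault / d
        if not path.exists():
            path.mkdir(parents=True)
            created.append(d + "/")
    for name, body in SCAFFOLD_FILES.items():
        path = vault / name
        if not path.exists():
            path.write_text(body, encoding="utf-8")
            created.append(name)
    return created
```

Because existing paths are skipped, running it twice is harmless, which matters if you point it at a vault that already has content.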
2. Configure the Obsidian Web Clipper.
- Install from obsidian.md (bottom-of-page link) → Add to Chrome.
- In Clipper settings: set the vault name to match Obsidian exactly; pick the default template; pull in properties (source title, source URL, created date, an auto web clip tag) plus the page content.
- Set the note location to raw so every clip lands in the inbox folder.
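Downstream, the agent sees each clip as a markdown file whose properties sit in a front-matter block. Below is a minimal sketch of reading those properties. It assumes the template writes them as simple key: value lines between --- delimiters, with tags as a YAML list. The key names (title, source, tags) are assumptions, so check a real clip before relying on them; a real tool would use a YAML library rather than this hand-rolled parser.

```python
def read_front_matter(text: str) -> dict:
    """Parse a simple front-matter block delimited by '---' lines.

    Only handles 'key: value' scalars and '- item' list entries (e.g. tags);
    returns {} if the note has no complete front matter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict = {}
    key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return meta  # closing delimiter reached
        if stripped.startswith("- ") and key is not None:
            # List entry belonging to the most recent key.
            if not isinstance(meta.get(key), list):
                meta[key] = []
            meta[key].append(stripped[2:].strip().strip('"'))
        elif ":" in stripped:
            # partition() splits on the first colon only, so URLs survive intact.
            key, _, value = stripped.partition(":")
            key = key.strip()
            meta[key] = value.strip().strip('"')
    return {}  # no closing '---': treat as having no front matter


def is_youtube_clip(meta: dict) -> bool:
    """True when the clip's source property (key name assumed) points at YouTube."""
    source = str(meta.get("source", ""))
    return "youtube.com/" in source or "youtu.be/" in source
```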
3. Ingest sources.
- Click the Clipper on any page (article, tweet, podcast) or any YouTube video (it auto-loads the full transcript) → Add to Obsidian → the file appears in raw/.
- Nothing happens automatically yet; you tell Codex to process the files inside the raw folder. The agent reads each source, summarizes it to bullets, extracts entities (people, companies, tools, ideas, themes) into wiki pages, auto-links related notes, renames sources to better titles, updates index.md, and appends to log.md.
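The summarizing and entity extraction are the model’s job, but the bookkeeping around an ingest pass is mechanical. Here is a hypothetical sketch of two pieces of it: finding sources still sitting at the top level of raw/, and appending a dated entry to log.md. The function names and the log-line format are assumptions, not taken from the video or Karpathy’s page.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def pending_sources(vault: Path) -> List[Path]:
    """Markdown files directly in raw/ (subfolders such as raw/assets are ignored)."""
    return sorted(p for p in (vault / "raw").glob("*.md") if p.is_file())


def append_log(vault: Path, action: str, title: str, day: Optional[date] = None) -> str:
    """Append one dated entry to log.md and return the line written."""
    day = day or date.today()
    line = f"## [{day.isoformat()}] {action} | {title}\n"
    with (vault / "log.md").open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```

A single consistent prefix per log line keeps log.md easy to scan or grep as the vault grows.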
4. Refine agents.md (the operating contract). The creator demonstrates several tweaks — editable directly in Obsidian or by prompting Codex:
- Move processed sources. Add a raw/processed/ folder and an ingest step: move the source file from the root raw directory to raw/processed, so the inbox doesn’t pile up and you can see what’s been ingested.
- Capture the YouTube channel name. The Clipper doesn’t pull it; instruct the agent to inspect the source URL and add the channel name to the original source front matter (not the generated wiki page).
- Prevent orphans. Add a step to cross-link any wiki pages generated or updated to the original source page so new pages always link back to their source.
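In the video these are prompt rules the agent follows, but the move-to-processed and prevent-orphans rules are simple enough to also check mechanically. This is a sketch under stated assumptions: move_to_processed and links_back are hypothetical helpers. The orphan check assumes back-links are Obsidian-style [[wikilinks]], possibly with a folder prefix, an alias (|) or a heading (#).

```python
import re
import shutil
from pathlib import Path
from typing import Set

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def move_to_processed(source: Path) -> Path:
    """Move raw/<name> to raw/processed/<name>, refusing to overwrite an existing file."""
    dest_dir = source.parent / "processed"
    dest_dir.mkdir(exist_ok=True)
    dest = dest_dir / source.name
    if dest.exists():
        raise FileExistsError(dest)
    return Path(shutil.move(str(source), str(dest)))


def link_targets(page_text: str) -> Set[str]:
    """Note names referenced by wikilinks, with folder prefix and .md suffix dropped."""
    names = set()
    for target in WIKILINK.findall(page_text):
        name = target.strip().split("/")[-1]
        names.add(name[:-3] if name.endswith(".md") else name)
    return names


def links_back(page_text: str, source_stem: str) -> bool:
    """Orphan check: does this wiki page link back to its source note?"""
    return source_stem in link_targets(page_text)
```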
5. Add the journal layer (new folder + agents.md rules). Create a journal/ folder, then instruct the agent: if a chat starts with the keyword journal, save the entire conversation as a dated, short-titled markdown file in journal/; maintain a journal/index.md (date + title + summary, linked); log the title and summary to log.md; and ground the response in the wiki, prior journal entries, and the CRM plus the model’s own knowledge — providing advice, insights, tactics, and detecting recurring patterns across entries.
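The journal rule is prompt text in agents.md, but its routing and naming parts can be made concrete. This sketch assumes a case-insensitive match on the first word and a file name built from the ISO date plus the first few title words. The helper names, the six-word cap and the index-line format are all hypothetical.

```python
import re
from datetime import date


def is_journal_message(message: str) -> bool:
    """Trigger rule: the chat's first word is 'journal' (case and trailing colon ignored)."""
    parts = message.strip().split(maxsplit=1)
    return bool(parts) and parts[0].lower().rstrip(":") == "journal"


def journal_filename(day: date, title: str, max_words: int = 6) -> str:
    """Dated, short-titled name: ISO date plus the first few lowercase title words."""
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    return f"{day.isoformat()}-{'-'.join(words)}.md"


def journal_index_line(day: date, title: str, summary: str, filename: str) -> str:
    """One journal/index.md entry: date, linked title, one-line summary."""
    return f"- {day.isoformat()} [[{filename[:-3]}|{title}]]: {summary}\n"
```

Matching only the first word avoids misfiring on messages that merely mention journaling somewhere in the text.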
6. Add the CRM layer (new folder + agents.md rules). Create a crm/ folder, then instruct the agent: when told it’s CRM info, create or update a person record (file named after the person) with name, contact details, where/how you met, and notes; maintain a crm/index.md listing people alphabetically with a short bio. Demonstrated by adding “Matthew Berman” with event/meeting context, then later asking where did I meet Matthew Berman again? and getting the answer back from the CRM record.
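Likewise, the CRM rule amounts to an upsert keyed on the person’s name plus a rebuilt alphabetical index. This sketch rests on several assumptions: records are stored as “- key: value” lines, the field names (bio, met, contact, notes) are hypothetical, and notes are appended rather than overwritten so that repeated meetings accumulate.

```python
from pathlib import Path
from typing import Dict


def read_person(path: Path) -> Dict[str, str]:
    """Parse '- key: value' lines from a person record; a missing file gives {}."""
    record: Dict[str, str] = {}
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("- ") and ": " in line:
                key, value = line[2:].split(": ", 1)
                record[key] = value
    return record


def upsert_person(crm: Path, name: str, **updates: str) -> Path:
    """Create or update crm/<name>.md; 'notes' are appended, other fields replaced."""
    crm.mkdir(parents=True, exist_ok=True)
    path = crm / f"{name}.md"
    record = read_person(path)
    for key, value in updates.items():
        if key == "notes" and record.get("notes"):
            record["notes"] = f"{record['notes']}; {value}"
        else:
            record[key] = value
    lines = [f"# {name}"] + [f"- {k}: {v}" for k, v in record.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def rebuild_index(crm: Path) -> str:
    """Rewrite crm/index.md: every person, alphabetically, with their short bio."""
    people = sorted(
        (p for p in crm.glob("*.md") if p.name != "index.md"),
        key=lambda p: p.stem.casefold(),
    )
    lines = ["# CRM"] + [f"- [[{p.stem}]]: {read_person(p).get('bio', '')}" for p in people]
    text = "\n".join(lines) + "\n"
    (crm / "index.md").write_text(text, encoding="utf-8")
    return text
```

Under these assumptions, answering “where did I meet Matthew Berman again?” reduces to reading the met field of crm/Matthew Berman.md.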
7. Test the grounded behaviors.
- Wiki query: what are some tips for motivation when I don’t feel like doing the hard task today? The agent checks the vault index, answers from saved content, and writes the reusable bit back as a new wiki/motivation-for-hard-tasks page linked to its sources, updating index.md and log.md.
- Journal entry: a brain-dump about clickbait-vs-literal-titles returns advice braided with the creator-strategy pages already in the vault, citing them rather than answering from a blank slate.
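The write-back-on-query step can be sketched the same way. This rests on a few assumptions: the page slug is derived from a title the agent chooses, sources are listed as [[wikilinks]] under a Sources heading, and index.md gets one line per page. The log.md append is omitted for brevity, and slugify and file_back_answer are hypothetical names.

```python
import re
from pathlib import Path
from typing import List


def slugify(title: str) -> str:
    """Lowercase words joined by hyphens, used as the wiki page file name."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def file_back_answer(vault: Path, title: str, answer: str, sources: List[str]) -> Path:
    """Write a reusable answer as a wiki page linked to its sources; list it once in index.md."""
    slug = slugify(title)
    page = vault / "wiki" / f"{slug}.md"
    page.parent.mkdir(parents=True, exist_ok=True)
    links = "\n".join(f"- [[{s}]]" for s in sources)
    page.write_text(f"# {title}\n\n{answer}\n\n## Sources\n{links}\n", encoding="utf-8")
    index = vault / "index.md"
    entry = f"- [[{slug}]]: {title}\n"
    existing = index.read_text(encoding="utf-8") if index.exists() else ""
    if entry not in existing:  # re-asking the same question must not duplicate the entry
        index.write_text(existing + entry, encoding="utf-8")
    return page
```

Under this scheme, a title of “Motivation for hard tasks” maps to wiki/motivation-for-hard-tasks.md.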
8. Automate ingestion (Codex Automations).
- Reprocess existing files once to apply the new rules (move-to-processed, add channel name).
- In Codex → Automations → New automation: title it, set work tree to local (runs in the selected project), pick the second-brain project, set cadence (creator uses hourly), prompt: if there are any unprocessed files inside the raw directory, please process them now. Set the model to the strongest available on high reasoning.
- Now clipping is the only manual step: every hour the automation ingests whatever landed in raw/.
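Codex Automations is configured in the app rather than in code, but the automation’s prompt reduces to a conditional that runs on a timer. The following local stand-in is purely illustrative: run_once and run_forever are hypothetical, and process is a placeholder for whatever hands a file to the agent. This is not a Codex API.

```python
import time
from pathlib import Path
from typing import Callable


def run_once(vault: Path, process: Callable[[Path], None]) -> int:
    """One tick: if raw/ holds unprocessed top-level files, process them; return how many."""
    pending = sorted(p for p in (vault / "raw").glob("*.md") if p.is_file())
    for source in pending:
        process(source)  # placeholder for handing the file to the agent
    return len(pending)


def run_forever(vault: Path, process: Callable[[Path], None], interval_s: int = 3600) -> None:
    """Repeat run_once on a fixed interval (hourly by default) until interrupted."""
    while True:
        run_once(vault, process)
        time.sleep(interval_s)
```

The “unprocessed” test here only works because ingestion moves each file into raw/processed/; without that rule, every tick would re-ingest the same files.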
9. Optional GitHub backup.
- Create a private GitHub repo, then prompt Codex (with the GitHub plugin attached): commit this current version to my private GitHub repo [url].
- Fold backup into the automation: …once everything is processed, please commit and push the current version of the directory to the main branch on GitHub — so a backup happens every hour after ingestion.
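The backup step is ordinary git. This sketch shells out to the git CLI and assumes the vault folder is already a repository with a remote named origin and a main branch; the helper names are hypothetical. Checking git status --porcelain first avoids a failing commit in hours when nothing changed.

```python
import subprocess
from pathlib import Path
from typing import List


def backup_commands(message: str, branch: str = "main") -> List[List[str]]:
    """The git invocations for one backup: stage everything, commit, push."""
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", branch],
    ]


def backup(repo: Path, message: str, branch: str = "main") -> bool:
    """Commit and push the vault; return False (and do nothing) if the tree is clean."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        return False
    for cmd in backup_commands(message, branch):
        subprocess.run(cmd, cwd=repo, check=True)
    return True
```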
Try It
- Reuse the journal + CRM layers conceptually: this vault already implements the Karpathy ingest pattern; the net-new ideas here are (a) a grounded journal/ folder whose responses pull from existing articles, and (b) a crm/ folder keyed on person names. Either could be added as a new topic/folder with agents.md-style rules.
- Steal the “process then move to processed/” hygiene: this vault uses a .manifest.json delta check instead, but the move-on-ingest pattern is a simpler, visible equivalent worth noting (see CLAUDE.md § Ingest pre-flight).
- Steal the write-back-on-query behavior: Matt’s setup writes reusable query answers back into the wiki automatically. This vault has the same capability as the optional File-Back operation (CLAUDE.md § File-Back, auto_fileback setting).
- Note the automation parallel: the hourly Codex Automation is the same shape as this project’s scheduled cloud routines (Friday release sweep / Sunday watchlist sweep) and bin/post-ingest. The “no n8n” personal-stack rule applies if porting it.
- Web Clipper as a capture path: the Obsidian Web Clipper’s auto-transcript pull is a one-click alternative to this vault’s bin/yt-transcript helper for getting YouTube content into raw/.
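For comparison with move-on-ingest, a delta check records a content hash per raw file and treats anything new or changed as pending. This is a hypothetical sketch, not this vault’s actual .manifest.json format: it assumes the manifest maps paths relative to raw/ to SHA-256 hex digests. The trade-off is that files stay where they are and edited sources are re-detected, but ingest status is no longer visible in the folder tree.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest: Path) -> Dict[str, str]:
    return json.loads(manifest.read_text(encoding="utf-8")) if manifest.exists() else {}


def delta(raw: Path, manifest: Path) -> List[str]:
    """Paths (relative to raw/) that are new or whose content changed since last recorded."""
    seen = load_manifest(manifest)
    changed = []
    for p in sorted(raw.rglob("*")):
        if p.is_file() and p.resolve() != manifest.resolve():  # skip the manifest itself
            rel = p.relative_to(raw).as_posix()
            if seen.get(rel) != file_hash(p):
                changed.append(rel)
    return changed


def record(raw: Path, manifest: Path, rels: List[str]) -> None:
    """Mark the given files as ingested by storing their current hashes."""
    seen = load_manifest(manifest)
    for rel in rels:
        seen[rel] = file_hash(raw / rel)
    manifest.write_text(json.dumps(seen, indent=2, sort_keys=True), encoding="utf-8")
```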
Related
- build-llm-wiki-for-business-walkthrough — closest sibling: another third-party YouTube build of the pattern (Dream Labs AI), but Claude Code + Hormozi 12-question seed + 4-folder schema, stopping at the wiki layer (no journal/CRM)
- joshpocock-vault — minimal faithful Obsidian template implementation (Stride starter)
- synthadoc — the architecturally-complete end of the spectrum (Python engine + Obsidian plugin)
- arscontexta-skill-graphs — plugin-packaged conversational vault setup (self/notes/ops layout); closest to the journal-as-a-distinct-folder idea
- karpathy-pattern-third-party-adoption — synthesis of how the wider community is adopting the pattern
- _index — topic root for community implementations of the Karpathy pattern
Open Questions
- Sponsored segment, not the build. ~50 seconds is a Hostinger ad for a one-click managed OpenClaw deployment (with the “Matt Wolfe” 10%-off code). It is sponsor content, unrelated to the second-brain build, and is not a claim about the system being built — excluded from the takeaways above.
- Creator identity inferred, not stated on-screen. Attributed to Matt Wolfe from the sponsor discount code and the stated Friday-AI-news-breakdown format; the transcript never states his name outright.
- No repo or template shipped. Unlike synthadoc or the Stride starter, this is a narrated build with no linked vault template, gist, or repo for the journal/CRM agents.md rules; the rules exist only as the dictated prompts transcribed here. Reproducing them requires re-deriving them from the video.
- CRM/journal grounding is prompt-only. Retrieval quality depends entirely on the agent re-reading the index.md files each turn (no embeddings/QMD-style search shown). Scaling behavior past a small vault is unverified in the source.
- Auto-caption normalizations applied: the transcript’s “co-work” → Cowork and “Carpathy”/“Andre” → Karpathy; the transcript already rendered “Codex” and “Obsidian” correctly.
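To make the scaling concern concrete, here is a crude model of index-only retrieval: every lookup re-reads every index file and ranks lines by words shared with the question. It is an assumption-laden stand-in, since the agent reads the indexes with the model rather than by keyword overlap, and the file list and function names are hypothetical. It still illustrates two limits: the work per question grows with total index size, and purely lexical matching misses paraphrase (“meet” versus “met”).

```python
import re
from pathlib import Path
from typing import List, Set, Tuple

INDEX_FILES = ("index.md", "journal/index.md", "crm/index.md")  # assumed locations


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def index_lookup(vault: Path, question: str, top_k: int = 3) -> List[str]:
    """Re-read every index file and return the lines sharing the most words with the question."""
    q = words(question)
    scored: List[Tuple[int, str]] = []
    for rel in INDEX_FILES:
        path = vault / rel
        if not path.exists():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            overlap = len(q & words(line))
            if overlap:
                scored.append((overlap, line))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep file order
    return [line for _, line in scored[:top_k]]
```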