How I Use LLMs — Andrej Karpathy (practical walkthrough)

Source: raw/karpathy-how-i-use-llms-transcript.md (full ~2h11m transcript, YouTube EWvNQjAaOHw, en-orig captions; same talk reposted at x.com/0xchromium/status/2063321324605280569) · raw/x-bookmarks-recent-digest-2026-07-24.md (a 2026-07 Karpathy post extending the voice section; see The long ramble session below)
Type: Talk / tutorial
Creator: Andrej Karpathy
Duration: ~2h11m
Published: early 2025 (companion to his “Deep Dive into LLMs like ChatGPT”)

Karpathy’s general-audience walkthrough of how he actually uses LLMs day to day — not a capabilities demo, but a working practitioner’s habits across model choice, tools, voice, and multimodal. The throughline: an LLM is a “zip file” of the internet turned into a helpful assistant by post-training; you’re talking to a self-contained entity until you hand it tools. The winning human skill is clear delegation + fast verification + taste for when to trust vs. inspect. The model specifics are 2025-era (GPT-4o, o1, Claude artifacts), but the mental model is evergreen.

Key Takeaways

The base model is a lossy zip of the internet with a knowledge cutoff. Without tools it’s a probabilistic document generator made helpful by post-training — vague on recent events, capable of confabulation. Everything else (search, code, files, camera) is bolting tools onto that core.
Context window = working memory. Start fresh chats aggressively. Old, irrelevant tokens distract the model, cost money, and can degrade quality. One topic per chat; close and reopen when you switch.
Choose the model tier intentionally. Cheap/fast models for routine lookups; pay for “thinking”/reasoning models only when the task (hard code, math, tricky reasoning) justifies the extra minutes and cost. Karpathy pays for top tiers because it’s cheap relative to the value.
Run an “LLM council.” Ask the same question across multiple providers (ChatGPT / Claude / Gemini / Grok) and read the consensus and the disagreements — cross-checking surfaces errors and blind spots.
Tools convert a guesser into a researcher. Internet search (his habit: Perplexity) for anything recent/niche/changing; “Deep Research” for tasks worth minutes of chained search+reasoning that would cost you 30–90 min of manual browsing. Treat both as high-quality first drafts with citations — still verify.
Make the model run code, not “think in text.” The code interpreter (Python) is the real unlock for math, data, and plots — but inspect its implicit assumptions and outputs.
Generate disposable single-use software (artifacts). Especially strong in Claude: instead of hunting for the perfect app, have the model build a tiny custom one for this need — a React widget, a Mermaid diagram to understand a chapter, flashcards. Software as a throwaway thought tool.
Voice is massively underrated. A huge share of his mobile usage is voice (far lower friction than typing). True advanced voice mode handles audio natively; on desktop he pipes speech into any app via SuperWhisper.
Multimodal is already practical. Point the camera at books, devices, maps, nutrition labels, blood-test results — get live help.
Good delegation beats prompt magic. Be concrete and specific, give examples, and save reusable setups (custom instructions, memory, custom GPTs) for repeatable tasks.
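
The “LLM council” habit above can be sketched in a few lines. This is a minimal illustration, not anything shown in the talk: `run_council` and the three `fake_model_*` functions are hypothetical names, the fakes stand in for real provider clients, and exact string matching is a deliberately naive way to detect agreement (real free-text answers would need a human read or a fuzzier comparison).

```python
from collections import Counter
from typing import Callable, Dict


def run_council(question: str, members: Dict[str, Callable[[str], str]]) -> dict:
    """Ask every member the same question and group identical answers."""
    # Light normalization so trivial formatting differences are not counted as disagreement.
    answers = {name: ask(question).strip().lower() for name, ask in members.items()}
    tally = Counter(answers.values())
    consensus, votes = tally.most_common(1)[0]
    dissenters = sorted(name for name, ans in answers.items() if ans != consensus)
    return {
        "answers": answers,
        "consensus": consensus,
        "votes": votes,
        "dissenters": dissenters,
        "unanimous": votes == len(members),
    }


# Hypothetical stand-ins: real use would wrap each provider's own client here.
def fake_model_a(q: str) -> str:
    return "Paris"


def fake_model_b(q: str) -> str:
    return "paris "


def fake_model_c(q: str) -> str:
    return "Lyon"


result = run_council(
    "What is the capital of France?",
    {"model_a": fake_model_a, "model_b": fake_model_b, "model_c": fake_model_c},
)
print(result["consensus"], result["votes"], result["dissenters"])
```

The dissenters list is the useful part: a lone disagreement is a prompt to verify, not proof that the majority is right.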

The Mental Model

An LLM is built by pre-training (compress a large slice of the internet into parameters — a “zip file”) then post-training (turn that document-completer into a helpful assistant persona).
When you chat with the bare model you’re querying that compressed knowledge: fast, broad, but probabilistic, slightly vague, and frozen at the training cutoff. It can hallucinate, and it has no idea about anything recent.
Capability comes from bolting tools onto the core: search (fixes recency/niche), code interpreter (fixes math/data), file upload (fixes “read this specific thing”), camera/voice (fixes input friction and the physical world).
For the deeper “how the model thinks” companion, this talk points back to Karpathy’s Deep Dive into LLMs; this one is the usage layer on top.

Karpathy’s Tool Ladder (his actual workflow)

Plain chat — fast factual/explanatory queries against the model’s baked-in knowledge. Keep chats short and single-topic.
Thinking models — switch up for hard reasoning, math, and code; they run internal chain-of-thought for seconds-to-minutes. He shows a debugging case where a thinking model succeeds where a non-thinking one fails.
Search — for anything recent, priced, launched, rumored, or obscure. Perplexity is his reflex, but ChatGPT/Grok/etc. now search too. Outputs are first drafts — verify.
Deep Research — the model spends minutes doing chained search + tool use + reasoning; excellent for what would be 30–90 min of manual research. High-quality cited draft, still verify.
Code interpreter — real Python execution for analysis, math, and plots; don’t accept “text-only” reasoning for quantitative work, and check its assumptions.
Artifacts / custom apps — generate disposable software for the moment’s need (diagrams via Mermaid, flashcard apps, small React tools). Strong in Claude.
File uploads — drop in papers, PDFs, whole books (he reads classics like Wealth of Nations alongside the model) and discuss them.
Cursor (not web chat) for serious coding — a dedicated coding tool that holds full project context beats pasting into a chat window.
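
One way to turn the ladder above into a team cheat sheet is a small routing function. This is a sketch under stated assumptions: the `Task` fields, the `pick_rung` name, the order of the checks, and the 30-minute cutoff (taken from the low end of the 30–90 min range above) are illustrative choices, not rules stated in the talk.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_recent_info: bool = False   # prices, launches, news, anything past the cutoff
    is_quantitative: bool = False     # math, data analysis, plots
    is_hard_reasoning: bool = False   # tricky code, math, multi-step logic
    research_minutes: int = 0         # rough estimate of manual browsing it would replace
    has_document: bool = False        # a paper, PDF, or book to discuss
    is_project_coding: bool = False   # work that spans a whole codebase


def pick_rung(task: Task) -> str:
    """Return the first rung of the ladder that matches the task (assumed ordering)."""
    if task.is_project_coding:
        return "dedicated coding tool"
    if task.research_minutes >= 30:
        return "deep research"
    if task.needs_recent_info:
        return "search"
    if task.is_quantitative:
        return "code interpreter"
    if task.has_document:
        return "file upload"
    if task.is_hard_reasoning:
        return "thinking model"
    return "plain chat"


print(pick_rung(Task()))
print(pick_rung(Task(needs_recent_info=True)))
print(pick_rung(Task(research_minutes=45, needs_recent_info=True)))
```

Whatever rung is picked, the verification habit stays the same: search and Deep Research output is a cited first draft, and code-interpreter output still needs its assumptions checked.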

Underrated Moves

Voice-first on mobile. Lowest-friction interface; he uses it constantly. Distinguish true native-audio voice mode from speech-to-text wrappers. On desktop, SuperWhisper transcribes speech system-wide into any chat box.
Custom instructions + memory + custom GPTs. Teach the model your preferences once; build small reusable assistants for recurring jobs (his example: extract Korean vocabulary from a screenshot and format it for Anki).
Diagrams to understand, not just to present. Have the model render a Mermaid diagram of a book chapter or argument so you can grasp it spatially.
Multimodal in daily life. Camera at nutrition labels, devices, maps, blood-test results, book covers — practical live assistance, not a demo.
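
For the Anki example above, the last step (getting extracted vocabulary into a form Anki can import) might look like the sketch below. It assumes the model has already returned word pairs; `to_anki_tsv` is a hypothetical helper and the two vocabulary pairs are made-up sample data, not taken from the talk. Anki’s text import accepts tab-separated fields, which is the format produced here.

```python
import csv
import io
from typing import Iterable, Tuple


def to_anki_tsv(pairs: Iterable[Tuple[str, str]]) -> str:
    """Format (front, back) pairs as tab-separated lines, one card per line."""
    buffer = io.StringIO()
    # csv.writer handles quoting if a field ever contains a tab or a quote character.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in pairs:
        writer.writerow([front.strip(), back.strip()])
    return buffer.getvalue()


# Illustrative sample data only.
vocab = [("안녕하세요", "hello"), ("감사합니다", "thank you")]
tsv = to_anki_tsv(vocab)
print(tsv)
```

Saving `tsv` to a UTF-8 text file gives something that can be imported as a deck with the first field as the card front and the second as the back.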

The long ramble session — a 2026-07 addition from Karpathy

[X signal — @karpathy, 2026-07-21] Source: raw/x-bookmarks-recent-digest-2026-07-24.md (fav 47.7k / bm 14.5k; extracted verbatim in the digest). The talk above treats voice as a friction fix — the lowest-effort way to get words into the box. This later post is a different claim: voice as a bandwidth fix, used deliberately to overcome your own laziness about supplying context.

The technique, in his words: “Sometimes the LLM needs more bits to understand what you’re trying to achieve, but you’re too lazy to type them.” So — lean back, switch to /voice, and ramble for about ten minutes, “total mess, anything goes, full stream of consciousness.” Three details make it work:

Declare the mode up top. He sometimes opens with something like “switching to speech recognition sorry for any typos…” — which tells the model to expect transcription noise and disfluency rather than treating them as signal.
Optionally turn it into a short interview. A few turns of the model asking and you answering, instead of one monologue.
The output beats the input. “LLMs are somehow very good at reconstructing long incoherent rambles and often their echo of your own tangle of thoughts comes out quite a bit cleaner than what you started with.”

Why he says it pays off: the result is a better “mind meld” — you “have to correct things less from that point on.” The cost is ten minutes of talking; the return is every subsequent turn in the session starting from a shared understanding. That inverts the usual instinct to compress your prompt: here the deliberate over-supply of messy context is the point, and the model’s summarization does the compressing. It pairs naturally with the fresh-chat-per-task rule above — the ramble is a cheap way to re-establish full context at the start of a new chat.^[inferred — the pairing with fresh-chat hygiene is this wiki’s synthesis; Karpathy’s post does not mention it]

Note this is a standalone X post, not a claim backed by measurement, and /voice here refers to whatever voice-input mode the reader’s tool exposes.
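
For anyone feeding a dictated transcript into a chat programmatically, the three details above can be assembled into an opening message roughly as follows. This is a sketch only: `build_ramble_opener` is a hypothetical helper, and apart from the speech-recognition declaration, which paraphrases the post, the wording of the instructions is this wiki’s own.

```python
def build_ramble_opener(transcript: str, interview: bool = False) -> str:
    """Wrap a raw dictated transcript in a mode declaration and a clean-up request."""
    parts = [
        # Declare the mode up top so transcription noise is not read as signal.
        "Switching to speech recognition, sorry for any typos.",
        "What follows is an unedited stream of consciousness about what I am trying to achieve.",
        transcript.strip(),
        # Ask for the cleaner echo of the ramble.
        "Please restate my goal and constraints as a clean brief.",
    ]
    if interview:
        # Optional short interview instead of a single monologue.
        parts.append("Then ask me a few clarifying questions, one at a time, before starting.")
    return "\n\n".join(parts)


opener = build_ramble_opener("  so um the thing I want is a weekly report that ...  ", interview=True)
print(opener)
```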

Try It

For WEO Marketly / any team standardizing on LLMs day to day:

Adopt “fresh chat per task.” Make it a team norm — it’s the single cheapest quality + cost win.
Write a model-tier cheat sheet. Which model for routine lookups vs. hard reasoning/code, and when paying for a thinking model is worth the minutes. Tie it to intelligence levers.
Build a few reusable “custom GPTs”/projects for recurring deliverables (e.g., a brand-voice rewriter, a competitor-summary assistant) instead of re-prompting from scratch — the delegation-over-prompt-magic point.
Default to Deep Research for any 30–90 min manual-browse task, then verify the citations — the highest-leverage time save.
Put voice in the workflow. Try SuperWhisper for desktop dictation; it removes typing friction for long briefs.
Open a hard task with a ten-minute ramble. Before the first real request in a fresh chat, dictate a full stream-of-consciousness brain-dump of what you’re trying to do. Declare that you’re using speech-to-text, don’t edit it, and let the model’s clean-up be the brief. Judge it on whether you correct the model less for the rest of the session.
Treat artifacts as throwaway tools — when you’d normally search for an app or build a diagram by hand, ask the model to generate a one-off instead.
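
The cost side of “fresh chat per task” can be made concrete with a little arithmetic. The sketch below assumes API-style billing in which the whole conversation history is resent as input on every turn, and that each turn adds the same number of tokens. Both are simplifications: flat-rate chat subscriptions do not bill this way, and prompt caching can reduce the real figure. The function name and the numbers are illustrative.

```python
def input_tokens_billed(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when every turn resends all earlier turns.

    Turn k sends roughly k * tokens_per_turn tokens (history plus the new
    message), so the total is tokens_per_turn * (1 + 2 + ... + turns).
    """
    return tokens_per_turn * turns * (turns + 1) // 2


one_long_chat = input_tokens_billed(turns=40, tokens_per_turn=500)
four_fresh_chats = 4 * input_tokens_billed(turns=10, tokens_per_turn=500)
print(one_long_chat, four_fresh_chats)  # 410000 110000
```

Under these assumptions the same 40 turns cost a little under four times as many input tokens in one long chat as in four fresh ones, before counting any quality loss from stale context.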

Related

From Vibe Coding to Agentic Engineering — Karpathy’s framing of where this human-as-delegator skill is heading next (schedule + tools + verification = agents)
AutoResearch — Self-Improving Coding-Agent Loop — the agentic continuation of “Deep Research” once you add loops and verification
Karpathy on Skills (Multica AI) — Karpathy’s take on the reusable-setup / skill idea this talk gestures at
Dynamic Workflows — the “give the workflow a schedule + tools + verification” leap from chat to autonomous agents
Prompt Engineering — the “be concrete, give examples, save reusable setups” delegation craft
2026 AI-Work Restructuring — the macro view of the delegate-and-verify operating model
Karpathy’s Workflow, Realized in Claude Code — this tool ladder mapped point-by-point onto Claude Code’s concrete features

Open Questions

Exact publish date / model lineup at recording. The talk is early-2025 and references GPT-4o, o1, and Claude artifacts; specific model names and UI have since moved on, though the workflow endures.
Which specifics are now stale. Operator/computer-use agents and newer model tiers (Opus 4.x, etc.) have advanced past the talk’s examples — a future refresh could map each 2025 tool habit to its 2026 equivalent.

Jonathon's AI Wiki
