Source: Once You Know This Building RAG Agents Becomes Easy in n8n (Nate Herk, YouTube, https://youtu.be/Q4iEslmyMyM)
n8n is a visual workflow builder for chaining APIs, AI models, and business logic without writing the glue code by hand. This article uses Nate Herk’s RAG-agents tutorial as a foundational primer: what n8n’s core building blocks are (nodes, triggers, connections), how retrieval-augmented generation actually plugs into a workflow, and why someone already deep in Claude Code might still want n8n as the connective tissue around their agents.
Key Takeaways
- n8n is a node-based automation canvas. Every workflow is a graph of nodes connected by edges. A node does one thing — fire on a webhook, query Postgres, call an LLM, transform JSON — and you wire outputs into inputs. The whole workflow runs deterministically when its trigger fires.
- Three primitives carry every workflow. Triggers (webhook, schedule, chat, form, manual) start an execution. Nodes (regular operations) do work. Connections are the typed lines between them, carrying the data payload from one node to the next.
- RAG isn’t just “throw it in a vector DB.” The video’s central claim: people sprint to vector search the moment an agent needs external data, but four retrieval patterns exist and the right one depends on the data shape — filter queries, SQL queries, full-context, and chunk-based vector retrieval.
- Match the retrieval method to how a human would answer. Filters when a human would use a spreadsheet filter. SQL when a human would use a pivot table. Full context when a human would read the whole document. Vector search when a human would skim a knowledge base for a relevant snippet.
- Vector RAG is fast and cheap, but loses document-level context. Chunks lose their order, source URL, and timestamps unless you add metadata tagging. Asking for “a chronological breakdown” of a vectorized YouTube transcript returns chunks in semantic-similarity order, not video order.
- Tabular data breaks chunk-based retrieval. “What week had the highest sales?” run against a vectorized CSV will compute the max within whichever single chunk got retrieved, not the global max. Use SQL or filters for structured data.
- Why n8n alongside Claude Code: n8n owns deterministic flow (triggers, schedules, multi-step API calls, retries, integrations); Claude Code owns reasoning and code generation. Many production AI workflows use both — n8n as the wiring, Claude or another LLM as the brain inside individual nodes.
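The tabular-data failure mode above can be shown in a few lines. This is a minimal sketch with invented weekly sales figures, not data from the video: a chunked table lets the model compute the max only within the retrieved chunk, while whole-table access finds the true max.

```python
# Why chunk-based retrieval breaks tabular aggregation.
# Hypothetical weekly sales figures (not from the video).
weekly_sales = {"W1": 120, "W2": 340, "W3": 95, "W4": 510, "W5": 230, "W6": 180}

rows = list(weekly_sales.items())
# Naive vectorization splits the table into fixed-size chunks of two rows.
chunks = [dict(rows[i:i + 2]) for i in range(0, len(rows), 2)]

# Vector search returns one "relevant" chunk; the model can only
# compute the max within that chunk -- here W1/W2.
retrieved = chunks[0]
chunk_max = max(retrieved, key=retrieved.get)         # "W2" (340)

# SQL or filter retrieval sees the whole table and finds the true max.
global_max = max(weekly_sales, key=weekly_sales.get)  # "W4" (510)

print(chunk_max, global_max)  # W2 W4
```

The two answers disagree whenever the global max falls outside the retrieved chunk, which is the common case on any table larger than one chunk.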
How RAG Works in n8n
A minimal RAG agent in n8n is four nodes:
- A trigger — usually a Chat Trigger or Webhook node. The user message comes in here.
- A retrieval node — for vector RAG, this is a Vector Store node (Supabase, Pinecone, Qdrant) configured to query the embedded knowledge base. It returns the top-K chunks most semantically similar to the user’s question.
- An AI Agent node — the reasoning core. It receives the user message, the retrieved chunks injected as context, a system prompt, and a list of tools. The video uses GPT-5 Mini as the underlying model, but the agent node is model-agnostic.
- A response back to the trigger — the agent’s output flows back into the Chat Trigger response or is posted wherever the workflow needs it to go (Slack, an API endpoint, a database write).
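The four-node topology can be sketched as plain Python. This is a toy illustration, not n8n’s internals: `embed` stands in for an embedding model and the bag-of-words similarity stands in for a real vector store; the final LLM call is omitted.

```python
# Trigger -> retrieval -> agent -> response, as a plain-Python sketch.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real workflow uses an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

knowledge_base = [
    "n8n connects triggers, nodes, and AI agents on a visual canvas",
    "Vector retrieval returns the top-K chunks most similar to the query",
    "Filter RAG loads only the matching rows into the agent context",
]

def retrieve(query, k=2):
    # The "retrieval node": rank documents by similarity, keep top-K.
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def handle_chat(message):          # the "trigger"
    chunks = retrieve(message)     # the "retrieval node"
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nUser: {message}"
    return prompt                  # an agent node would pass this to the LLM

print(handle_chat("how does vector retrieval work?"))
```

The shape is the point: the trigger hands a message in, retrieval narrows the context, and the agent sees only the retrieved chunks plus the question.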
For non-vector RAG variants the retrieval node changes shape, but the topology stays the same. Filter RAG swaps the vector node for an n8n Data Table or Postgres “Get Rows” node configured with filter expressions the agent constructs on the fly. SQL RAG gives the agent a tool that takes a SQL string and runs it against Postgres. Full-context RAG skips retrieval entirely and loads entire documents into the prompt (or uses a tool selector that picks one of N documents).
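A SQL RAG tool can be sketched with an in-memory SQLite table (the video uses Postgres; table name, columns, and values here are invented for illustration). The key property: aggregates run over all rows, not one retrieved chunk.

```python
# SQL RAG sketch: the agent's tool executes a SQL string it wrote itself,
# so "which week had the highest sales?" computes over ALL rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (week TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("W1", 120), ("W2", 340), ("W3", 510), ("W4", 95)])

def run_sql_tool(query: str):
    # In n8n this would be an execute-query tool the agent calls
    # with a SQL string as its argument.
    return conn.execute(query).fetchall()

best = run_sql_tool("SELECT week, MAX(total) FROM sales")
print(best)  # [('W3', 510)]
```

In production you would constrain the tool to read-only queries; handing an agent raw SQL over a writable connection is an obvious injection surface.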
A concrete worked example from the video: a sales-data agent with 20 rows of product/date/price data in an n8n Data Table. User asks “How many Bluetooth speakers did we sell on September 16th?” The agent calls a filter tool with product_name = "Bluetooth Speaker", then a date filter for 2025-09-16, gets two rows back, sums the quantities (1 + 4), and replies “five.” Filter RAG costs a fraction of what vector search would, because n8n loaded only the matching rows into the agent’s context — not the entire table.
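The filter-tool call can be mirrored in a few lines. The rows below are invented to match the described result (1 + 4 = 5); in n8n the filtering happens in the Data Table node, not in agent-visible code.

```python
# Filter RAG sketch: exact-match filters narrow the table, then the
# agent sums the quantities itself. Rows are hypothetical.
rows = [
    {"product": "Bluetooth Speaker",   "date": "2025-09-16", "qty": 1},
    {"product": "Wireless Headphones", "date": "2025-09-16", "qty": 3},
    {"product": "Bluetooth Speaker",   "date": "2025-09-16", "qty": 4},
    {"product": "Bluetooth Speaker",   "date": "2025-09-17", "qty": 2},
]

def filter_tool(product, date):
    # Exact string matching, like an n8n Data Table filter:
    # "bluetooth speaker" (wrong case) would return nothing.
    return [r for r in rows if r["product"] == product and r["date"] == date]

matched = filter_tool("Bluetooth Speaker", "2025-09-16")
total = sum(r["qty"] for r in matched)
print(total)  # 5
```

Note that only the two matching rows ever reach the agent’s context, which is where the cost advantage over vector search comes from.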
The system prompt is load-bearing. The video shows that the agent had to be told the exact valid product names (“Wireless Headphones”, “Bluetooth Speaker”, “Phone Case”) with capitalization, because filters do exact string matching, not semantic matching. Add a new product? Update the prompt. SQL agents have the same constraint unless you also give them a “describe schema” tool.
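A sketch of what such a load-bearing system prompt might look like — the product names follow the video, but the wording is invented:

```
You answer sales questions using the filter tool.
Valid product_name values (exact match, case-sensitive):
  "Wireless Headphones", "Bluetooth Speaker", "Phone Case"
Dates must be formatted YYYY-MM-DD.
If the user's wording does not exactly match a valid value, map it
to the closest valid value before calling the tool.
```

The last line is what bridges semantic user input (“bluetooth speakers”) to the exact strings the filter requires.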
Try It
Build your first RAG flow in n8n in roughly 30 minutes:
- Sign up at n8n.cloud or self-host via Docker. The cloud trial is free for 14 days; self-host is free forever.
- Pick the dataset shape first. Open a spreadsheet of your data. If a human would filter it, plan for filter RAG. If a human would pivot/aggregate it, plan SQL RAG. If it’s prose under ~50 pages, plan full-context. If it’s a large prose corpus, plan vector RAG.
- Add the trigger. Drop a Chat Trigger node onto the canvas. Set it to “When chat message received.”
- Add the AI Agent node. Connect it downstream of the trigger. In the agent’s settings, choose your LLM provider (OpenAI, Anthropic, Google) and paste an API key.
- Add retrieval. For vector RAG, drop a Supabase Vector Store node configured as a Tool the agent can call. For filter RAG, use the n8n Data Table node as a tool. Pre-load your data either by manually inserting rows or by running a one-time ingest workflow that embeds documents.
- Write the system prompt. Tell the agent what tools it has, when to use each, and any exact-string constraints (valid filter values, date formats).
- Test in the chat panel that ships with n8n. Ask three questions: one a filter would answer, one a vector search would answer, one neither would answer cleanly. Watch the execution log to see which tools the agent picked.
- Iterate the prompt until tool selection matches your intent. Then turn on the production webhook URL or schedule trigger.
Related
- Claude AI — the reasoning layer most production n8n workflows use inside the Agent node
- GoHighLevel — n8n is commonly wired to CRM events as triggers
- Hermes Agent — alternative self-hosted agent architecture that overlaps with n8n’s automation role
- Agents and Agentic Systems — broader patterns for tool use and multi-agent orchestration
- Prompt Engineering — the system prompt inside an n8n AI Agent node is where most accuracy gains live