Source: Safi Shamsi Graphify README v7 (2026-05-09) (github.com/safishamsi/graphify)

Graphify is Safi Shamsi’s cross-harness knowledge-graph skill that turns any folder of code, docs, PDFs, images, or videos into a queryable graph. One command (/graphify ., available in 18+ AI coding assistants after graphify install wires it up) produces three artifacts: graph.html (interactive viewer), GRAPH_REPORT.md (summary with “god nodes” + surprising connections + suggested questions), and graph.json (queryable serialized graph). Tree-sitter AST extraction across 28 languages runs locally with zero API calls; non-code modalities (docs/PDFs/images/video) route through your assistant’s model. Optional MCP server exposes query_graph / get_node / get_neighbors / shortest_path for repeated structured access. MIT license, Python 3.10+, PyPI package graphifyy (double-y). 45,493 stars / 4,930 forks at fetch (2026-05-09); created 2026-04-03 — five weeks old. Sister project: Penpax, a commercial always-on layer that applies the same graph approach to meetings/email/browser/files (closed beta, waitlist).

Key Takeaways

  • One-command universal skill. /graphify . in any of 18 AI coding assistants — Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, VS Code Copilot Chat, Aider, OpenClaw, Factory Droid, Trae, Trae CN, Hermes, Kimi Code, Kiro, Pi, Google Antigravity. Same install path, same output, same skill artifact wrapped per-harness.
  • Three output artifacts, deliberately separated.
    • graph.html — interactive browser viewer (click nodes, filter, search)
    • GRAPH_REPORT.md — the highlights layer: god nodes, surprising connections, 4-5 suggested questions, confidence tags (EXTRACTED / INFERRED / AMBIGUOUS)
    • graph.json — full serialized graph for query without re-parsing
  The deliberate structure: humans read the report, agents read the JSON, and both can fall back to the HTML. The pattern is similar to QMD’s “index → query → fetch” separation, but on a graph topology rather than text retrieval.
  • Tree-sitter AST across 28 languages, locally. Python, TypeScript, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Luau, Zig, PowerShell, Elixir, Objective-C, Julia, Vue, Svelte, Groovy/Gradle, SQL, Fortran (.f/.f90/.f95/.f03/.f08), .jsx, .tsx. No API calls for code — tree-sitter handles all 28. The substantive engineering claim of the project.
  • Non-code modalities go through your LLM. Docs (.md/.mdx/.qmd/.html/.txt/.rst/.yaml), Office (.docx/.xlsx with [office] extra), PDFs, images (.png/.jpg/.webp/.gif), video/audio (with [video] extra), YouTube URLs. Watch the cost line. A repo with 200 PDFs gets 200 LLM calls; the AST extraction is free, the document extraction is not.
  • /graphify skill vs graphify extract headless. Two operating modes. The skill (interactive, in your IDE) routes through whatever model your session runs — same model that’s already loaded answers the report-generation prompts. The headless mode (graphify extract) runs without an IDE and requires explicit --backend gemini|kimi|claude|openai|ollama|bedrock plus the corresponding API key. Bedrock is keyless (uses AWS IAM credential chain). For CI integration, headless + Bedrock is the cleanest path.
  • Six output formats from one extraction. --obsidian (Obsidian vault), --wiki (agent-crawlable markdown wiki), --svg, --graphml (Gephi/yEd), --neo4j (Cypher dump), --neo4j-push (direct push to Bolt). Plus the default HTML/JSON. Means the graph is portable into existing tooling — not a graphify-only artifact.
  • Incremental rebuilds via git hook. graphify hook install adds post-commit + post-checkout hooks that re-parse only changed files (AST-only — no API cost). Plus a git merge driver for graph.json so two devs committing in parallel get their graphs union-merged automatically. The merge driver is a non-trivial engineering detail — typical graph-export tools don’t ship with merge drivers and parallel commits would corrupt the JSON.
  • Cross-project global graph. graphify extract ./docs --global --as myrepo registers a project’s graph into ~/.graphify/global.json. graphify global list shows all registered repos with node/edge counts. The cross-project graph is the actual moat — once you’ve indexed three or four projects, the global graph can answer “what concepts repeat across my work?” which a single-repo grep never can.
  • MCP server is optional, not required. python -m graphify.serve graphify-out/graph.json exposes structured tools (query_graph, get_node, get_neighbors, shortest_path) for repeated agent access without re-parsing. For one-off code-walking, the CLI graphify query is enough; for an agent that needs to traverse the graph repeatedly during a session, the MCP server avoids re-loading the graph each call. Codex parity: Codex requires multi_agent = true in ~/.codex/config.toml and uses $graphify instead of /graphify.
  • Privacy claim is partial, not absolute. Code stays local (tree-sitter); video/audio is transcribed locally with faster-whisper. Docs, PDFs, and images transmit to your LLM provider’s API. No telemetry. The privacy framing is honest but the practical scope is “code stays local; document semantics do not.”
  • MIT license — clean for commercial use. Unlike GitNexus (PolyForm-Noncommercial, requires commercial license via akonlabs.com for agency deployment), Graphify is MIT. For WEO Marketly client work or any commercial agency deployment, this is the license-cleaner of the two.
  • The 45.5k-star number is a marketing artifact, not a quality signal. Created 2026-04-03; fetched 2026-05-09 — five weeks old. 45,493 stars / 4,930 forks for that age is well outside organic-growth distribution. Author ships a Gumroad book (The Memory Layer), Graphify Labs domain, sponsor button, X account, LinkedIn presence, 30+ translated READMEs. Treat the star count as a marketing signal; evaluate the engineering on its own. The 28-language tree-sitter coverage and the git merge driver are real engineering; the 18-platform install matrix is one SKILL.md wrapped 18 ways.
  • Penpax is the commercial play. Penpax (waitlist) extends the same graph approach to the user’s entire working life — meetings, browser history, emails, files, code — updating continuously, on-device. The OSS Graphify is the loss-leader; Penpax is the SaaS. Worth knowing when evaluating long-term project incentives.

How it compares to GitNexus

The closest in-scope wiki article is GitNexus (Abhigyan Patwari / akonlabs). Both target “structural code understanding via knowledge graph + agent integration.” Direct comparison:

| Dimension | Graphify (safishamsi) | GitNexus (abhigyanpatwari) |
|---|---|---|
| License | MIT | PolyForm-Noncommercial (commercial license required for agency / company deployment) |
| Stars / age at fetch | 45,493 / 5 weeks (created 2026-04-03) | 37,048 / 9 months (created Aug 2025) |
| Stack | Python (tree-sitter for code, LLM for non-code) | TypeScript (tree-sitter, LadybugDB graph storage, browser WebAssembly) |
| Languages | 28 (Python, TS/JS, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Julia, Vue, Svelte, Groovy/Gradle, SQL, Fortran, etc.) | 14+ (TypeScript, Python, Java, Go, Rust explicitly named) |
| Beyond code | Yes (docs, PDFs, Office, images, video, YouTube URLs) — routes through LLM | Code-focused; no native PDF/image/video pipeline |
| MCP tools | 4 (query_graph, get_node, get_neighbors, shortest_path) | 16 (hybrid search, impact analysis, symbol context, multi-file rename, dependency graph, call chain, cluster, execution flow, etc.) |
| Web UI | None (HTML viewer is local, not hosted) | Yes — hosted at gitnexus.vercel.app, browser-based WebAssembly indexing |
| Cross-harness install | 18 AI assistants (Claude Code, Codex, Cursor, Gemini, Copilot, Aider, OpenClaw, Hermes, Antigravity, etc.) | 2 first-class (Claude Code plugin + Cursor integration), CLI/MCP for others |
| Cross-project graph | Yes — ~/.graphify/global.json with graphify global list | No (per-repo only) |
| Incremental updates | Git hooks (post-commit + post-checkout, AST-only) + merge driver for graph.json | Six-phase pipeline; incremental story less explicit |
| Storage format | JSON (portable; also exports to GraphML, Neo4j Cypher, Obsidian) | LadybugDB graph database (specialized) |
| Output artifact | 3 files (HTML + Markdown report + JSON) | Knowledge graph in browser/CLI; agent queries via MCP |
| Commercial layer | Penpax (separate product, waitlist) | akonlabs.com (commercial-license route for the OSS itself) |

Practical decision rule for this wiki’s user base:

  • Commercial / agency / WEO Marketly client work: Graphify (MIT, no licensing friction).
  • Personal / academic / non-commercial: Either is free; pick on technical fit.
  • Need 16 MCP tools including impact analysis + multi-file rename: GitNexus (richer MCP surface).
  • Need PDF/image/video extraction in the same graph as code: Graphify (single pipeline for mixed corpora).
  • Need cross-project graph that spans your whole workspace: Graphify (global subcommand).
  • Need browser-based / no-install / drop-a-zip workflow: GitNexus (gitnexus.vercel.app).

Where this fits in the wiki

  • Code-intelligence layer. Sits next to GitNexus in the same niche; the two articles together form the canonical comparison for “knowledge-graph-over-codebase + agent integration” decisions.
  • Cross-harness skill pattern. Joins Everything Claude Code, The Agency, and last30days-skill in the “single skill / repo, many AI coding assistants” pattern. The 18-platform install matrix is the same pattern those projects use.
  • Hermes platform support is explicit. graphify install --platform hermes and graphify hermes install both wire the skill in — the Hermes harness is a first-class target, so Hermes users get it without manual config translation.
  • Composes with Managed Agents orchestration — a Managed Agent could call Graphify’s MCP server (query_graph) to traverse a codebase mid-task.
  • Adjacent to synthadoc — same idea (build an indexed knowledge structure and let an agent query it), but synthadoc indexes documents into chunked retrieval; Graphify indexes code+docs into a graph topology. Different retrieval geometry for the same problem.
  • Pairs with Open Design’s 15-CLI auto-detection pattern — both projects acknowledge the cross-harness reality and ship one product that wires into many.
  • Different use case than QMD (the wiki’s own retrieval layer). QMD = hybrid BM25+vector+LLM-rerank over markdown text (semantic retrieval). Graphify = entity-relationship graph extracted via tree-sitter AST (structural topology). Same word “knowledge graph,” different jobs. For this user’s karpathy wiki, QMD is the right tool; for the user’s ~/Auto1111/hermes-agent/ codebase, Graphify is the right tool.

Implementation

  • Tool/Service: Graphify (safishamsi/graphify v7) — Python skill + CLI for cross-harness knowledge-graph extraction.
  • Setup:
    • Install: uv tool install graphifyy && graphify install (or pipx / pip equivalents). PyPI package is graphifyy (double-y); CLI command is graphify.
    • Per-harness wiring: graphify install for Claude Code (Linux/Mac); graphify install --platform <harness> or graphify <harness> install for the other 17 assistants. Codex requires multi_agent = true in ~/.codex/config.toml.
    • First run: cd <project> then /graphify . (or graphify . in PowerShell). Outputs to graphify-out/.
    • Optional persistent agent integration: graphify <harness> install writes a config that makes the assistant read GRAPH_REPORT.md before answering codebase questions; on Claude Code, Codex, and Gemini CLI a hook fires before every file-read call.
    • Optional MCP server: python -m graphify.serve graphify-out/graph.json for repeated structured query access.
  • Cost:
    • Code extraction — free (tree-sitter local, zero API calls).
    • Doc/PDF/image extraction — paid against your AI assistant’s model API (or your provided backend key for headless mode).
    • Video/audio transcription — free (faster-whisper runs locally; install via pip install graphifyy[video]).
    • The Python package itself is MIT, no fee.
  • Integration notes:
    • MIT license — clean for commercial / agency / WEO Marketly client deployment without licensing friction.
    • 28 code languages via tree-sitter (substantive coverage, not just the top 5).
    • graphify-out/ is meant to be committed to git so teammates inherit the graph; graphify-out/manifest.json and graphify-out/cost.json should be .gitignored (mtime-based / local-only).
    • graphify hook install adds a git merge driver — parallel commits to graph.json get union-merged automatically. Non-trivial engineering detail; rare in the OSS-graph-export space.
    • --update flag is the daily ergonomic — re-extracts only changed files, keeps API cost minimal.
    • Cross-project: --global --as <name> registers a graph into ~/.graphify/global.json; graphify global list enumerates registered repos. The global graph is queryable across repos via graphify query against the merged JSON.
    • Headless backends: gemini, kimi, claude, openai, ollama (local — set OLLAMA_BASE_URL / OLLAMA_MODEL), bedrock (keyless — uses AWS IAM credential chain). Bedrock + CI is the cleanest credentials story for an agency.
    • Privacy: code + video stay local; docs/PDFs/images transmit to LLM provider. No telemetry.
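For agents that hit the graph repeatedly, the MCP server from the setup notes can be registered in a standard MCP stdio client config (for Claude Code, a project-level .mcp.json). The shape below is ordinary MCP client config; the server name "graphify" is arbitrary, and the arguments simply mirror the python -m graphify.serve invocation above.

```json
{
  "mcpServers": {
    "graphify": {
      "command": "python",
      "args": ["-m", "graphify.serve", "graphify-out/graph.json"]
    }
  }
}
```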

Open Questions

  • The 45.5k-star claim. Five-week-old repo with 45k stars. What’s the actual engagement metric? Pepy.tech downloads (referenced via PyPI badge) would be the more honest signal — worth checking before recommending widely.
  • Penpax incentive structure. Graphify is MIT; Penpax (the always-on commercial layer) is the SaaS. How aggressively does the OSS pull users toward the commercial product over time? Worth watching the README diff history for “now hosted on Penpax” framing creeping in.
  • LLM cost ceiling on document-heavy corpora. A repo with 200 PDFs = 200 LLM calls per full extraction. The --update flag mitigates this but doesn’t eliminate the first-run cost. Concrete numbers: per-PDF token cost (roughly 5k-50k tokens depending on length), compounded across the corpus. The README doesn’t quote total-cost benchmarks for typical repos.
  • Cross-project graph quality. ~/.graphify/global.json claim is interesting but unevaluated in the README. Does the union-merged graph actually surface useful cross-project patterns, or does naming-collision / context-loss degrade it past 4-5 repos?
  • MCP tool surface vs GitNexus. Graphify’s 4 MCP tools (query_graph, get_node, get_neighbors, shortest_path) vs GitNexus’s 16 (impact analysis, multi-file rename, etc.). For agentic refactoring, which surface gives the agent enough to actually act, not just inspect? Worth a head-to-head trial.
  • Tree-sitter language quality at the long-tail. 28 languages claimed. Languages like Luau, Zig, Julia, Fortran (multiple variants) have less mature tree-sitter parsers than Python or TypeScript. How well do AST extraction + clustering hold up on those?
  • --neo4j-push security model. Pushing to a Bolt endpoint requires credentials — how are they passed? README doesn’t say. Production teams will need to know before exposing a graph DB to an unattended skill.
  • Penpax data flow. “On-device, no cloud” is the Penpax claim per the graphifylabs.ai marketing. The OSS Graphify routes documents to your LLM provider — does Penpax do the same, or does it ship a local model? README doesn’t disclose; the marketing site presumably will.
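The cost-ceiling question can at least be bounded with arithmetic. A back-of-envelope sketch, using the 5k-50k tokens-per-PDF range quoted above and an illustrative $3 per million input tokens (the price is an assumption for illustration, not any provider's quote):

```python
def extraction_cost(num_docs: int, tokens_per_doc: int, usd_per_mtok: float = 3.0) -> float:
    """Estimated first-run LLM cost for document extraction (AST extraction is free)."""
    return num_docs * tokens_per_doc * usd_per_mtok / 1_000_000

# 200 PDFs, low and high ends of the 5k-50k tokens-per-PDF range
print(extraction_cost(200, 5_000))   # → 3.0 (USD)
print(extraction_cost(200, 50_000))  # → 30.0 (USD)
```

So a 200-PDF first run plausibly lands in the single-to-low-double-digit dollar range at these assumed rates; the --update flag keeps subsequent runs near zero.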

Try It

  1. Cheapest valid first run. pip install graphifyy && graphify install. Pick a small Python project, cd into it, run graphify . (PowerShell users) or /graphify . (Claude Code session). Open graphify-out/graph.html in a browser. Total time: under 5 minutes for a 50-file repo. No LLM cost if there are no docs/PDFs in the project (code-only is tree-sitter local).
  2. Hermes / OmniPresence trial. This user’s most plausible high-value target: ~/Auto1111/hermes-agent/ (multi-language: TS frontend + Python backend per global CLAUDE.md). Run graphify . once and assess whether GRAPH_REPORT.md is keepable as onboarding documentation. The cross-language extraction is the actual pitch for this codebase.
  3. Skip the karpathy wiki. This vault is 99% markdown; tree-sitter has nothing structural to extract from .md beyond what wikilinks already give you. QMD already handles wiki retrieval. Running Graphify here would chew API tokens to rephrase what you already have.
  4. Compare to GitNexus head-to-head. Pick the same codebase. Run both. Open the report files side-by-side. Note where Graphify’s 28-language coverage helps; note where GitNexus’s 16-MCP-tool surface (impact analysis, multi-file rename) wins. The comparison is more useful than either tool’s marketing.
  5. Wire MCP into Claude Code. python -m graphify.serve graphify-out/graph.json, then in a Claude Code session ask “what connects <entity-A> to <entity-B>?” — uses shortest_path directly. Compare against grep + Read + manual traversal for the same question. Note where the MCP path saves tokens vs where the grep path is faster.
  6. Add the git hook. graphify hook install — post-commit AST rebuild costs zero API tokens; the merge driver protects parallel-commit integrity. Recommended even for solo developers as future-proofing if collaboration grows.
  7. For commercial / WEO Marketly client engagements, Graphify’s MIT license clears the licensing question that GitNexus’s PolyForm-NC raises. If license is the load-bearing dimension, Graphify wins by default.
  8. Watch for Penpax availability. graphifylabs.ai — the always-on commercial layer that extends the graph approach to meetings/email/browser/files. Currently waitlist. Worth tracking via the watchlist if the user wants the same pattern beyond code.
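A note on item 6: git custom merge drivers are declared in two places, and whatever graphify hook install generates will look roughly like the following. The attribute name and the driver command line are illustrative assumptions (the README does not show the generated config); the %O/%A/%B placeholders (ancestor, current, other) are standard git merge-driver syntax.

```
# .gitattributes (illustrative; actual entries may differ)
graphify-out/graph.json merge=graphify

# .git/config (illustrative; driver command is an assumption)
[merge "graphify"]
    name = graph.json union merge
    driver = graphify merge-driver %O %A %B
```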