Source: Safi Shamsi Graphify README v7 (2026-05-09) (github.com/safishamsi/graphify)
Graphify is Safi Shamsi’s cross-harness knowledge-graph skill that turns any folder of code, docs, PDFs, images, or videos into a queryable graph. One command (/graphify . in 18+ AI coding assistants, or graphify install to wire it up) produces three artifacts: graph.html (interactive viewer), GRAPH_REPORT.md (summary with “god nodes” + surprising connections + suggested questions), and graph.json (queryable serialized graph). Tree-sitter AST extraction across 28 languages runs locally with zero API calls; non-code modalities (docs/PDFs/images/video) route through your assistant’s model. Optional MCP server exposes query_graph / get_node / get_neighbors / shortest_path for repeated structured access. MIT license, Python 3.10+, PyPI graphifyy (double-y). 45,493 stars / 4,930 forks at fetch (2026-05-09); created 2026-04-03 — five weeks old. Sister project: Penpax, a commercial always-on layer that applies the same graph approach to meetings/email/browser/files (closed beta, waitlist).
Key Takeaways
- One-command universal skill. `/graphify .` in any of 18 AI coding assistants: Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, VS Code Copilot Chat, Aider, OpenClaw, Factory Droid, Trae, Trae CN, Hermes, Kimi Code, Kiro, Pi, Google Antigravity. Same install path, same output, same skill artifact wrapped per-harness.
- Three output artifacts, deliberately separated.
  - `graph.html`: interactive browser viewer (click nodes, filter, search)
  - `GRAPH_REPORT.md`: the highlights layer (god nodes, surprising connections, 4-5 suggested questions, confidence tags EXTRACTED/INFERRED/AMBIGUOUS)
  - `graph.json`: full serialized graph for query-without-re-parsing

  The deliberate structure: humans read the report, agents read the JSON, both can fall back to the HTML. Pattern is similar to QMD’s “index → query → fetch” separation but on a graph topology rather than text retrieval.
- Tree-sitter AST across 28 languages, locally. Python, TypeScript, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Luau, Zig, PowerShell, Elixir, Objective-C, Julia, Vue, Svelte, Groovy/Gradle, SQL, Fortran (`.f`/`.f90`/`.f95`/`.f03`/`.f08`), `.jsx`, `.tsx`. No API calls for code; tree-sitter handles all 28. The substantive engineering claim of the project.
- Non-code modalities go through your LLM. Docs (`.md`/`.mdx`/`.qmd`/`.html`/`.txt`/`.rst`/`.yaml`), Office (`.docx`/`.xlsx` with `[office]` extra), PDFs, images (`.png`/`.jpg`/`.webp`/`.gif`), video/audio (with `[video]` extra), YouTube URLs. Watch the cost line. A repo with 200 PDFs gets 200 LLM calls; the AST extraction is free, the document extraction is not.
- `/graphify` skill vs `graphify extract` headless. Two operating modes. The skill (interactive, in your IDE) routes through whatever model your session runs: the same model that’s already loaded answers the report-generation prompts. The headless mode (`graphify extract`) runs without an IDE and requires explicit `--backend gemini|kimi|claude|openai|ollama|bedrock` plus the corresponding API key. Bedrock is keyless (uses AWS IAM credential chain). For CI integration, headless + Bedrock is the cleanest path.
- Six output formats from one extraction. `--obsidian` (Obsidian vault), `--wiki` (agent-crawlable markdown wiki), `--svg`, `--graphml` (Gephi/yEd), `--neo4j` (Cypher dump), `--neo4j-push` (direct push to Bolt). Plus the default HTML/JSON. Means the graph is portable into existing tooling, not a graphify-only artifact.
- Incremental rebuilds via git hook. `graphify hook install` adds post-commit + post-checkout hooks that re-parse only changed files (AST-only, no API cost). Plus a git merge driver for `graph.json` so two devs committing in parallel get their graphs union-merged automatically. The merge driver is a non-trivial engineering detail: typical graph-export tools don’t ship with merge drivers, and parallel commits would corrupt the JSON.
- Cross-project global graph. `graphify extract ./docs --global --as myrepo` registers a project’s graph into `~/.graphify/global.json`. `graphify global list` shows all registered repos with node/edge counts. The cross-project graph is the actual moat: once you’ve indexed three or four projects, the global graph can answer “what concepts repeat across my work?” which a single-repo grep never can.
- MCP server is optional, not required. `python -m graphify.serve graphify-out/graph.json` exposes structured tools (`query_graph`, `get_node`, `get_neighbors`, `shortest_path`) for repeated agent access without re-parsing. For one-off code-walking, the CLI `graphify query` is enough; for an agent that needs to traverse the graph repeatedly during a session, the MCP server avoids re-loading the graph each call. Codex parity: Codex requires `multi_agent = true` in `~/.codex/config.toml` and uses `$graphify` instead of `/graphify`.
- Privacy claim is partial, not absolute. Code stays local (tree-sitter); video/audio is transcribed locally with `faster-whisper`. Docs, PDFs, and images transmit to your LLM provider’s API. No telemetry. The privacy framing is honest but the practical scope is “code stays local; document semantics do not.”
- MIT license, clean for commercial use. Unlike GitNexus (PolyForm-Noncommercial, requires commercial license via akonlabs.com for agency deployment), Graphify is MIT. For WEO Marketly client work or any commercial agency deployment, this is the license-cleaner of the two.
- The 45.5k-star number is a marketing artifact, not a quality signal. Created 2026-04-03; fetched 2026-05-09 — five weeks old. 45,493 stars / 4,930 forks for that age is well outside organic-growth distribution. Author ships a Gumroad book (The Memory Layer), Graphify Labs domain, sponsor button, X account, LinkedIn presence, 30+ translated READMEs. Treat the star count as a marketing signal; evaluate the engineering on its own. The 28-language tree-sitter coverage and the git merge driver are real engineering; the 18-platform install matrix is one SKILL.md wrapped 18 ways.
- Penpax is the commercial play. Penpax (waitlist) extends the same graph approach to the user’s entire working life — meetings, browser history, emails, files, code — updating continuously, on-device. The OSS Graphify is the loss-leader; Penpax is the SaaS. Worth knowing when evaluating long-term project incentives.
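As a rough illustration of what "agents read the JSON" can look like, the sketch below ranks nodes by degree, which is one plausible reading of the report's "god nodes". The schema is an assumption, not something the README confirms: a top-level `edges` list whose entries carry `source` and `target` keys. The function name `top_degree_nodes` and the sample data are hypothetical.

```python
from collections import Counter


def top_degree_nodes(graph: dict, k: int = 3) -> list[tuple[str, int]]:
    """Rank nodes by number of incident edges; ties are broken by node id."""
    degree = Counter()
    for edge in graph.get("edges", []):
        # ASSUMPTION: each edge is a dict with "source" and "target" node ids.
        degree[edge["source"]] += 1
        degree[edge["target"]] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical sample in the assumed schema. Against a real run you would
# load graphify-out/graph.json with json.load and adapt the key names.
sample = {
    "nodes": [{"id": "auth"}, {"id": "db"}, {"id": "api"}, {"id": "cli"}],
    "edges": [
        {"source": "api", "target": "auth"},
        {"source": "api", "target": "db"},
        {"source": "cli", "target": "api"},
        {"source": "auth", "target": "db"},
    ],
}

print(top_degree_nodes(sample, k=2))  # [('api', 3), ('auth', 2)]
```

Degree is only a proxy; how Graphify itself picks god nodes is not stated in the material summarized here.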
How it compares to GitNexus
The closest in-scope wiki article is GitNexus (Abhigyan Patwari / akonlabs). Both target “structural code understanding via knowledge graph + agent integration.” Direct comparison:
| Dimension | Graphify (safishamsi) | GitNexus (abhigyanpatwari) |
|---|---|---|
| License | MIT | PolyForm-Noncommercial (commercial license required for agency / company deployment) |
| Stars / age at fetch | 45,493 / 5 weeks (created 2026-04-03) | 37,048 / 9 months (created Aug 2025) |
| Stack | Python (tree-sitter for code, LLM for non-code) | TypeScript (tree-sitter, LadybugDB graph storage, browser WebAssembly) |
| Languages | 28 (Python, TS/JS, Go, Rust, Java, C/C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Julia, Vue, Svelte, Groovy/Gradle, SQL, Fortran, etc.) | 14+ (TypeScript, Python, Java, Go, Rust explicitly named) |
| Beyond code | Yes (docs, PDFs, Office, images, video, YouTube URLs) — routes through LLM | Code-focused; no native PDF/image/video pipeline |
| MCP tools | 4 (query_graph, get_node, get_neighbors, shortest_path) | 16 (hybrid search, impact analysis, symbol context, multi-file rename, dependency graph, call chain, cluster, execution flow, etc.) |
| Web UI | None (HTML viewer is local, not hosted) | Yes — hosted at gitnexus.vercel.app, browser-based WebAssembly indexing |
| Cross-harness install | 18 AI assistants (Claude Code, Codex, Cursor, Gemini, Copilot, Aider, OpenClaw, Hermes, Antigravity, etc.) | 2 first-class (Claude Code plugin + Cursor integration), CLI/MCP for others |
| Cross-project graph | Yes — ~/.graphify/global.json with graphify global list | No (per-repo only) |
| Incremental updates | Git hooks (post-commit + post-checkout, AST-only) + merge driver for graph.json | Six-phase pipeline; incremental story not as explicit |
| Storage format | JSON (portable; also exports to GraphML, Neo4j Cypher, Obsidian) | LadybugDB graph database (specialized) |
| Output artifact | 3 files (HTML + Markdown report + JSON) | Knowledge graph in browser/CLI; agent queries via MCP |
| Commercial layer | Penpax (separate product, waitlist) | akonlabs.com (commercial-license route for the OSS itself) |
Practical decision rule for this wiki’s user base:
- Commercial / agency / WEO Marketly client work: Graphify (MIT, no licensing friction).
- Personal / academic / non-commercial: Either is free; pick on technical fit.
- Need 16 MCP tools including impact analysis + multi-file rename: GitNexus (richer MCP surface).
- Need PDF/image/video extraction in the same graph as code: Graphify (single pipeline for mixed corpora).
- Need cross-project graph that spans your whole workspace: Graphify (`global` subcommand).
- Need browser-based / no-install / drop-a-zip workflow: GitNexus (gitnexus.vercel.app).
Where this fits in the wiki
- Code-intelligence layer. Sits next to GitNexus in the same niche; the two articles together form the canonical comparison for “knowledge-graph-over-codebase + agent integration” decisions.
- Cross-harness skill pattern. Joins Everything Claude Code, The Agency, and last30days-skill in the “single skill / repo, many AI coding assistants” pattern. The 18-platform install matrix is the same pattern those projects use.
- Hermes-platform support is named. `graphify install --platform hermes` and `graphify hermes install`: the Hermes harness is a first-class target. For Hermes users this means the skill is wired without manual config translation.
- Composes with Managed Agents orchestration: a Managed Agent could call Graphify’s MCP server (`query_graph`) to traverse a codebase mid-task.
- Adjacent to synthadoc: same idea (build an indexed knowledge structure and let an agent query it), but synthadoc indexes documents into chunked retrieval; Graphify indexes code+docs into a graph topology. Different retrieval geometry for the same problem.
- Pairs with Open Design’s 15-CLI auto-detection pattern: both projects acknowledge the cross-harness reality and ship one product that wires into many.
- Different use case than QMD (the wiki’s own retrieval layer). QMD = hybrid BM25+vector+LLM-rerank over markdown text (semantic retrieval). Graphify = entity-relationship graph extracted via tree-sitter AST (structural topology). Same word “knowledge graph,” different jobs. For this user’s karpathy wiki, QMD is the right tool; for the user’s `~/Auto1111/hermes-agent/` codebase, Graphify is the right tool.
Implementation
- Tool/Service: Graphify (safishamsi/graphify v7) — Python skill + CLI for cross-harness knowledge-graph extraction.
- Setup:
  - Install: `uv tool install graphifyy && graphify install` (or pipx / pip equivalents). PyPI package is `graphifyy` (double-y); CLI command is `graphify`.
  - Per-harness wiring: `graphify install` for Claude Code (Linux/Mac); `graphify install --platform <harness>` or `graphify <harness> install` for the other 17 assistants. Codex requires `multi_agent = true` in `~/.codex/config.toml`.
  - First run: `cd <project>` then `/graphify .` (or `graphify .` in PowerShell). Outputs to `graphify-out/`.
  - Optional persistent agent integration: `graphify <harness> install` writes a config that makes the assistant read `GRAPH_REPORT.md` before answering codebase questions; on Claude Code, Codex, and Gemini CLI a hook fires before every file-read call.
  - Optional MCP server: `python -m graphify.serve graphify-out/graph.json` for repeated structured query access.
- Cost:
  - Code extraction: free (tree-sitter local, zero API calls).
  - Doc/PDF/image extraction: paid against your AI assistant’s model API (or your provided backend key for headless mode).
  - Video/audio transcription: free (`faster-whisper` runs locally; install via `pip install graphifyy[video]`).
  - The Python package itself is MIT, no fee.
- Integration notes:
  - MIT license: clean for commercial / agency / WEO Marketly client deployment without licensing friction.
  - 28 code languages via tree-sitter (substantive coverage, not just the top 5).
  - `graphify-out/` is meant to be committed to git so teammates inherit the graph; `graphify-out/manifest.json` and `graphify-out/cost.json` should be `.gitignore`d (mtime-based / local-only).
  - `graphify hook install` adds a git merge driver, so parallel commits to `graph.json` get union-merged automatically. Non-trivial engineering detail; rare in the OSS-graph-export space.
  - `--update` flag is the daily ergonomic: it re-extracts only changed files and keeps API cost minimal.
  - Cross-project: `--global --as <name>` registers a graph into `~/.graphify/global.json`; `graphify global list` enumerates registered repos. The global graph is queryable across repos via `graphify query` against the merged JSON.
  - Headless backends: `gemini`, `kimi`, `claude`, `openai`, `ollama` (local; set `OLLAMA_BASE_URL` / `OLLAMA_MODEL`), `bedrock` (keyless; uses AWS IAM credential chain). Bedrock + CI is the cleanest credentials story for an agency.
  - Privacy: code + video stay local; docs/PDFs/images transmit to LLM provider. No telemetry.
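To make the union-merge idea concrete, here is a minimal sketch of how two versions of a graph file could be combined without a textual conflict. It is not Graphify's merge driver, whose implementation is not described here. It assumes a hypothetical schema with `nodes` (each having an `id`) and `edges` (each having `source`, `target`, and an optional `relation`), and it keeps the first-seen copy when both sides carry the same key.

```python
def union_merge(ours: dict, theirs: dict) -> dict:
    """Union two graphs by node id and by (source, target, relation) edge key."""
    nodes = {}
    edges = {}
    for graph in (ours, theirs):
        for node in graph.get("nodes", []):
            # ASSUMPTION: node identity is the "id" field; first-seen copy wins.
            nodes.setdefault(node["id"], node)
        for edge in graph.get("edges", []):
            key = (edge["source"], edge["target"], edge.get("relation"))
            edges.setdefault(key, edge)
    # Sort so the merged file is deterministic regardless of argument order.
    return {
        "nodes": sorted(nodes.values(), key=lambda n: n["id"]),
        "edges": sorted(
            edges.values(),
            key=lambda e: (e["source"], e["target"], str(e.get("relation"))),
        ),
    }


# Hypothetical graphs produced by two developers committing in parallel.
ours = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "edges": [{"source": "a", "target": "b", "relation": "calls"}],
}
theirs = {
    "nodes": [{"id": "a"}, {"id": "c"}],
    "edges": [
        {"source": "a", "target": "b", "relation": "calls"},
        {"source": "a", "target": "c", "relation": "imports"},
    ],
}

merged = union_merge(ours, theirs)
print([n["id"] for n in merged["nodes"]])  # ['a', 'b', 'c']
print(len(merged["edges"]))  # 2
```

A line-based text merge of the same two JSON files would likely conflict or produce invalid JSON, which is the failure mode a structure-aware merge avoids. Note that a pure union never deletes anything, so nodes removed on one side would reappear; how the real driver handles deletions is not covered by this sketch.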
Open Questions
- The 45.5k-star claim. Five-week-old repo with 45k stars. What’s the actual engagement metric? Pepy.tech downloads (referenced via PyPI badge) would be the more honest signal — worth checking before recommending widely.
- Penpax incentive structure. Graphify is MIT; Penpax (the always-on commercial layer) is the SaaS. How aggressively does the OSS pull users toward the commercial product over time? Worth watching the README diff history for “now hosted on Penpax” framing creeping in.
- LLM cost ceiling on document-heavy corpora. A repo with 200 PDFs = 200 LLM calls per full extraction. The `--update` flag mitigates this but doesn’t eliminate the first-run cost. Concrete numbers: per-PDF token cost (roughly 5k-50k tokens depending on length), compounded across the corpus. The README doesn’t quote total-cost benchmarks for typical repos.
- Cross-project graph quality. The `~/.graphify/global.json` claim is interesting but unevaluated in the README. Does the union-merged graph actually surface useful cross-project patterns, or does naming-collision / context-loss degrade it past 4-5 repos?
- MCP tool surface vs GitNexus. Graphify’s 4 MCP tools (`query_graph`, `get_node`, `get_neighbors`, `shortest_path`) vs GitNexus’s 16 (impact analysis, multi-file rename, etc.). For agentic refactoring, which surface gives the agent enough to actually act, not just inspect? Worth a head-to-head trial.
- Tree-sitter language quality at the long-tail. 28 languages claimed. Languages like Luau, Zig, Julia, Fortran (multiple variants) have less mature tree-sitter parsers than Python or TypeScript. How well do AST extraction + clustering hold up on those?
- `--neo4j-push` security model. Pushing to a Bolt endpoint requires credentials. How are they passed? README doesn’t say. Production teams will need to know before exposing a graph DB to an unattended skill.
- Penpax data flow. “On-device, no cloud” is the Penpax claim per the graphifylabs.ai marketing. The OSS Graphify routes documents to your LLM provider. Does Penpax do the same, or does it ship a local model? README doesn’t disclose; the marketing site presumably will.
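The first-run cost question above reduces to simple arithmetic, sketched here using the figures already quoted (200 PDFs, roughly 5k-50k tokens each). The price per million tokens is a placeholder assumption, not a quote from any provider, and the estimate counts input tokens only.

```python
def extraction_token_range(
    num_docs: int, min_tokens_per_doc: int, max_tokens_per_doc: int
) -> tuple[int, int]:
    """Lower and upper bound on input tokens for one full extraction."""
    return num_docs * min_tokens_per_doc, num_docs * max_tokens_per_doc


def cost_range_usd(
    token_range: tuple[int, int], usd_per_million_tokens: float
) -> tuple[float, float]:
    """Convert a token range into a dollar range at a flat per-million price."""
    low, high = token_range
    return (
        low / 1_000_000 * usd_per_million_tokens,
        high / 1_000_000 * usd_per_million_tokens,
    )


# 200 PDFs at 5k-50k tokens each, as in the open question above.
tokens = extraction_token_range(200, 5_000, 50_000)
print(tokens)  # (1000000, 10000000)

# ASSUMPTION: 3.0 USD per million input tokens is a hypothetical price;
# substitute your provider's actual rate.
HYPOTHETICAL_USD_PER_MILLION = 3.0
print(cost_range_usd(tokens, HYPOTHETICAL_USD_PER_MILLION))  # (3.0, 30.0)
```

So a full extraction of that corpus sits somewhere between 1M and 10M input tokens. Output tokens, retries, and any image or multi-pass processing would add to this, and none of those are modeled here.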
Try It
- Cheapest valid first run. `pip install graphifyy && graphify install`. Pick a small Python project, `cd` into it, run `graphify .` (PowerShell users) or `/graphify .` (Claude Code session). Open `graphify-out/graph.html` in a browser. Total time: under 5 minutes for a 50-file repo. No LLM cost if there are no docs/PDFs in the project (code-only is tree-sitter local).
- Hermes / OmniPresence trial. This user’s most plausible high-value target: `~/Auto1111/hermes-agent/` (multi-language: TS frontend + Python backend per global CLAUDE.md). Run `graphify .` once and assess whether `GRAPH_REPORT.md` is keepable as onboarding documentation. The cross-language extraction is the actual pitch for this codebase.
- Skip the karpathy wiki. This vault is 99% markdown; tree-sitter has nothing structural to extract from `.md` beyond what wikilinks already give you. QMD already handles wiki retrieval. Running Graphify here would chew API tokens to rephrase what you already have.
- Compare to GitNexus head-to-head. Pick the same codebase. Run both. Open the report files side-by-side. Note where Graphify’s 28-language coverage helps; note where GitNexus’s 16-MCP-tool surface (impact analysis, multi-file rename) wins. The comparison is more useful than either tool’s marketing.
- Wire MCP into Claude Code. `python -m graphify.serve graphify-out/graph.json`, then in a Claude Code session ask “what connects `<entity-A>` to `<entity-B>`?”, which uses `shortest_path` directly. Compare against `grep + Read + manual traversal` for the same question. Note where the MCP path saves tokens vs where the grep path is faster.
- Add the git hook. `graphify hook install`: the post-commit AST rebuild costs zero API tokens, and the merge driver protects parallel-commit integrity. Recommended even for solo developers as future-proofing if collaboration grows.
- For commercial / WEO Marketly client engagements, Graphify’s MIT license clears the licensing question that GitNexus’s PolyForm-NC raises. If license is the load-bearing dimension, Graphify wins by default.
- Watch for Penpax availability. graphifylabs.ai — the always-on commercial layer that extends the graph approach to meetings/email/browser/files. Currently waitlist. Worth tracking via the watchlist if the user wants the same pattern beyond code.
Related
- GitNexus — Zero-Server Code Intelligence Engine with Graph RAG — direct competitor; the comparison table above is the load-bearing piece
- synthadoc — analogous “indexed knowledge structure + agent query” pattern for documents (chunked retrieval rather than graph topology)
- Everything Claude Code (ECC) — same cross-harness-skill pattern; ECC is agents/skills/MCP bundles, Graphify is single-purpose graph
- The Agency (agency-agents): another cross-harness install matrix (11+ AI assistants) with one skill, many wrappers
- last30days-skill — single-purpose skill with cross-harness reach; built by the same author (Matt Van Horn) as the Printing Press CLI factory
- Open Design (nexu-io) — auto-detects 15 coding-agent CLIs; same cross-harness reach pattern from a different domain
- Claude Managed Agents — composes with Graphify’s MCP server for production agentic-coding deployments
- OpenSpec — spec-first companion; OpenSpec defines should-exist, Graphify shows does-exist (via graph topology, not symbol traversal)
- Shopping for Skills and Plugins — license / commercial-fit / publisher checklist (the 45k-star marketing-signal calibration is exactly what this checklist is for)
- Six Best Claude Code Skills for Business — vetting framework; useful baseline before installing
- The Expanding Toolkit (Lucas) — code-execution + tool-use layer below the Graphify integration point
- Connections — cross-topic synthesis index