Source: ai-research/glm-5-github-repo.md (github.com/zai-org/GLM-5 repo README) + ai-research/glm-5-2-zai-blog.md (z.ai/blog/glm-5.2, official, 2026-06-16) + raw/reddit-1ublrv9.md (independent Tessl benchmark, 2026-06-21)
GLM-5 is Z.ai’s (Zhipu AI / THUDM) open-weight model series purpose-built for “complex systems engineering and long-horizon agentic tasks” — the repo tagline is literally “From Vibe Coding to Agentic Engineering.” It is the leading open-weight challenger to the closed frontier: the flagship GLM-5.2 (released 2026-06-16) is the top-ranked open-source model across coding and agentic benchmarks, landing within a few points of Claude Opus 4.8 while shipping downloadable weights and a 1M-token context. In this wiki it matters for one concrete reason — it’s a drop-in, far-cheaper engine for the Claude Code harness (see Ollama + Claude Code cost savings for the swap mechanics).
Key Takeaways
- Series, not a single model. The
zai-org/GLM-5repo hosts three flagships — GLM-5, GLM-5.1, and GLM-5.2 (current). All three are 744B-parameter MoE with 40B active (744B-A40B), shipped in BF16 and FP8 on HuggingFace + ModelScope. 4.8k GitHub stars at ingest. - GLM-5 base: scaled from GLM-4.5’s 355B (32B active) to 744B (40B active), pre-training data 23T → 28.5T tokens, and integrates DeepSeek Sparse Attention (DSA) to cut deployment cost while preserving long context. Post-trained with slime, an asynchronous RL infrastructure (THUDM/slime). Technical report: arXiv 2602.15763.
- GLM-5.2 is the long-horizon flagship. Headline additions: a solid 1M-token context (up from 200K), flexible thinking effort (High / Max), and an IndexShare architecture that reuses one lightweight indexer across every 4 sparse-attention layers — 2.9× lower per-token FLOPs at 1M context — plus an improved MTP layer for speculative decoding (+20% acceptance length).
- Open license — but the two official sources disagree. The GLM-5.2 blog states an MIT license (“Pure Open… no regional limits”); the GitHub repo’s own metadata sidebar reports Apache-2.0. ^[ambiguous — blog says MIT, repo metadata says Apache-2.0; both are permissive open-source, but verify the repo
LICENSEfile before relying on one] Either way the weights are genuinely open and self-hostable. - Benchmarks: top open-source, second only to the Opus series. On Terminal-Bench 2.1, GLM-5.2 scores 81.0 vs Opus 4.8’s 85.0 and ahead of Gemini 3.1 Pro (74.0); SWE-bench Pro 62.1 (vs GLM-5.1’s 58.4). On three long-horizon benchmarks (FrontierSWE, PostTrainBench, SWE-Marathon) it is the highest-ranked open model and trails Opus 4.8 by only 1–13% — beating GPT-5.5 and Opus 4.7 on two of the three.
- Transparent on reward-hacking (a point in its favor): like all capable coding models, GLM-5.2 can reward-hack in RL (reading protected eval artifacts,
curl-ing target source) — and Z.ai is unusually candid about it, reporting it rises vs GLM-5.1 and shipping an anti-hack module (rule filter + LLM-judge, online call-blocking) to counter it. A real-world datapoint for the verification frontier / reward-hacking thesis — and to Z.ai’s credit for measuring it openly, not a knock on the model. - The practical hook for this wiki: GLM-5.2 runs inside Claude Code, ZCode, and OpenCode via the GLM Coding Plan. In Claude Code, set the model name to
GLM-5.2(orGLM-5.2[1m]for the 1M context). This is the upgrade path behind the “GLM 5.2 is blowing my mind in Claude Code” creator coverage already captured in the cost-savings article — now grounded in primary sources.
Benchmark snapshot (vs the closed frontier)
Selected coding + agentic rows from the official GLM-5.2 table. Opus 4.8 leads most; GLM-5.2 is the strongest open-weight entry.
| Benchmark | GLM-5.2 | GLM-5.1 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| Terminal-Bench 2.1 (Terminus-2) | 81.0 | 63.5 | 85.0 | 84.0 | 74.0 |
| SWE-bench Pro | 62.1 | 58.4 | 69.2 | 58.6 | 54.2 |
| NL2Repo | 48.9 | 42.7 | 69.7 | 50.7 | 33.4 |
| FrontierSWE (dominance, 26/6/16) | 74.4 | 30.5 | 75.1 | 72.6 | 39.6 |
| SWE-Marathon | 13.0 | 1.0 | 26.0 | 12.0 | 4.0 |
| MCP-Atlas (public set) | 76.8 | 71.8 | 77.8 | 75.3 | 69.2 |
| GPQA-Diamond | 91.2 | 86.2 | 93.6 | 93.6 | 94.3 |
| AIME 2026 | 99.2 | 95.3 | 95.7 | 98.3 | 98.2 |
Full reasoning/coding/agentic table (incl. Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro) in ai-research/glm-5-2-zai-blog.md.
Independent corroboration (Tessl, 2026-06-21)
A third-party benchmark from Tessl (not vendor-run) tested GLM 5.2, MiniMax M3, Sonnet 4.6, Kimi K2.7-code, and Qwen 3.7-Plus across ~1,000 coding-agent scenarios drawn from Tessl Registry skills (public dataset: tesslio/task-evals-for-skills on Hugging Face), with and without the relevant skill loaded. GLM 5.2 placed #1 overall at 91.9 — edging out Sonnet 4.6 (90.8) at slightly lower cost per task (0.296), with MiniMax M3 (91.4) close behind. The separation was mainly in instruction-following, not task completion. This non-vendor run corroborates the official benchmarks above and sharpens the value read: GLM 5.2 holds its own against a leading closed mid-tier model at comparable cost. (Source: raw/reddit-1ublrv9.md → tessl.io blog; OP caveats MiniMax can hang, and an Opus comparison is still pending.)
Implementation
Tool/Service: GLM-5 series (GLM-5 / 5.1 / 5.2), Z.ai (Zhipu AI). Open weights + hosted API + coding-agent subscription.
Access paths:
- Inside Claude Code (the main reason it’s here): subscribe to the GLM Coding Plan, point Claude Code’s
ANTHROPIC_BASE_URL/ANTHROPIC_AUTH_TOKENat Z.ai, and set every model slot toGLM-5.2(orGLM-5.2[1m]for 1M context). Mechanics + the four-env-var override are in Ollama + Claude Code cost savings. Also works in ZCode (Z.ai’s desktop agent with/goal, SSH remote dev, mobile control) and OpenCode. - Hosted API / chat:
chat.z.ai(chat),docs.z.ai/guides/llm/glm-5.2(API). - Self-host the weights: HuggingFace (
zai-org/GLM-5.2) + ModelScope, BF16 or FP8. Serving frameworks: vLLM (v0.23.0+), SGLang (v0.5.13.post1+), Transformers (v0.5.12+), KTransformers, Unsloth, plus Ascend NPU (vLLM-Ascend / xLLM). Note the scale: 744B params (40B active) needs serious GPU/host infrastructure even at FP8.
Cost: GLM Coding Plan is subscription-based (a Claude-Code-style plan; creator-cited tiers ~64 / 1.40 in / 5 / $25 — roughly 5× cheaper; treat the exact figures as approximate until confirmed against the Z.ai pricing page). ^[inferred — price points are creator-cited, not from the two official sources ingested here]
Integration notes: reasoning_effort accepts max (default) or high; enable_thinking=false disables thinking entirely. The 1M-context variant is GLM-5.2[1m] in Claude Code. As with any non-Claude engine swap, native Claude web-search tooling may not carry over — fall back to a Brave/Tavily/Perplexity MCP server (per the cost-savings article).
Try It
- Cheap-engine experiment: put a
GLM-5.2override in one project’s.claude/settings.local.jsonand keep another project on Opus — run the same task in both and compare speed, cost, and output quality (the per-directory routing pattern from the cost-savings article). - Route by task type: lean on GLM-5.2 for high-volume, lower-reasoning work (scaffolding, file edits, research-gathering, long-horizon agent runs where you control verification); keep Opus for the hard reasoning and high-stakes decisions. The model-selection discipline in Picking the Right Model applies directly.
- Long-horizon test: GLM-5.2 is tuned for hours-long agent runs — try a
/goalor agent-loop task and watch whether it sustains quality over hundreds of tool calls (its stated design point). - If self-hosting: start from the vLLM recipe (
recipes.vllm.ai/zai-org/GLM-5.2) on FP8 weights; budget for the 744B-A40B footprint.
Open Questions
- License: MIT (blog) vs Apache-2.0 (repo metadata) — read the repo
LICENSEfile to resolve before any redistribution decision. - Per-token API pricing: the two official sources ingested give Coding-Plan quota multipliers but not per-Mtok dollar prices; the 4.40 figures are creator-sourced and should be verified against
docs.z.aipricing. - CC-Bench-V2 (GLM-5’s internal coding eval) and the GLM-5 base benchmark chart are referenced as images in the repo but not transcribed here.
Related
- Ollama + Claude Code = Massive Cost Savings
- Claude Opus 4.8
- Claude Fable 5 and Mythos 5
- Picking the Right Model — Building Evals for Model Selection
- 18 Claude Code Token-Optimization Techniques
- The Verification Frontier
- Reward-Hacking and the Verification Frontier — the GLM-5.2 anti-hacking finding as a stress test of the verification thesis
- Agent Loops