Source: ai-research/glm-5-github-repo.md (github.com/zai-org/GLM-5 repo README) + ai-research/glm-5-2-zai-blog.md (z.ai/blog/glm-5.2, official, 2026-06-16) + raw/reddit-1ublrv9.md (independent Tessl benchmark, 2026-06-21)

GLM-5 is Z.ai’s (Zhipu AI / THUDM) open-weight model series purpose-built for “complex systems engineering and long-horizon agentic tasks” — the repo tagline is literally “From Vibe Coding to Agentic Engineering.” It is the leading open-weight challenger to the closed frontier: the flagship GLM-5.2 (released 2026-06-16) is the top-ranked open-source model across coding and agentic benchmarks, landing within a few points of Claude Opus 4.8 while shipping downloadable weights and a 1M-token context. In this wiki it matters for one concrete reason — it’s a drop-in, far-cheaper engine for the Claude Code harness (see Ollama + Claude Code cost savings for the swap mechanics).

Key Takeaways

  • Series, not a single model. The zai-org/GLM-5 repo hosts three flagships — GLM-5, GLM-5.1, and GLM-5.2 (current). All three are 744B-parameter MoE with 40B active (744B-A40B), shipped in BF16 and FP8 on HuggingFace + ModelScope. 4.8k GitHub stars at ingest.
  • GLM-5 base: scaled from GLM-4.5’s 355B (32B active) to 744B (40B active), pre-training data 23T → 28.5T tokens, and integrates DeepSeek Sparse Attention (DSA) to cut deployment cost while preserving long context. Post-trained with slime, an asynchronous RL infrastructure (THUDM/slime). Technical report: arXiv 2602.15763.
  • GLM-5.2 is the long-horizon flagship. Headline additions: a solid 1M-token context (up from 200K), flexible thinking effort (High / Max), and an IndexShare architecture that reuses one lightweight indexer across every 4 sparse-attention layers — 2.9× lower per-token FLOPs at 1M context — plus an improved MTP layer for speculative decoding (+20% acceptance length).
  • Open license — but the two official sources disagree. The GLM-5.2 blog states an MIT license (“Pure Open… no regional limits”); the GitHub repo’s own metadata sidebar reports Apache-2.0. ^[ambiguous — blog says MIT, repo metadata says Apache-2.0; both are permissive open-source, but verify the repo LICENSE file before relying on one] Either way the weights are genuinely open and self-hostable.
  • Benchmarks: top open-source, second only to the Opus series. On Terminal-Bench 2.1, GLM-5.2 scores 81.0 against Opus 4.8’s 85.0, ahead of Gemini 3.1 Pro (74.0); on SWE-bench Pro it scores 62.1 (vs GLM-5.1’s 58.4). On three long-horizon benchmarks (FrontierSWE, PostTrainBench, SWE-Marathon) it is the highest-ranked open model, trailing Opus 4.8 by roughly 1 to 13 points and beating GPT-5.5 and Opus 4.7 on two of the three.
  • Candid about reward-hacking (a point in its favor): like other capable coding models, GLM-5.2 can reward-hack during RL (reading protected eval artifacts, curl-ing target source). Z.ai reports that the behavior increased relative to GLM-5.1 and ships an anti-hack module (rule filter + LLM-judge, online call-blocking) to counter it. This is a real-world datapoint for the verification frontier / reward-hacking thesis, and the open measurement is to Z.ai’s credit rather than a knock on the model.
  • The practical hook for this wiki: GLM-5.2 runs inside Claude Code, ZCode, and OpenCode via the GLM Coding Plan. In Claude Code, set the model name to GLM-5.2 (or GLM-5.2[1m] for the 1M context). This is the upgrade path behind the “GLM 5.2 is blowing my mind in Claude Code” creator coverage already captured in the cost-savings article — now grounded in primary sources.
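The IndexShare point above (one lightweight indexer reused across every 4 sparse-attention layers) can be pictured with a small grouping sketch. This is illustrative only: the function names, the contiguous grouping, and the 12-layer example are assumptions, not Z.ai's implementation, and the sketch counts indexer invocations rather than reproducing the 2.9× FLOPs figure.

```python
from math import ceil


def indexer_owner(layer: int, share_every: int = 4) -> int:
    """Return the layer whose indexer output `layer` would reuse.

    Assumes contiguous groups of `share_every` layers, with the first
    layer of each group running the indexer (a hypothetical layout).
    """
    return (layer // share_every) * share_every


def indexer_runs(num_layers: int, share_every: int = 4) -> int:
    """Count how many indexer passes are needed for `num_layers` layers."""
    return ceil(num_layers / share_every)


if __name__ == "__main__":
    num_layers = 12  # hypothetical depth, chosen only for illustration
    for layer in range(num_layers):
        print(f"layer {layer} reuses indexer from layer {indexer_owner(layer)}")
    print("indexer passes:", indexer_runs(num_layers), "instead of", num_layers)
```

Under these assumptions the number of indexer passes drops by the sharing factor; the end-to-end saving depends on how much of the per-token cost the indexer accounts for, which the sources do not break down.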

Benchmark snapshot (vs the closed frontier)

Selected coding + agentic rows from the official GLM-5.2 table. Opus 4.8 leads most; GLM-5.2 is the strongest open-weight entry.

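| Benchmark | GLM-5.2 | GLM-5.1 | Claude Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| Terminal-Bench 2.1 (Terminus-2) | 81.0 | 63.5 | 85.0 | 84.0 | 74.0 |
| SWE-bench Pro | 62.1 | 58.4 | 69.2 | 58.6 | 54.2 |
| NL2Repo | 48.9 | 42.7 | 69.7 | 50.7 | 33.4 |
| FrontierSWE (dominance, 26/6/16) | 74.4 | 30.5 | 75.1 | 72.6 | 39.6 |
| SWE-Marathon | 13.0 | 1.0 | 26.0 | 12.0 | 4.0 |
| MCP-Atlas (public set) | 76.8 | 71.8 | 77.8 | 75.3 | 69.2 |
| GPQA-Diamond | 91.2 | 86.2 | 93.6 | 93.6 | 94.3 |
| AIME 2026 | 99.2 | 95.3 | 95.7 | 98.3 | 98.2 |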
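To make the "how far behind Opus 4.8" reading concrete, the sketch below computes the gap for each row in both absolute points and relative terms. The scores are copied from the GLM-5.2 and Claude Opus 4.8 columns of the table above; the function and variable names are illustrative.

```python
# (GLM-5.2 score, Claude Opus 4.8 score), copied from the snapshot table.
SCORES = {
    "Terminal-Bench 2.1 (Terminus-2)": (81.0, 85.0),
    "SWE-bench Pro": (62.1, 69.2),
    "NL2Repo": (48.9, 69.7),
    "FrontierSWE": (74.4, 75.1),
    "SWE-Marathon": (13.0, 26.0),
    "MCP-Atlas (public set)": (76.8, 77.8),
    "GPQA-Diamond": (91.2, 93.6),
    "AIME 2026": (99.2, 95.7),
}


def gaps(scores):
    """Return (benchmark, point gap, relative gap in %) versus Opus 4.8.

    A positive gap means Opus 4.8 leads; a negative gap means GLM-5.2 leads.
    """
    rows = []
    for name, (glm, opus) in scores.items():
        point_gap = round(opus - glm, 1)
        relative_gap = round(100 * (opus - glm) / opus, 1)
        rows.append((name, point_gap, relative_gap))
    return rows


if __name__ == "__main__":
    for name, point_gap, relative_gap in gaps(SCORES):
        print(f"{name:34s} {point_gap:+6.1f} pts  {relative_gap:+6.1f} %")
```

The two measures diverge sharply on low-scoring benchmarks: on SWE-Marathon a 13-point gap is a 50% relative gap, so it matters which one a summary figure is quoting.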

Full reasoning/coding/agentic table (incl. Qwen3.7-Max, MiniMax M3, DeepSeek-V4-Pro) in ai-research/glm-5-2-zai-blog.md.

Independent corroboration (Tessl, 2026-06-21)

A third-party benchmark from Tessl (not vendor-run) tested GLM 5.2, MiniMax M3, Sonnet 4.6, Kimi K2.7-code, and Qwen 3.7-Plus across ~1,000 coding-agent scenarios drawn from Tessl Registry skills (public dataset: tesslio/task-evals-for-skills on Hugging Face), with and without the relevant skill loaded. GLM 5.2 placed #1 overall at 91.9 — edging out Sonnet 4.6 (90.8) at slightly lower cost per task (0.296), with MiniMax M3 (91.4) close behind. The separation was mainly in instruction-following, not task completion. This non-vendor run corroborates the official benchmarks above and sharpens the value read: GLM 5.2 holds its own against a leading closed mid-tier model at comparable cost. (Source: raw/reddit-1ublrv9.md → tessl.io blog; OP caveats MiniMax can hang, and an Opus comparison is still pending.)

Implementation

Tool/Service: GLM-5 series (GLM-5 / 5.1 / 5.2), Z.ai (Zhipu AI). Open weights + hosted API + coding-agent subscription.

Access paths:

  • Inside Claude Code (the main reason it’s here): subscribe to the GLM Coding Plan, point Claude Code’s ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN at Z.ai, and set every model slot to GLM-5.2 (or GLM-5.2[1m] for 1M context). Mechanics + the four-env-var override are in Ollama + Claude Code cost savings. Also works in ZCode (Z.ai’s desktop agent with /goal, SSH remote dev, mobile control) and OpenCode.
  • Hosted API / chat: chat.z.ai (chat), docs.z.ai/guides/llm/glm-5.2 (API).
  • Self-host the weights: HuggingFace (zai-org/GLM-5.2) + ModelScope, BF16 or FP8. Serving frameworks: vLLM (v0.23.0+), SGLang (v0.5.13.post1+), Transformers (v0.5.12+), KTransformers, Unsloth, plus Ascend NPU (vLLM-Ascend / xLLM). Note the scale: 744B params (40B active) needs serious GPU/host infrastructure even at FP8.
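For the Claude Code path above, a minimal sketch of generating a per-project settings override might look like the following. Only ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are named in this article; the three model-slot variable names, the top-level "env" key, and the placeholder URL and token are assumptions to check against the Claude Code and Z.ai documentation before use.

```python
import json


def glm_claude_settings(base_url: str, auth_token: str,
                        model: str = "GLM-5.2") -> dict:
    """Build a settings dict that points every model slot at one GLM model.

    The slot variable names below are assumptions, not taken from the
    sources ingested for this article.
    """
    env = {
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": auth_token,
        "ANTHROPIC_DEFAULT_OPUS_MODEL": model,    # assumed slot name
        "ANTHROPIC_DEFAULT_SONNET_MODEL": model,  # assumed slot name
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": model,   # assumed slot name
    }
    return {"env": env}


if __name__ == "__main__":
    settings = glm_claude_settings(
        base_url="https://example.invalid/anthropic",  # placeholder, not the real endpoint
        auth_token="YOUR_ZAI_API_KEY",                 # placeholder
        model="GLM-5.2[1m]",                           # 1M-context variant
    )
    # Print instead of writing, so the sketch never overwrites a real file.
    print(json.dumps(settings, indent=2))
```

Printing the JSON rather than writing it keeps the sketch side-effect free; the output can be pasted into a project's .claude/settings.local.json once the variable names and endpoint have been confirmed.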

Cost: GLM Coding Plan is subscription-based (a Claude-Code-style plan; creator-cited tiers ~64 / 1.40 in / 5 / $25 — roughly 5× cheaper; treat the exact figures as approximate until confirmed against the Z.ai pricing page). ^[inferred — price points are creator-cited, not from the two official sources ingested here]

Integration notes: reasoning_effort accepts max (default) or high; enable_thinking=false disables thinking entirely. The 1M-context variant is GLM-5.2[1m] in Claude Code. As with any non-Claude engine swap, native Claude web-search tooling may not carry over — fall back to a Brave/Tavily/Perplexity MCP server (per the cost-savings article).
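The thinking controls described above can be wrapped in a small validation helper. This is a sketch under assumptions: the parameter names reasoning_effort and enable_thinking and their accepted values come from the notes above, but where these fields sit in a real request (top level, a nested object, or template arguments) is not stated here and should be checked against docs.z.ai.

```python
ALLOWED_EFFORTS = ("max", "high")  # "max" is the documented default


def thinking_options(effort: str = "max", thinking: bool = True) -> dict:
    """Return request options for GLM-5.2's thinking controls.

    Placement of these keys in an actual API call is an assumption.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}")
    if not thinking:
        # With thinking disabled, an effort level has nothing to act on,
        # so this sketch omits it (a design choice, not documented behavior).
        return {"enable_thinking": False}
    return {"enable_thinking": True, "reasoning_effort": effort}


if __name__ == "__main__":
    print(thinking_options())                 # default: max effort
    print(thinking_options(effort="high"))    # cheaper reasoning
    print(thinking_options(thinking=False))   # thinking off entirely
```

Validating the effort value locally surfaces a typo as a clear error before any request is sent.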

Try It

  • Cheap-engine experiment: put a GLM-5.2 override in one project’s .claude/settings.local.json and keep another project on Opus — run the same task in both and compare speed, cost, and output quality (the per-directory routing pattern from the cost-savings article).
  • Route by task type: lean on GLM-5.2 for high-volume, lower-reasoning work (scaffolding, file edits, research-gathering, long-horizon agent runs where you control verification); keep Opus for the hard reasoning and high-stakes decisions. The model-selection discipline in Picking the Right Model applies directly.
  • Long-horizon test: GLM-5.2 is tuned for hours-long agent runs — try a /goal or agent-loop task and watch whether it sustains quality over hundreds of tool calls (its stated design point).
  • If self-hosting: start from the vLLM recipe (recipes.vllm.ai/zai-org/GLM-5.2) on FP8 weights; budget for the 744B-A40B footprint.
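To put a rough number on "budget for the 744B-A40B footprint", the sketch below estimates weight memory alone from the parameter count and precision. It assumes 2 bytes per parameter for BF16 and 1 byte for FP8, and it ignores KV cache, activations, quantization scales, and framework overhead, so real deployments need more than these figures.

```python
TOTAL_PARAMS = 744e9   # total parameters (744B)
ACTIVE_PARAMS = 40e9   # parameters active per token (40B)

BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}  # assumed storage width per weight


def weight_footprint_gb(total_params: float, bytes_per_param: int) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used per token in the MoE."""
    return active / total


if __name__ == "__main__":
    for precision, width in BYTES_PER_PARAM.items():
        gb = weight_footprint_gb(TOTAL_PARAMS, width)
        print(f"{precision}: about {gb:,.0f} GB of weights")
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active per token: {share:.1%} of parameters")
```

The low active fraction reduces compute per token, but all experts still have to be resident in memory, which is why the full weight footprint drives the hardware budget.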

Open Questions

  • License: MIT (blog) vs Apache-2.0 (repo metadata) — read the repo LICENSE file to resolve before any redistribution decision.
  • Per-token API pricing: the two official sources ingested give Coding-Plan quota multipliers but not per-Mtok dollar prices; the 4.40 figures are creator-sourced and should be verified against docs.z.ai pricing.
  • CC-Bench-V2 (GLM-5’s internal coding eval) and the GLM-5 base benchmark chart are referenced as images in the repo but not transcribed here.