Zhipu AI has released GLM-4.5 and GLM-4.5-Air, two new AI models designed to handle reasoning, coding, and agentic tasks within a single architecture. They use a dual-mode system that switches between deliberate, complex problem-solving and faster direct responses, aiming to balance accuracy and speed.
GLM-4.5 features 355B total parameters with 32B active, while its lighter sibling, GLM-4.5-Air, runs with 106B total and 12B active parameters. Both models use a Mixture-of-Experts (MoE) architecture and are optimized for two modes: a “thinking” mode for complex reasoning and tool use, and a “non-thinking” mode for fast responses.
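For developers, the mode switch is exposed as a request-time option. The sketch below assumes Z.ai's OpenAI-compatible chat-completions endpoint; the base URL and the shape of the `thinking` parameter are assumptions based on the vendor's published convention and should be verified against the current API reference.

```python
# Sketch: toggling GLM-4.5's "thinking" mode through an OpenAI-compatible
# endpoint. The base URL and the `thinking` extra-body parameter are
# assumptions to verify against Z.ai's API docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.z.ai/api/paas/v4",  # assumed Z.ai endpoint
    api_key="YOUR_API_KEY",
)

# "Thinking" mode: slower, multi-step reasoning and tool use.
deliberate = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"thinking": {"type": "enabled"}},
)

# "Non-thinking" mode: fast, direct responses.
fast = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"thinking": {"type": "disabled"}},
)

print(deliberate.choices[0].message.content)
print(fast.choices[0].message.content)
```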
GLM-4.5’s architecture prioritizes depth over width, in contrast to models such as DeepSeek-V3, and uses 96 attention heads per layer. It also incorporates QK-Norm, Grouped-Query Attention, Multi-Token Prediction, and the Muon optimizer for faster convergence and improved reasoning performance.
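Two of those components are straightforward to illustrate. The PyTorch sketch below shows QK-Norm (RMS-normalizing queries and keys before the attention dot product, which stabilizes attention logits) combined with Grouped-Query Attention, where groups of query heads share key/value heads. It is a generic illustration rather than GLM-4.5's actual implementation: the 8 KV heads and 128-dimensional heads are assumed values, and the learnable gain usually attached to QK-Norm is omitted.

```python
# Generic sketch of QK-Norm inside grouped-query attention (illustrative
# dimensions; NOT GLM-4.5's actual code).
import torch
import torch.nn.functional as F

def rms_norm(x, eps=1e-6):
    # RMSNorm without the learnable gain, for brevity.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

def gqa_with_qk_norm(q, k, v, n_kv_heads):
    # q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    b, n_heads, s, d = q.shape
    group = n_heads // n_kv_heads
    # QK-Norm: normalize queries and keys before computing attention scores.
    q, k = rms_norm(q), rms_norm(k)
    # GQA: each group of query heads shares one key/value head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / d**0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 96, 16, 128)  # 96 query heads, matching GLM-4.5's reported count
k = torch.randn(1, 8, 16, 128)   # 8 KV heads is an assumed value
v = torch.randn(1, 8, 16, 128)
out = gqa_with_qk_norm(q, k, v, n_kv_heads=8)
print(out.shape)  # torch.Size([1, 96, 16, 128])
```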
Training was conducted on a 22T-token corpus, including 7T tokens dedicated to code and reasoning, followed by reinforcement learning with Zhipu AI’s in-house slime RL infrastructure. This setup features an asynchronous agentic RL training pipeline designed to maximize throughput and support long-horizon tasks.
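The core idea behind an asynchronous agentic RL pipeline is to decouple rollout generation from gradient updates, so that slow, long-horizon agent trajectories never stall training. The asyncio sketch below illustrates that general producer/consumer pattern only; it is a conceptual illustration, not slime's actual API.

```python
# Conceptual sketch of asynchronous RL: rollout workers stream trajectories
# into a queue while the trainer consumes whatever is ready. Illustrative
# pattern only; not slime's actual interface.
import asyncio
import random

async def rollout_worker(wid, queue):
    while True:
        # Stand-in for a long-horizon agentic episode (tool calls, env steps).
        await asyncio.sleep(random.uniform(0.1, 0.5))
        await queue.put({"worker": wid, "trajectory": [...], "reward": random.random()})

async def trainer(queue, batch_size=4, steps=3):
    for step in range(steps):
        # Consume ready trajectories; workers keep generating in the background.
        batch = [await queue.get() for _ in range(batch_size)]
        avg_r = sum(t["reward"] for t in batch) / batch_size
        print(f"step {step}: updated on {batch_size} trajectories, avg reward {avg_r:.2f}")

async def main():
    queue = asyncio.Queue(maxsize=16)
    workers = [asyncio.create_task(rollout_worker(i, queue)) for i in range(8)]
    await trainer(queue)
    for w in workers:
        w.cancel()

asyncio.run(main())
```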
Zhipu AI reports that GLM-4.5 ranks 3rd overall on a combined set of 12 benchmarks covering agentic tasks, reasoning, and coding, trailing only the very top models from OpenAI and Anthropic. GLM-4.5-Air ranks 6th, outperforming many models of similar or larger scale.
Source: Zhipu AI Blog
GLM-4.5 demonstrated particular strength in coding benchmarks. It achieved 64.2% on SWE-bench Verified and 37.5% on Terminal-Bench, placing it ahead of Claude 4 Opus, GPT-4.1, and Gemini 2.5 Pro on several metrics. Its tool-calling success rate reached 90.6%, outperforming Claude 4 Sonnet (89.5%) and Kimi K2 (86.2%).
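Tool-calling evaluations of this kind typically check whether the model selects the right function and emits well-formed arguments. The sketch below shows such a call through standard OpenAI-style function calling; the endpoint URL and the `get_weather` tool schema are illustrative assumptions.

```python
# Sketch of the kind of tool call such benchmarks measure, using standard
# OpenAI-style function calling. The endpoint and the weather tool are
# purely illustrative.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.5",
    messages=[{"role": "user", "content": "Is it raining in Beijing right now?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to call a tool
    call = msg.tool_calls[0]
    # A "successful" call names the right tool with well-formed arguments,
    # e.g. get_weather {'city': 'Beijing'}
    print(call.function.name, json.loads(call.function.arguments))
```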
Early testers have praised GLM-4.5’s coding and agentic capabilities. One Reddit user shared:
These models seem extremely good from my preliminary comparison. GLM-4.5 seems excellent at coding tasks, while GLM-4.5-Air seems even better than Qwen 3 235B-a22b 2507 on my agentic research and summarization benchmarks.
Another user commented on the GLM series’ speed and language proficiency:
GLM is pretty impressive. Didn’t try 4.5 yet, but 4.1 Thinking Flash scored around 150/200 on Scolarius in French language testing — one of the best in my personal 19 LLM comparison. Extremely fast too.
GLM-4.5 can be accessed directly via Z.ai, called through the Z.ai API, or integrated into existing coding agents like Claude Code or Roo Code. Model weights for local deployment are available on Hugging Face and ModelScope, with support for vLLM and SGLang inference frameworks.
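For local use, a minimal offline-inference setup might look like the following vLLM sketch. The tensor-parallel degree is an assumption that must match the available GPUs, and even the lighter 106B-parameter Air model requires substantial GPU memory and a vLLM build that supports the GLM-4.5 architecture.

```python
# Minimal sketch of local offline inference with vLLM. Hardware requirements
# and the tensor-parallel degree are assumptions; adjust to your setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.5-Air",  # Hugging Face repo id
    tensor_parallel_size=8,       # assumed GPU count
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.chat(
    [{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    params,
)
print(outputs[0].outputs[0].text)
```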