Z AI (Zhipu AI)

Z AI (formerly Zhipu AI) offers the GLM model series, featuring hybrid reasoning capabilities and agentic AI design. These models excel in unified reasoning, coding, and intelligent agent applications while maintaining open-source accessibility under MIT license.

Website: https://z.ai/model-api (International) | https://open.bigmodel.cn/ (China)

Getting an API Key

International Users

  1. Sign Up/Sign In: Go to https://z.ai/model-api. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.

China Mainland Users

  1. Sign Up/Sign In: Go to https://open.bigmodel.cn/. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.

Supported Models

Z AI serves each region from its own model catalog, though both regions currently offer the same lineup:

GLM-5.1 (Latest)

  • glm-5.1 (Default) - Latest flagship model with 200K context window, 128K maximum output, and prompt caching ($1.40/$4.40 per 1M tokens; cached input $0.26 per 1M tokens)

GLM-5

  • glm-5 - Flagship model with 200K context window and prompt caching ($1.00/$3.20 per 1M tokens)

GLM-4.7

  • glm-4.7 - High-performance model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

GLM-4.6

  • glm-4.6 - Advanced model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

GLM-4.5 Series

  • glm-4.5 - Flagship model with 128K context, prompt caching, and hybrid reasoning
  • glm-4.5-air - Compact, cost-effective model with 128K context and prompt caching

All models feature:

  • Prompt caching support for reduced costs on repeated queries
  • Mixture of Experts (MoE) architecture for optimal performance
  • Agent-native design integrating reasoning, coding, and tool usage
  • Open-source availability under MIT license

Note: Pricing differs between International and China regions. China region pricing is approximately 50% lower.
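To see how the listed rates translate into per-request costs, here is a minimal sketch using the glm-5.1 international prices above; the token counts in the example are hypothetical:

```python
# Estimate the cost of a single glm-5.1 request from the listed
# international rates: $1.40 per 1M input tokens, $4.40 per 1M output
# tokens, $0.26 per 1M cached input tokens.
def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  input_rate=1.40, output_rate=4.40, cached_rate=0.26):
    fresh = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (fresh * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical request: a 50K-token prompt with 40K of it served from
# the prompt cache, producing 2K tokens of output.
cost = estimate_cost(50_000, 2_000, cached_tokens=40_000)
print(f"${cost:.4f}")  # → $0.0332
```

The example shows why prompt caching matters for agentic workloads: 80% of the input is billed at the cached rate, cutting the input cost by roughly a factor of four.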

Configuration in Cline

  1. Open Cline Settings: Click the settings icon (⚙️) in the Cline panel.
  2. Select Provider: Choose "Z AI" from the "API Provider" dropdown.
  3. Select Region: Choose your region:
    • "International" for global access
    • "China" for mainland China access
  4. Enter API Key: Paste your Z AI API key into the "Z AI API Key" field.
  5. Select Model: Choose your desired model from the "Model" dropdown.

GLM Coding Plans

Z AI offers subscription plans specifically designed for coding applications. These plans provide cost-effective access to GLM-4.5 models through a prompt quota per 5-hour cycle rather than per-token API billing.

Plan Options

GLM Coding Lite - $3/month

  • 120 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline

GLM Coding Pro - $15/month

  • 600 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline

Both plans offer promotional pricing for the first month: Lite drops from $6 to $3, Pro drops from $30 to $15.
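Assuming every 5-hour cycle is fully used around the clock (an upper bound, not a typical workload), the plan quotas work out roughly as follows:

```python
# Upper-bound prompt throughput for the GLM Coding plans, assuming
# full use of every 5-hour cycle. Integer arithmetic keeps it exact:
# cycles per month = 24 * days / cycle_hours.
def max_prompts_per_month(prompts_per_cycle, cycle_hours=5, days=30):
    return prompts_per_cycle * 24 * days // cycle_hours

lite = max_prompts_per_month(120)  # GLM Coding Lite
pro = max_prompts_per_month(600)   # GLM Coding Pro
print(lite, pro)  # → 17280 86400
print(f"Lite at full utilization: ${3 / lite * 1000:.3f} per 1K prompts")
```

Even at a small fraction of that ceiling, the flat monthly price undercuts equivalent per-token billing for heavy coding sessions.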


Setting up GLM Coding Plans

To use the GLM Coding Plans with Cline:

  1. Subscribe: Go to https://z.ai/subscribe and choose your plan.

  2. Create API Key: After subscribing, log into your Z AI dashboard and create an API key for your coding plan.

  3. Configure in Cline: Open Cline settings, select "Z AI" as your provider, and paste your API key into the "Z AI API Key" field.


The setup connects your subscription directly to Cline, giving you access to GLM-4.5's tool-calling capabilities optimized for coding workflows.

Z AI's Hybrid Intelligence

Z AI's GLM-4.5 series introduces capabilities that set it apart from conventional language models:

Hybrid Reasoning Architecture

GLM-4.5 operates in two distinct modes:

  • Thinking Mode: Designed for complex reasoning tasks and tool usage, engaging in deeper analytical processes
  • Non-Thinking Mode: Provides immediate responses for straightforward queries, optimizing efficiency

This dual-mode architecture represents an "agent-native" design philosophy that adapts processing intensity based on query complexity.
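As an illustration, a request that opts into Thinking Mode might be built as below. The `thinking` field follows the pattern Z AI documents for GLM-4.5, but treat the exact parameter shape as an assumption and verify it against the current API reference:

```python
import json

# Hedged sketch of a chat-completions payload toggling hybrid reasoning.
# The shape of the "thinking" field is an assumption; check Z AI's docs.
def build_payload(prompt: str, thinking: bool) -> dict:
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking Mode for complex reasoning and tool use,
        # Non-Thinking Mode for fast answers to simple queries.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

payload = build_payload("Refactor this function to be iterative.", thinking=True)
print(json.dumps(payload, indent=2))
```

Cline manages this toggle for you; the sketch only shows where the mode switch lives in a raw API call.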

Exceptional Performance

GLM-4.5 achieves a comprehensive score of 63.2 across 12 benchmarks spanning agentic tasks, reasoning, and coding challenges, securing 3rd place among all proprietary and open-source models. GLM-4.5-Air maintains competitive performance with a score of 59.8 while delivering superior efficiency.

Mixture of Experts Excellence

The sophisticated MoE architecture optimizes performance while maintaining computational efficiency:

  • GLM-4.5: 355B total parameters with 32B active parameters
  • GLM-4.5-Air: 106B total parameters with 12B active parameters
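The efficiency gain comes from activating only a small fraction of the total parameters per token; from the figures above:

```python
# Fraction of parameters active per token, using the counts listed above
# (total billions of parameters, active billions per token).
models = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}
for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# → GLM-4.5: 9.0% ... GLM-4.5-Air: 11.3%
```

Roughly a tenth of each model's weights participate in any single forward pass, which is what lets a 355B-parameter model run with the inference cost profile of a much smaller dense model.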

Extended Context Capabilities

GLM-4.5's 128,000-token context window enables comprehensive understanding of lengthy documents and codebases; real-world testing confirms effective processing of codebases approaching 2,000 lines without degraded performance.

Open-Source Leadership

Released under MIT license, GLM-4.5 provides researchers and developers with access to state-of-the-art capabilities without proprietary restrictions, including base models, hybrid reasoning versions, and optimized FP8 variants.

Regional Optimization

API Endpoints

  • International: Uses https://api.z.ai/api/paas/v4
  • China: Uses https://open.bigmodel.cn/api/paas/v4
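Cline performs this selection automatically from your region setting, but if you call the API directly, the endpoint choice can be sketched as:

```python
# Map the region setting to the matching Z AI base URL (URLs from above).
ENDPOINTS = {
    "international": "https://api.z.ai/api/paas/v4",
    "china": "https://open.bigmodel.cn/api/paas/v4",
}

def base_url(region: str) -> str:
    try:
        return ENDPOINTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None

print(base_url("International"))  # → https://api.z.ai/api/paas/v4
```

An API key is bound to the platform it was issued on, so the region must match where you created the key.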

Model Availability

The region setting determines both API endpoint and available models, with automatic filtering to ensure compatibility with your selected region.

Special Features

Agentic Capabilities

GLM-4.5's unified architecture makes it particularly suitable for complex intelligent agent applications requiring integrated reasoning, coding, and tool utilization capabilities.

Comprehensive Benchmarking

Performance evaluation encompasses:

  • 3 agentic task benchmarks
  • 7 reasoning benchmarks
  • 2 coding benchmarks

This comprehensive assessment demonstrates versatility across diverse AI applications.

Developer Integration

Models support integration through multiple frameworks:

  • transformers
  • vLLM
  • SGLang

Complete with dedicated model code, tool parser, and reasoning parser implementations.

Performance Comparisons

vs Claude Sonnet 4

GLM-4.5 shows competitive performance in agentic coding and reasoning tasks, though Claude Sonnet 4 maintains advantages in coding success rates and autonomous multi-feature application development.

vs GPT-4.5

GLM-4.5 ranks competitively in reasoning and agent benchmarks, with GPT-4.5 generally leading in raw task accuracy on professional benchmarks like MMLU and AIME.

Tips and Notes

  • Region Selection: Choose the appropriate region for optimal performance and compliance with local regulations.
  • Model Selection: GLM-4.5 for maximum performance, GLM-4.5-Air for efficiency and mainstream hardware compatibility.
  • Context Advantage: Large 128K context window enables processing of substantial codebases and documents.
  • Open Source Benefits: MIT license enables both commercial use and secondary development.
  • Agentic Applications: Particularly strong for applications requiring reasoning, coding, and tool usage integration.
  • Hybrid Reasoning: Use Thinking Mode for complex problems, Non-Thinking Mode for simple queries.
  • API Compatibility: OpenAI-compatible API provides streaming responses and usage reporting.
  • Framework Support: Multiple integration options available for different deployment scenarios.