Z AI (Zhipu AI)

Z AI (formerly Zhipu AI) offers the GLM model series, featuring hybrid reasoning capabilities and agentic AI design. These models excel in unified reasoning, coding, and intelligent agent applications while maintaining open-source accessibility under MIT license.

Website: https://z.ai/model-api (International) | https://open.bigmodel.cn/ (China)

Getting an API Key

International Users

  1. Sign Up/Sign In: Go to https://z.ai/model-api. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.

China Mainland Users

  1. Sign Up/Sign In: Go to https://open.bigmodel.cn/. Create an account or sign in.
  2. Navigate to API Keys: Access your account dashboard and find the API keys section.
  3. Create a Key: Generate a new API key for your application.
  4. Copy the Key: Copy the API key immediately and store it securely.

Supported Models

Z AI serves each region from its own model catalog, though both regions currently offer the same lineup:

GLM-5.1 (Latest)

  • glm-5.1 (Default) - Latest flagship model with 200K context window, 128K maximum output, and prompt caching ($1.40/$4.40 per 1M tokens; cached input $0.26 per 1M tokens)

GLM-5

  • glm-5 - Flagship model with 200K context window and prompt caching ($1.00/$3.20 per 1M tokens)

GLM-4.7

  • glm-4.7 - High-performance model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

GLM-4.6

  • glm-4.6 - Advanced model with 200K context and prompt caching ($0.60/$2.20 per 1M tokens)

GLM-4.5 Series

  • glm-4.5 - Flagship model with 128K context, prompt caching, and hybrid reasoning
  • glm-4.5-air - Compact, cost-effective model with 128K context and prompt caching

All models feature:

  • Prompt caching support for reduced costs on repeated queries
  • Mixture of Experts (MoE) architecture for optimal performance
  • Agent-native design integrating reasoning, coding, and tool usage
  • Open-source availability under MIT license

Note: Pricing differs between International and China regions. China region pricing is approximately 50% lower.
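To see how the listed rates translate into per-request costs, here is a minimal sketch using the glm-5.1 international prices above; the token counts in the example are hypothetical:

```python
# Estimate the cost of a single glm-5.1 request from the listed
# international rates: $1.40 per 1M input tokens, $4.40 per 1M output
# tokens, $0.26 per 1M cached input tokens.
def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  input_rate=1.40, output_rate=4.40, cached_rate=0.26):
    fresh = input_tokens - cached_tokens  # tokens billed at the full input rate
    return (fresh * input_rate
            + cached_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000

# Hypothetical request: a 50K-token prompt with 40K of it served from
# the prompt cache, producing 2K tokens of output.
cost = estimate_cost(50_000, 2_000, cached_tokens=40_000)
print(f"${cost:.4f}")  # → $0.0332
```

The example shows why prompt caching matters for agentic workloads: 80% of the input is billed at the cached rate, cutting the input cost by roughly a factor of four.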

Configuration in Cline

  1. Open Cline Settings: Click the settings icon (⚙️) in the Cline panel.
  2. Select Provider: Choose "Z AI" from the "API Provider" dropdown.
  3. Select Region: Choose your region:
    • "International" for global access
    • "China" for mainland China access
  4. Enter API Key: Paste your Z AI API key into the "Z AI API Key" field.
  5. Select Model: Choose your desired model from the "Model" dropdown.

GLM Coding Plans

Z AI offers subscription plans specifically designed for coding applications. These plans provide cost-effective access to GLM-4.5 models through a prompt quota per 5-hour cycle rather than per-token API billing.

Plan Options

GLM Coding Lite - $3/month

  • 120 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline

GLM Coding Pro - $15/month

  • 600 prompts per 5-hour cycle
  • Access to GLM-4.5 model
  • Works exclusively through coding tools like Cline

Both plans offer promotional pricing for the first month: Lite drops from $6 to $3, Pro drops from $30 to $15.
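Assuming every 5-hour cycle is fully used around the clock (an upper bound, not a typical workload), the plan quotas work out roughly as follows:

```python
# Upper-bound prompt throughput for the GLM Coding plans, assuming
# full use of every 5-hour cycle. Integer arithmetic keeps it exact:
# cycles per month = 24 * days / cycle_hours.
def max_prompts_per_month(prompts_per_cycle, cycle_hours=5, days=30):
    return prompts_per_cycle * 24 * days // cycle_hours

lite = max_prompts_per_month(120)  # GLM Coding Lite
pro = max_prompts_per_month(600)   # GLM Coding Pro
print(lite, pro)  # → 17280 86400
print(f"Lite at full utilization: ${3 / lite * 1000:.3f} per 1K prompts")
```

Even at a small fraction of that ceiling, the flat monthly price undercuts equivalent per-token billing for heavy coding sessions.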


Setting up GLM Coding Plans

To use the GLM Coding Plans with Cline:

  1. Subscribe: Go to https://z.ai/subscribe and choose your plan.

  2. Create API Key: After subscribing, log into your Z AI dashboard and create an API key for your coding plan.

  3. Configure in Cline: Open Cline settings, select "Z AI" as your provider, and paste your API key into the "Z AI API Key" field.


The setup connects your subscription directly to Cline, giving you access to GLM-4.5's tool-calling capabilities optimized for coding workflows.

Z AI's Hybrid Intelligence

Z AI's GLM-4.5 series introduces capabilities that set it apart from conventional language models:

Hybrid Reasoning Architecture

GLM-4.5 operates in two distinct modes:

  • Thinking Mode: Designed for complex reasoning tasks and tool usage, engaging in deeper analytical processes
  • Non-Thinking Mode: Provides immediate responses for straightforward queries, optimizing efficiency

This dual-mode architecture represents an "agent-native" design philosophy that adapts processing intensity based on query complexity.
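As an illustration, a request that opts into Thinking Mode might be built as below. The `thinking` field follows the pattern Z AI documents for GLM-4.5, but treat the exact parameter shape as an assumption and verify it against the current API reference:

```python
import json

# Hedged sketch of a chat-completions payload toggling hybrid reasoning.
# The shape of the "thinking" field is an assumption; check Z AI's docs.
def build_payload(prompt: str, thinking: bool) -> dict:
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking Mode for complex reasoning and tool use,
        # Non-Thinking Mode for fast answers to simple queries.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

payload = build_payload("Refactor this function to be iterative.", thinking=True)
print(json.dumps(payload, indent=2))
```

Cline manages this toggle for you; the sketch only shows where the mode switch lives in a raw API call.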

Exceptional Performance

GLM-4.5 achieves a comprehensive score of 63.2 across 12 benchmarks spanning agentic tasks, reasoning, and coding challenges, securing 3rd place among all proprietary and open-source models. GLM-4.5-Air maintains competitive performance with a score of 59.8 while delivering superior efficiency.

Mixture of Experts Excellence

The sophisticated MoE architecture optimizes performance while maintaining computational efficiency:

  • GLM-4.5: 355B total parameters with 32B active parameters
  • GLM-4.5-Air: 106B total parameters with 12B active parameters
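The efficiency gain comes from activating only a small fraction of the total parameters per token; from the figures above:

```python
# Fraction of parameters active per token, using the counts listed above
# (total billions of parameters, active billions per token).
models = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}
for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# → GLM-4.5: 9.0% ... GLM-4.5-Air: 11.3%
```

Roughly a tenth of each model's weights participate in any single forward pass, which is what lets a 355B-parameter model run with the inference cost profile of a much smaller dense model.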

Extended Context Capabilities

GLM-4.5's 128,000-token context window enables comprehensive understanding of lengthy documents and codebases; real-world testing confirms effective processing of codebases approaching 2,000 lines without degraded performance.

Open-Source Leadership

Released under MIT license, GLM-4.5 provides researchers and developers with access to state-of-the-art capabilities without proprietary restrictions, including base models, hybrid reasoning versions, and optimized FP8 variants.

Regional Optimization

API Endpoints

  • International: Uses https://api.z.ai/api/paas/v4
  • China: Uses https://open.bigmodel.cn/api/paas/v4
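Cline performs this selection automatically from your region setting, but if you call the API directly, the endpoint choice can be sketched as:

```python
# Map the region setting to the matching Z AI base URL (URLs from above).
ENDPOINTS = {
    "international": "https://api.z.ai/api/paas/v4",
    "china": "https://open.bigmodel.cn/api/paas/v4",
}

def base_url(region: str) -> str:
    try:
        return ENDPOINTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown region: {region!r}") from None

print(base_url("International"))  # → https://api.z.ai/api/paas/v4
```

An API key is bound to the platform it was issued on, so the region must match where you created the key.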

Model Availability

The region setting determines both API endpoint and available models, with automatic filtering to ensure compatibility with your selected region.

Special Features

Agentic Capabilities

GLM-4.5's unified architecture makes it particularly suitable for complex intelligent agent applications requiring integrated reasoning, coding, and tool utilization capabilities.

Comprehensive Benchmarking

Performance evaluation encompasses:

  • 3 agentic task benchmarks
  • 7 reasoning benchmarks
  • 2 coding benchmarks

This comprehensive assessment demonstrates versatility across diverse AI applications.

Developer Integration

Models support integration through multiple frameworks:

  • transformers
  • vLLM
  • SGLang

Complete with dedicated model code, tool parser, and reasoning parser implementations.

Performance Comparisons

vs Claude Sonnet 4

GLM-4.5 shows competitive performance in agentic coding and reasoning tasks, though Claude Sonnet 4 maintains advantages in coding success rates and autonomous multi-feature application development.

vs GPT-4.5

GLM-4.5 ranks competitively in reasoning and agent benchmarks, with GPT-4.5 generally leading in raw task accuracy on professional benchmarks like MMLU and AIME.

Tips and Notes

  • Region Selection: Choose the appropriate region for optimal performance and compliance with local regulations.
  • Model Selection: GLM-4.5 for maximum performance, GLM-4.5-Air for efficiency and mainstream hardware compatibility.
  • Context Advantage: Large 128K context window enables processing of substantial codebases and documents.
  • Open Source Benefits: MIT license enables both commercial use and secondary development.
  • Agentic Applications: Particularly strong for applications requiring reasoning, coding, and tool usage integration.
  • Hybrid Reasoning: Use Thinking Mode for complex problems, Non-Thinking Mode for simple queries.
  • API Compatibility: OpenAI-compatible API provides streaming responses and usage reporting.
  • Framework Support: Multiple integration options available for different deployment scenarios.