# ZhiPu AI
ZhiPu AI is a platform providing model services, including text generation, text embedding, image generation, and more. You can refer to the ZhiPu AI Open Platform for details. LangChain4j integrates with ZhiPu AI via its HTTP endpoints. We are considering migrating from the HTTP endpoints to the official SDK and would appreciate any help!
You can use ZhiPu AI with LangChain4j in plain Java or Spring Boot applications.
:::note
Since `1.0.0-alpha1`, `langchain4j-zhipu-ai` has migrated to `langchain4j-community`
and is renamed to `langchain4j-community-zhipu-ai`.
:::
Before `1.0.0-alpha1`:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-zhipu-ai</artifactId>
    <version>${previous version here}</version>
</dependency>
```
`1.0.0-alpha1` and later:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-zhipu-ai</artifactId>
    <version>${latest version here}</version>
</dependency>
```
Or, you can use the BOM to manage dependencies consistently:

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-community-bom</artifactId>
            <version>${latest version here}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```
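For Spring Boot applications, a starter may also be available in the community repository. The sketch below is an assumption, not something this page confirms: both the artifact id `langchain4j-community-zhipu-ai-spring-boot-starter` and the `langchain4j.community.zhipu-ai.*` property prefix are hypothetical; verify them against the LangChain4j Spring Boot integration documentation.

```properties
# Hypothetical property names; verify against the starter's documentation
langchain4j.community.zhipu-ai.chat-model.api-key=${ZHIPU_API_KEY}
langchain4j.community.zhipu-ai.chat-model.model=glm-4-flash
```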
## ZhipuAiChatModel

`ZhipuAiChatModel` has the following parameters to configure when you initialize it:
| Property | Description | Default Value |
|---|---|---|
| baseUrl | The URL to connect to | https://open.bigmodel.cn/ |
| apiKey | The API key | |
| model | The model to use | glm-4-flash |
| topP | The probability threshold for nucleus sampling, which controls the diversity of the generated text. The higher the `topP`, the more diverse the generated text, and vice versa. Value range: (0, 1.0]. We generally recommend altering this or `temperature`, but not both. | |
| maxRetries | The maximum number of retries for a request | 3 |
| temperature | Sampling temperature that controls the diversity of the generated text. The higher the temperature, the more diverse the generated text, and vice versa. Value range: [0, 2) | 0.7 |
| stops | The model automatically stops generating text when the output is about to contain one of the specified strings or token_ids | |
| maxToken | The maximum number of tokens returned by this request | 512 |
| listeners | Listeners that listen for requests, responses, and errors | |
| callTimeout | OkHttp call timeout for a request | |
| connectTimeout | OkHttp connect timeout for a request | |
| writeTimeout | OkHttp write timeout for a request | |
| readTimeout | OkHttp read timeout for a request | |
| logRequests | Whether to log requests | false |
| logResponses | Whether to log responses | false |
| doSample | Whether to use sampling. When set to false, the model uses greedy decoding | |
| toolStream | Whether to enable partial tool-call streaming. When set to true, tool calls can be streamed incrementally | false |
## ZhipuAiChatRequestParameters

`ZhipuAiChatRequestParameters` can be used to configure additional parameters when sending a chat request:
| Property | Description | Default Value |
|---|---|---|
| doSample | Whether to use sampling. When set to false, the model will use greedy decoding | |
| toolStream | Whether to enable partial tool streaming. When set to true, tool calls can be streamed incrementally | false |
| thinking | Configuration for reasoning mode. `type` specifies the reasoning type; `clearThinking` controls whether to show the internal thinking process in the response | |
## ZhipuAiStreamingChatModel

`ZhipuAiStreamingChatModel` has the same parameters as `ZhipuAiChatModel`, except `maxRetries`.
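A minimal streaming sketch may look like the following; it assumes the generic `StreamingChatResponseHandler` callback interface from the LangChain4j core API (`onPartialResponse`, `onCompleteResponse`, `onError`), which the community model is expected to drive:

```java
ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
        .apiKey("Your API key here")
        .build();

model.chat("Tell me a joke", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        // called for each partial token as it arrives
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nFinished: " + completeResponse.aiMessage().text());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```

Running this requires a valid API key and network access, so treat it as a usage sketch rather than a copy-paste-ready snippet.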
You can initialize `ZhipuAiChatModel` with the following code:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();
```
Or, customize other parameters:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model("glm-4")
        .temperature(0.6)
        .maxToken(1024)
        .maxRetries(2)
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();
```
You can enable reasoning mode to get the model's internal thinking process:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7) // use GLM-4-5 or a later model for reasoning support
        .build();

ChatResponse response = model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("What is the capital of Germany?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .thinking(Thinking.builder()
                                .type("reasoning")
                                .clearThinking(true)
                                .build())
                        .build())
                .build());

AiMessage aiMessage = response.aiMessage();
System.out.println("Answer: " + aiMessage.text());
System.out.println("Thinking: " + aiMessage.thinking());
```
You can stream partial tool calls incrementally using `toolStream`:
```java
ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7)
        .build();

ToolSpecification calculator = ToolSpecification.builder()
        .name("calculator")
        .description("returns a sum of two numbers")
        .parameters(JsonObjectSchema.builder()
                .addIntegerProperty("first")
                .addIntegerProperty("second")
                .build())
        .build();

TestStreamingChatResponseHandler handler = new TestStreamingChatResponseHandler() {

    @Override
    public void onPartialToolCall(ToolExecutionRequest partialToolCall) {
        System.out.println("Partial tool call: " + partialToolCall.name() + " - " + partialToolCall.arguments());
    }
};

model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("2+2=?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .toolSpecifications(calculator)
                        .toolStream(true)
                        .build())
                .build(),
        handler);
```
You can find more examples in: