# SGLang Go gRPC SDK

A high-level Go SDK for the SGLang gRPC API, designed with an OpenAI-style interface for familiarity and ease of use.

Location: `sgl-model-gateway/bindings/golang/`
## Installation

```bash
go get github.com/sglang/sglang-go-grpc-sdk
cd sgl-model-gateway/bindings/golang
go mod tidy
```
Run the OpenAI-compatible server and benchmark:

```bash
# Set environment variables
export SGL_TOKENIZER_PATH="/Users/yangyanbo/tokenizer"
export SGL_GRPC_ENDPOINT="grpc://10.109.185.20:8001"

# Run server
cd examples/oai_server
bash run.sh

# Run E2E benchmark
cd ../..
make e2e E2E_MODEL=/work/models/qwencoder-3b E2E_TOKENIZER=/Users/yangyanbo/tokenizer E2E_INPUT_LEN=1024 E2E_OUTPUT_LEN=512
```
## Examples

The SDK includes several examples in the `examples/` directory:

```bash
# Run simple example
cd bindings/golang/examples/simple
bash run.sh

# Run streaming example
cd bindings/golang/examples/streaming
bash run.sh

# Or use Makefile from bindings/golang directory
cd bindings/golang
make run-simple
make run-streaming
```
### Non-Streaming Completion

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/sglang/sglang-go-grpc-sdk"
)

func main() {
	// Create client
	client, err := sglang.NewClient(sglang.ClientConfig{
		Endpoint:      "grpc://localhost:20000",
		TokenizerPath: "/path/to/tokenizer",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Create completion
	resp, err := client.CreateChatCompletion(context.Background(), sglang.ChatCompletionRequest{
		Model: "default",
		Messages: []sglang.ChatMessage{
			{Role: "user", Content: "Hello!"},
		},
		Stream: false,
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(resp.Choices[0].Message.Content)
	fmt.Printf("Usage: Prompt=%d, Completion=%d, Total=%d\n",
		resp.Usage.PromptTokens,
		resp.Usage.CompletionTokens,
		resp.Usage.TotalTokens)
}
```
### Streaming Completion

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	"github.com/sglang/sglang-go-grpc-sdk"
)

func main() {
	// Create client
	client, err := sglang.NewClient(sglang.ClientConfig{
		Endpoint:      "grpc://localhost:20000",
		TokenizerPath: "/path/to/tokenizer",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Create streaming completion
	ctx := context.Background()
	stream, err := client.CreateChatCompletionStream(ctx, sglang.ChatCompletionRequest{
		Model: "default",
		Messages: []sglang.ChatMessage{
			{Role: "user", Content: "Tell me a story"},
		},
		Stream:              true,
		MaxCompletionTokens: intPtr(500),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close()

	// Read streaming response
	for {
		chunk, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		for _, choice := range chunk.Choices {
			if choice.Delta.Content != "" {
				fmt.Print(choice.Delta.Content)
			}
		}
	}
	fmt.Println() // newline
}

// Helper functions for optional pointer fields
func intPtr(i int) *int {
	return &i
}

func float32Ptr(f float32) *float32 {
	return &f
}
```
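On Go 1.18 and later, the two pointer helpers above can be collapsed into a single generic function. This is a sketch of a local convenience helper, not part of the SDK API:

```go
package main

import "fmt"

// ptr returns a pointer to any value, handy for optional request fields
// such as MaxCompletionTokens. (Hypothetical helper, not provided by the SDK.)
func ptr[T any](v T) *T {
	return &v
}

func main() {
	maxTokens := ptr(500)            // *int
	temperature := ptr(float32(0.7)) // *float32
	fmt.Println(*maxTokens, *temperature)
}
```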
Examples automatically detect the server endpoint and tokenizer path via environment variables or defaults.
- `SGL_GRPC_ENDPOINT`: gRPC server endpoint (default: `grpc://localhost:20000`)
- `SGL_TOKENIZER_PATH`: Path to the tokenizer directory (required)
- `CARGO_BUILD_DIR`: Rust build output directory (auto-detected if not set)

## API Reference

```go
type ClientConfig struct {
	// Endpoint is the gRPC endpoint URL (e.g., "grpc://localhost:20000")
	// Required field. Must include the scheme (grpc://) and port number.
	Endpoint string

	// TokenizerPath is the path to the tokenizer directory containing
	// tokenizer configuration files (e.g., tokenizer.json, vocab.json)
	// Required field.
	TokenizerPath string
}
```
```go
type Client struct {
	// Thread-safe client for the SGLang gRPC API
}

// NewClient creates a new client with the given configuration.
func NewClient(config ClientConfig) (*Client, error)

// Close closes the client and releases all resources.
func (c *Client) Close() error

// CreateChatCompletion creates a non-streaming chat completion.
func (c *Client) CreateChatCompletion(ctx context.Context, req ChatCompletionRequest) (*ChatCompletionResponse, error)

// CreateChatCompletionStream creates a streaming chat completion.
func (c *Client) CreateChatCompletionStream(ctx context.Context, req ChatCompletionRequest) (*ChatCompletionStream, error)
```
- `ChatCompletionRequest`: Main request type for chat completions
- `ChatMessage`: Individual message in a conversation
- `Tool`: Tool/function definition for function calling
- `ChatCompletionResponse`: Non-streaming response
- `ChatCompletionStreamResponse`: Streaming response chunk
- `Message`: Complete message with content and tool calls
- `ToolCall`: Tool call information with function and arguments
- `Usage`: Token usage statistics
## Testing

The SDK includes comprehensive testing infrastructure with both unit and integration tests.

### Unit Tests

Unit tests are located in `client_test.go` and test individual components without requiring a server.
```bash
# Run all unit tests
go test ./...

# Run with verbose output
go test -v ./...

# Run specific test
go test -run TestClientConfig

# Run tests with race detector (detects concurrency issues)
go test -race ./...

# Run with coverage analysis
go test -cover ./...

# Generate detailed coverage report
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
```
Test file: `client_test.go` - 10 unit tests covering core functionality.

### Integration Tests

Integration tests require a running SGLang server and test the full client-server interaction.
```bash
# Start the SGLang server
python -m sglang.launch_server --model-path <model_path>

# Point the client at it
export SGL_GRPC_ENDPOINT=grpc://localhost:20000
export SGL_TOKENIZER_PATH=/path/to/tokenizer

# Run all integration tests
go test -tags=integration ./...

# Run specific integration test
go test -tags=integration -run TestIntegrationNonStreamingCompletion

# Run with verbose output
go test -tags=integration -v ./...

# Run with race detector
go test -tags=integration -race ./...
```
Test file: `integration_test.go` - 4 integration tests:

- `TestIntegrationNonStreamingCompletion` - Basic non-streaming request/response
- `TestIntegrationStreamingCompletion` - Streaming response handling
- `TestIntegrationConcurrentRequests` - Multiple simultaneous requests
- `TestIntegrationContextCancellation` - Context timeout and cancellation

### Benchmarks

```bash
go test -bench=. -benchmem ./...
```
## Documentation

All public types and functions include comprehensive documentation with usage examples.

- `Client` - Main client with thread-safety notes
- `ClientConfig` - Configuration requirements and validation rules
- `ChatCompletionRequest` - Request structure with field descriptions
- `ChatCompletionResponse` - Response structure and usage
- `ChatCompletionStreamResponse` - Streaming response format
- `Usage` - Token usage information structure
- `Tool`, `Function`, `ToolCall` - Tool call structures

```bash
godoc -http=:6060
# Visit: http://localhost:6060/pkg/github.com/sglang/sglang-go-grpc-sdk/
```
## Development

```bash
cd bindings/golang
make build          # Build Go bindings
go vet ./...        # Check code quality
go fmt ./...        # Format code
go test -race ./... # Run tests
```
## Project Structure

```
bindings/golang/
├── client.go             # Main client implementation
├── client_test.go        # Unit tests
├── integration_test.go   # Integration tests
├── README.md             # This file
├── Makefile              # Build automation
├── Cargo.toml            # Rust FFI dependencies
├── examples/             # Example programs
│   ├── simple/           # Non-streaming example
│   └── streaming/        # Streaming example
├── src/                  # Rust FFI source
│   ├── client.rs         # Client FFI
│   ├── stream.rs         # Stream handling
│   ├── grpc_converter.rs # Response conversion
│   └── ...
└── internal/             # Internal packages
    └── ffi/              # FFI bindings
```
## Troubleshooting

**Missing dependencies**

Run `go mod tidy` to sync dependencies.

**Cannot connect to the server**

Ensure the SGLang server is running and check `SGL_GRPC_ENDPOINT`.

**Tokenizer not found**

1. Set the `SGL_TOKENIZER_PATH` environment variable.
2. Verify the path contains the required files: `ls $SGL_TOKENIZER_PATH`
3. Files should include: `tokenizer.json`, `vocab.json`, `config.json`
**Error: library 'sgl_model_gateway_go' not found**

Solution:

- Build the bindings: `cd sgl-model-gateway/bindings/golang && make build`
- Or build the Rust library directly: `cd sgl-model-gateway/bindings/golang && cargo build --release`
- Set `CARGO_BUILD_DIR` if using a non-standard build location
- Check the Rust toolchain: `rustup toolchain list`

**Error: Tests seem to hang indefinitely**

Solution:

- Run with a timeout: `timeout 30s go test ./...`
- Run with verbose output to see where it stalls: `go test -v ./...`
- Verify the server is reachable: `grpcurl -plaintext localhost:20000 list`

**Error: Out of memory during tests**

Solution:

```bash
# Run with memory limit for long-running tests
GODEBUG=madvdontneed=1 go test -timeout 5m ./...

# Monitor memory during tests
watch -n1 'ps aux | grep test'
```
## Contributing

When adding new features, run `go vet` and `go test -race` before submitting changes.

## License

See the LICENSE file for details.

## Need Help?

- Check the `examples/` directory for working code.
- Run the tests with `go test -v ./...`.
- Read the documentation via `godoc` or inline comments.