# Fake LLM Server
A simple server that mimics the OpenAI streaming chat completions API for testing purposes.
## Installation

```bash
npm install
```
## Running the server

Start the server:

```bash
# Development mode
npm run dev

# Production mode
npm run build
npm start
```
Send a test request:

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model","stream":true}'
```
The server will be available at http://localhost:3500 by default.
## API

### POST /v1/chat/completions

This endpoint mimics OpenAI's chat completions API.
Request body:

```json
{
  "messages": [{ "role": "user", "content": "Your prompt here" }],
  "model": "any-model",
  "stream": true
}
```
- `"stream": true` to receive a streaming response
- `"stream": false` (or omitted) for a regular JSON response

For non-streaming requests, you'll get a standard JSON response:
```json
{
  "id": "chatcmpl-123456789",
  "object": "chat.completion",
  "created": 1699000000,
  "model": "fake-model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hello world"
      },
      "finish_reason": "stop"
    }
  ]
}
```
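A non-streaming request is simply the earlier curl example with `stream` set to `false` or omitted:

```bash
curl -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say something"}],"model":"any-model"}'
```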
For streaming requests, you'll receive a series of server-sent events (SSE), each containing a chunk of the response.
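As an illustration (the exact chunk payloads depend on this server's implementation), an OpenAI-style SSE stream looks roughly like this, with a `[DONE]` sentinel at the end:

```text
data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"role":"assistant","content":"hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-123456789","object":"chat.completion.chunk","created":1699000000,"model":"fake-model","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```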
## Simulating rate limiting

To test how your application handles rate limiting, send a message with content exactly equal to `[429]`:
```json
{
  "messages": [{ "role": "user", "content": "[429]" }],
  "model": "any-model"
}
```
This will return a 429 status code with the following response:
```json
{
  "error": {
    "message": "Too many requests. Please try again later.",
    "type": "rate_limit_error",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
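For example (`-i` prints the status line so you can see the 429):

```bash
curl -i -X POST http://localhost:3500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"[429]"}],"model":"any-model"}'
```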
## Configuration

You can configure the server by modifying the `PORT` variable in the code.
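As a rough sketch of where that variable might live (assuming an Express-style setup; the actual framework and variable placement come from this repo's source):

```ts
import express from "express";

const app = express();

// Assumption: the port is read from the environment, falling back to the
// default of 3500 documented above.
const PORT = Number(process.env.PORT ?? 3500);

app.listen(PORT, () => {
  console.log(`Fake LLM server listening on http://localhost:${PORT}`);
});
```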
This server is primarily intended for testing applications that integrate with OpenAI's API, allowing you to develop and test without making actual API calls to OpenAI.
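For example, with the official `openai` Node SDK (an assumption for illustration; this repo does not ship a client), you can point requests at the fake server by overriding the base URL:

```ts
import OpenAI from "openai";

// Point the SDK at the fake server instead of api.openai.com.
// The API key can be any placeholder; this server does not validate it.
const client = new OpenAI({
  baseURL: "http://localhost:3500/v1",
  apiKey: "not-a-real-key",
});

async function main() {
  const stream = await client.chat.completions.create({
    model: "any-model",
    messages: [{ role: "user", content: "Say something" }],
    stream: true,
  });

  // Print each streamed chunk's content as it arrives.
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main().catch(console.error);
```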