examples/hello-concurrency/README.md
This is an example Cog project that demonstrates the concurrency support added in Cog >= 0.14.0.
The key piece is the new `concurrency` field in `cog.yaml`:

```yaml
concurrency:
  max: 4
```
Combined with the async `setup` and `run` methods in `run.py`, this allows Cog to run up to 4 predictions concurrently. Once Cog reaches the maximum concurrency, it rejects further predictions with a `409 Conflict` response.
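To make the admission rule concrete, here is a small asyncio sketch. It is a hypothetical model of the behaviour described above, not Cog's implementation: `ConcurrencyGate`, `handle`, and the use of `asyncio.sleep` as a stand-in for prediction work are illustrative assumptions. Six requests arrive at once against a limit of 4, so the first four are admitted and the last two are rejected immediately with 409.

```python
import asyncio

# Hypothetical illustration of the admission rule; NOT Cog's implementation.
# MAX_CONCURRENCY mirrors `concurrency.max: 4` in cog.yaml.
MAX_CONCURRENCY = 4


class ConcurrencyGate:
    """Admit up to `limit` in-flight predictions; reject the rest with 409."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    async def handle(self, seconds: float) -> int:
        # Reject immediately instead of queueing once the limit is reached.
        if self.in_flight >= self.limit:
            return 409  # Conflict
        self.in_flight += 1
        try:
            # Stand-in for an async `run` method awaiting non-blocking work.
            await asyncio.sleep(seconds)
            return 200
        finally:
            # Free the slot whether the prediction succeeded or failed.
            self.in_flight -= 1


async def main() -> list:
    gate = ConcurrencyGate(MAX_CONCURRENCY)
    # Fire six requests at once; only the first four are admitted.
    return await asyncio.gather(*(gate.handle(0.1) for _ in range(6)))


if __name__ == "__main__":
    print(asyncio.run(main()))  # [200, 200, 200, 200, 409, 409]
```

Rejecting rather than queueing keeps latency predictable for admitted requests, and a client that receives a 409 can retry later. Note that concurrency only pays off if `run` awaits non-blocking work; a blocking call inside an async method stalls the event loop for every in-flight prediction.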
It also uses the OpenTelemetry packages to demonstrate how to collect telemetry for your model. This requires a file named `honeycomb_token.key` to be included in the image build. Events are then sent to the `cog-model` dataset in Honeycomb; you can change this by setting `OTEL_SERVICE_NAME`. To send events to a custom endpoint, set `OTEL_EXPORTER_OTLP_ENDPOINT`.
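The token file and environment variables could be wired together roughly as follows. This is a sketch under stated assumptions, not the code in `run.py`: `otel_env_from_token` is a hypothetical helper, `https://api.honeycomb.io` is assumed to be the Honeycomb endpoint, and the `x-honeycomb-team` header is Honeycomb's API-key header. `OTEL_EXPORTER_OTLP_HEADERS` is the standard OpenTelemetry variable for exporter headers.

```python
import os
from pathlib import Path

# Assumed Honeycomb OTLP endpoint; overridden by OTEL_EXPORTER_OTLP_ENDPOINT.
HONEYCOMB_ENDPOINT = "https://api.honeycomb.io"


def otel_env_from_token(token_path: str, service_name: str = "cog-model") -> dict:
    """Hypothetical helper: build OTel exporter settings from a token file."""
    token = Path(token_path).read_text().strip()
    return {
        # Honeycomb names the dataset after the service name.
        "OTEL_SERVICE_NAME": os.environ.get("OTEL_SERVICE_NAME", service_name),
        # Keep a custom endpoint if one is already configured.
        "OTEL_EXPORTER_OTLP_ENDPOINT": os.environ.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", HONEYCOMB_ENDPOINT
        ),
        # Honeycomb authenticates OTLP requests with this header.
        "OTEL_EXPORTER_OTLP_HEADERS": f"x-honeycomb-team={token}",
    }
```

Merging the result into the environment before the OpenTelemetry SDK creates its exporters, e.g. `os.environ.update(otel_env_from_token("honeycomb_token.key"))`, lets the standard OTLP exporters pick the values up without further code.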
Finally, `run.py` contains a commented-out section that you can uncomment to run telemetry locally and print events to the console for debugging.