Segment Anything 2 (SAM2) is a segmentation model that allows fast and precise selection of any object in videos or images. SAM2 tracking is available in two implementations:

- **Nuclio SAM2 Tracker**: available only for Enterprise deployments. It is implemented as a serverless function deployed via the Nuclio framework.
- **AI agent SAM2 Tracker**: available for CVAT Online and Enterprise via auto-annotation (AA) functions that run on user-side agents. This brings SAM2 tracking capabilities to CVAT Online users who previously could not access this feature.
It is strongly recommended to deploy the model using a GPU. Although it is possible to use a CPU-based version, it generally performs much slower and is suitable only for handling a single parallel request. The AI agent variant runs on user hardware, providing flexibility for GPU usage without server configuration requirements.
Unlike a regular tracking model, both SAM2 tracker implementations are designed to be applied to existing objects (polygons and masks) to track them forward for a specified number of frames.
Choose the installation method based on your platform and deployment needs.
{{% alert title="Note" color="primary" %}} Nuclio SAM2 Tracker is only available in the Enterprise version. The AI agent variant brings SAM2 tracking to CVAT Online and Enterprise. {{% /alert %}}
{{% alert title="Note" color="primary" %}} Both tracker implementations require the enhanced actions UI plugin, which is enabled by default. Usually, no additional steps are necessary on this. {{% /alert %}}
You can use the existing deployment scripts from the community repository
(`./serverless/deploy_cpu.sh` or `./serverless/deploy_gpu.sh`).
To deploy the function, simply run:

```bash
./serverless/deploy_gpu.sh "path/to/the/function"
```
The function requires access to a Redis instance; in the default CVAT deployment, this is the one that the `cvat_redis_ondisk` container provides.
When running the `nuctl deploy` command, make sure to provide the necessary arguments. The minimal command is:

```bash
nuctl deploy "path/to/the/function" \
  --env CVAT_FUNCTIONS_REDIS_HOST="<redis_host>" \
  --env CVAT_FUNCTIONS_REDIS_PORT="<redis_port>" \
  --env CVAT_FUNCTIONS_REDIS_PASSWORD="<redis_password>" # if applicable
```
The AI agent implementation enables SAM2 tracking for CVAT Online users and provides an alternative deployment method for Enterprise customers. This approach runs the tracking model on user hardware via auto-annotation (AA) functions. Deploy SAM2 using Docker Compose with pre-built images for a quick and straightforward setup.
The easiest way to deploy SAM2 is by using the pre-built images from Docker Hub.
Clone the CVAT repository and navigate to the SAM2 agent directory:

```bash
git clone https://github.com/cvat-ai/cvat.git
cd cvat/ai-models/agents_deployment/sam2
```
Create or update the `.env` file with your configuration:

- For GPU deployment, set `IMAGE_URL=cvat/sam2_agent:latest_GPU` and `COMPOSE_PROFILES=gpu`.
- For CPU-only deployment, set `IMAGE_URL=cvat/sam2_agent:latest` and `COMPOSE_PROFILES=cpu`.

Configure the remaining required variables (`CVAT_BASE_URL`, `CVAT_ACCESS_TOKEN`, `FUNCTION_NAME`, etc.)
following the Docker Compose agent guide.
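As an illustration, a GPU deployment's `.env` might look like the following. The token and function name here are placeholders, and the exact variable set is defined by the Docker Compose agent guide:

```env
# Illustrative values only — replace with your own configuration
IMAGE_URL=cvat/sam2_agent:latest_GPU
COMPOSE_PROFILES=gpu
CVAT_BASE_URL=https://app.cvat.ai     # or your self-hosted CVAT URL
CVAT_ACCESS_TOKEN=<your-access-token> # placeholder — obtain from your CVAT account
FUNCTION_NAME=sam2-tracker            # the name shown on the /models page
```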
Start the agent:

```bash
docker compose up
```
Verify the agent is running in the CVAT interface.
You should see a new function model named <FUNCTION_NAME> in the list on the /models page
and in the annotation actions list.
To stop and clean up:

```bash
# Deregister the function from CVAT (must be run before the volumes are removed).
# Alternatively, you can always remove the function from the CVAT interface.
docker compose run --rm cvat-function-deregister

# Stop the agent and remove volumes
docker compose down -v
```
For detailed configuration options and troubleshooting, see the Docker Compose agent guide.
{{% alert title="Note" color="info" %}} For enterprise deployments using Kubernetes, refer to the Docker Compose agent guide for Kubernetes deployment instructions and container orchestration examples. {{% /alert %}}
The AI agent runs as a persistent process on your hardware, providing several advantages:
{{% alert title="Important" color="warning" %}} Keep the agent process running to handle tracking requests. If the agent stops, active tracking operations will fail and need to be restarted. {{% /alert %}}
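To reduce the chance of the agent stopping unnoticed, one possible approach (a sketch, not part of the official setup) is to run it detached with an automatic restart policy via a Compose override file. The service name `agent` below is hypothetical and must match the one in the shipped compose file:

```yaml
# docker-compose.override.yml — hypothetical override;
# adjust the service name to match the shipped compose file.
services:
  agent:
    restart: unless-stopped
```

With the override in place, `docker compose up -d` starts the agent in the background, and Docker restarts it automatically if the process exits unexpectedly.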
Both SAM2 tracker implementations provide similar user experiences with slight differences in the UI labels.
The nuclio tracker can be applied to any polygons and masks. To run the tracker on an object, open the object menu and click Run annotation action.
Alternatively, you can use a hotkey: select the object and press Ctrl + E (default shortcut). When the modal opens, in the "Select action" list, choose Segment Anything 2: Tracker:
Once you have registered the SAM2 AI agent and it's running, you'll see "AI Tracker: SAM2" as an available action in the annotation UI for video shape tracking.
To use the AI agent tracker:
The usage flow parallels the existing annotation action interface but utilizes the remote AI agent rather than built-in serverless functions.
Specify the target frame until which you want the object to be tracked, then click the Run button to start tracking. The process begins and may take some time to complete. The duration depends on the inference device and the number of frames over which the object will be tracked.
Once the process is complete, the modal window closes. You can review how the object was tracked. If you notice that the tracked shape deteriorates at some point, you can adjust the object coordinates and run the tracker again from that frame.
Instead of tracking each object individually, you can track multiple objects simultaneously. To do this, click the Menu button in the annotation view and select the Run Actions option:
Alternatively, you can use a hotkey: just press Ctrl + E (default shortcut) when no objects are selected. This opens the actions modal. In this case, the tracker will be applied to all visible objects of suitable types (polygons and masks). In the action list of the opened modal, select either Segment Anything 2: Tracker (Nuclio) or AI Tracker: SAM2 (AI agent):
Specify the target frame until which you want the objects to be tracked, then click the Run button to start tracking. The process begins and may take some time to complete. The duration depends on the inference device, the number of simultaneously tracked objects, and the number of frames over which the objects will be tracked.
Once the process finishes, you may close the modal and review how the objects were tracked. If you notice that the tracked shapes deteriorate, you can adjust their coordinates and run the tracker again from that frame (for a single object or for many objects).
When using the AI agent implementation, keep in mind that tracking is launched from the annotation UI; it is not invoked through the `cvat-cli task auto-annotate` command.