docs/source/guide/persistent_storage.md
Set up persistent storage so that uploaded task data, user images, and other media are kept across restarts and available to all Label Studio components.
!!! note
    By default, Label Studio uses nginx to serve uploaded media. For persistent storage to work correctly, run nginx alongside Label Studio. This is the recommended setup and reduces load on the Label Studio server.
If you use a minimal setup without nginx, you can set `USE_NGINX_FOR_UPLOADS=false` and `USE_NGINX_FOR_EXPORT_DOWNLOADS=false`. In that case, Label Studio serves all uploads and exports. This is not recommended: it increases load on the uwsgi workers and can cause outages when users work with large files.
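As a sketch, in a Docker Compose deployment these flags would go in your `env.list` file; the values below show the non-default, not-recommended configuration:

```shell
# Not recommended: Label Studio itself serves uploads and exports,
# bypassing nginx. Both settings are read from the environment.
USE_NGINX_FOR_UPLOADS=false
USE_NGINX_FOR_EXPORT_DOWNLOADS=false
```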
Choose one of the following according to your deployment:
When you deploy Label Studio on Kubernetes, you can use a Persistent Volume Claim (PVC) for persistent storage instead of cloud object storage. The PVC must support ReadWriteMany access mode so that multiple pods (for example, the app and rqworker) can read and write the same volume.
Configure persistence in your ls-values.yaml using the persistence.config.volume section and set accessModes to ReadWriteMany:
```yaml
global:
  persistence:
    enabled: true
    type: volume
    config:
      volume:
        size: 50Gi
        accessModes:
          - ReadWriteMany
```
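As a quick sketch of how these values are applied, the snippet below writes them to `ls-values.yaml` and shows the Helm command; the chart repository URL and release name are assumptions to verify against your deployment:

```shell
# Save the persistence values shown above to a local file.
cat > ls-values.yaml <<'EOF'
global:
  persistence:
    enabled: true
    type: volume
    config:
      volume:
        size: 50Gi
        accessModes:
          - ReadWriteMany
EOF

# Apply the values (assumed repo URL and release name; needs Helm and cluster access):
# helm repo add heartexlabs https://charts.heartex.com/
# helm upgrade --install label-studio heartexlabs/label-studio -f ls-values.yaml
```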
!!! note
    Your cluster must provide a StorageClass or default storage that supports ReadWriteMany (for example, NFS, EFS, Azure Files, or other network-backed storage). Standard block storage (e.g. many default cloud disks) typically supports only ReadWriteOnce and is not suitable when multiple pods need to mount the same PVC.
Set up Amazon S3 as the persistent storage for Label Studio hosted in AWS or using Docker Compose.
Start by creating an S3 bucket, following the steps in the Amazon Simple Storage Service User Guide.
!!! note
    If you want to secure the data stored in the S3 bucket at rest, you can set up default server-side encryption for Amazon S3 buckets following the steps in the Amazon Simple Storage Service User Guide.
!!! note
    If you plan to use the direct file upload feature to store media files such as audio, video, or CSV, complete this step.
Set up Cross-Origin Resource Sharing (CORS) access to your bucket. See Configuring cross-origin resource sharing (CORS) in the Amazon S3 User Guide. Use or modify the following example:
```json
[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "GET",
      "PUT",
      "POST",
      "DELETE",
      "HEAD"
    ],
    "AllowedOrigins": [
      "*"
    ],
    "ExposeHeaders": [
      "x-amz-server-side-encryption",
      "x-amz-request-id",
      "x-amz-id-2"
    ],
    "MaxAgeSeconds": 3600
  }
]
```
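One way to apply these rules is with the AWS CLI's `s3api put-bucket-cors` subcommand; a sketch, assuming a configured CLI and a placeholder bucket name. Note that the CLI wraps the rules in a `CORSRules` key, unlike the console editor, which takes the bare array:

```shell
# Save the CORS rules to a local file in the shape put-bucket-cors expects.
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
      "AllowedOrigins": ["*"],
      "ExposeHeaders": ["x-amz-server-side-encryption", "x-amz-request-id", "x-amz-id-2"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF

# Apply them to the bucket (requires AWS credentials; placeholder bucket name):
# aws s3api put-bucket-cors --bucket <YOUR_S3_BUCKET> --cors-configuration file://cors.json
```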
After you create an S3 bucket, set up the necessary IAM permissions to grant Label Studio access to it. There are four ways to manage access to your S3 bucket. Select the relevant tab and follow the steps for your desired option:
**IAM role (OIDC)**

!!! note
    To set up an IAM role using this method, you must have a configured and provisioned OIDC provider for your cluster. See Create an IAM OIDC provider for your cluster in the Amazon EKS User Guide.
Create an IAM policy that grants access to the bucket, replacing `<YOUR_S3_BUCKET>` with the name of your bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>/*"
      ]
    }
  ]
}
```
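The same policy can also be created from the command line; a sketch assuming a configured AWS CLI and a hypothetical policy name:

```shell
# Save the policy document locally (keep the bucket placeholder for now).
cat > ls-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<YOUR_S3_BUCKET>"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::<YOUR_S3_BUCKET>/*"]
    }
  ]
}
EOF

# Create the policy (hypothetical name; replace <YOUR_S3_BUCKET> first):
# aws iam create-policy --policy-name LabelStudioS3Access --policy-document file://ls-s3-policy.json
```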
When you create the IAM role for the OIDC provider, specify `sts.amazonaws.com` as the Audience. Then update your `ls-values.yaml` file.
Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: s3
    config:
      s3:
        bucket: "<YOUR_BUCKET_NAME>"
        region: "<YOUR_BUCKET_REGION>"
        folder: ""
app:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME_FROM_STEP_3>
rqworker:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME_FROM_STEP_3>
```
**Access keys (Kubernetes)**

Create an IAM user with programmatic access and attach a policy that grants access to the bucket, replacing `<YOUR_S3_BUCKET>` with the name of your bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>/*"
      ]
    }
  ]
}
```
Update your `ls-values.yaml` file with your newly created access key ID and secret key as `<YOUR_ACCESS_KEY_ID>` and `<YOUR_SECRET_ACCESS_KEY>`. Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: s3
    config:
      s3:
        accessKey: "<YOUR_ACCESS_KEY_ID>"
        secretKey: "<YOUR_SECRET_ACCESS_KEY>"
        bucket: "<YOUR_BUCKET_NAME>"
        region: "<YOUR_BUCKET_REGION>"
        folder: ""
```
!!! note
    Optionally, you can use an existing Kubernetes secret and key instead of creating a new one.

Create a Kubernetes secret with your access key ID and secret access key:

```shell
kubectl create secret generic <YOUR_SECRET_NAME> --from-literal=accesskey=<YOUR_ACCESS_KEY_ID> --from-literal=secretkey=<YOUR_SECRET_ACCESS_KEY>
```
Update your `ls-values.yaml` file with your newly created Kubernetes secret:

```yaml
global:
  persistence:
    enabled: true
    type: s3
    config:
      s3:
        accessKeyExistingSecret: "<YOUR_SECRET_NAME>"
        accessKeyExistingSecretKey: "accesskey"
        secretKeyExistingSecret: "<YOUR_SECRET_NAME>"
        secretKeyExistingSecretKey: "secretkey"
        bucket: "<YOUR_BUCKET_NAME>"
        region: "<YOUR_BUCKET_REGION>"
```
**IAM role (without OIDC)**

To create an IAM role without using OIDC in EKS, follow these steps.

In the Amazon EKS console, go to `YOUR_CLUSTER_NAME` > Node Group and select the `YOUR_NODE_GROUP` where Label Studio is deployed. Attach a policy to the node group's IAM role that grants access to the bucket, replacing `<YOUR_S3_BUCKET>` with the name of your bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>/*"
      ]
    }
  ]
}
```
Then update your `ls-values.yaml` file. Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: s3
    config:
      s3:
        bucket: "<YOUR_BUCKET_NAME>"
        region: "<YOUR_BUCKET_REGION>"
        folder: ""
```
**Access keys (Docker Compose)**

Create an IAM user with programmatic access and attach a policy that grants access to the bucket, replacing `<YOUR_S3_BUCKET>` with the name of your bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_S3_BUCKET>/*"
      ]
    }
  ]
}
```
Update your `env.list` file, replacing `<YOUR_ACCESS_KEY_ID>` and `<YOUR_SECRET_ACCESS_KEY>` with your newly created access key ID and secret key. Optionally, you can specify a folder using `STORAGE_AWS_FOLDER` (default is `""`, or omit this argument):

```shell
STORAGE_TYPE=s3
STORAGE_AWS_ACCESS_KEY_ID="<YOUR_ACCESS_KEY_ID>"
STORAGE_AWS_SECRET_ACCESS_KEY="<YOUR_SECRET_ACCESS_KEY>"
STORAGE_AWS_BUCKET_NAME="<YOUR_BUCKET_NAME>"
STORAGE_AWS_REGION_NAME="<YOUR_BUCKET_REGION>"
STORAGE_AWS_FOLDER=""
```
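To illustrate how `env.list` is consumed, Docker can pass it with `--env-file`; the image name and port below follow the standard Label Studio distribution, but verify them for your deployment:

```shell
# Write the storage settings (placeholder credentials) to env.list.
cat > env.list <<'EOF'
STORAGE_TYPE=s3
STORAGE_AWS_ACCESS_KEY_ID="<YOUR_ACCESS_KEY_ID>"
STORAGE_AWS_SECRET_ACCESS_KEY="<YOUR_SECRET_ACCESS_KEY>"
STORAGE_AWS_BUCKET_NAME="<YOUR_BUCKET_NAME>"
STORAGE_AWS_REGION_NAME="<YOUR_BUCKET_REGION>"
STORAGE_AWS_FOLDER=""
EOF

# Start Label Studio with these settings (requires Docker):
# docker run -p 8080:8080 --env-file env.list heartexlabs/label-studio:latest
```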
Set up Google Cloud Storage (GCS) as the persistent storage for Label Studio hosted in Google Cloud Platform (GCP) or Docker Compose.
Start by creating a GCS bucket, for example `heartex-example-bucket-123456`. To restrict IAM permissions to that bucket, add a condition with type **Name**, operator **Starts with**, and value `projects/_/buckets/heartex-example-bucket-123456`, or use the equivalent CEL expression `resource.name.startsWith('projects/_/buckets/heartex-example-bucket-123456')`. See CEL for Conditions in Overview of IAM Conditions in the Google Cloud Storage guide.

!!! note
    If you plan to use the direct file upload feature to store media files such as audio, video, or CSV, complete this step.
Set up CORS access to your bucket. See Configuring cross-origin resource sharing (CORS) in the Google Cloud User Guide. Use or modify the following example:
```shell
echo '[
  {
    "origin": ["*"],
    "method": ["GET","PUT","POST","DELETE","HEAD"],
    "responseHeader": ["Content-Type","Access-Control-Allow-Origin"],
    "maxAgeSeconds": 3600
  }
]' > cors-config.json
```
Replace YOUR_BUCKET_NAME with your actual bucket name in the following command to update CORS for your bucket:
```shell
gsutil cors set cors-config.json gs://YOUR_BUCKET_NAME
```
After you create a bucket and set up IAM permissions, connect Label Studio to your GCS bucket. There are three ways to connect: Workload Identity, a service account key in Kubernetes, or a service account key in Docker Compose. Select the option relevant to your deployment.
**Workload Identity**

!!! note
    Make sure that Workload Identity is enabled on your GKE cluster and that you meet the necessary prerequisites. See Using Workload Identity in the Google Kubernetes Engine guide.
Bind the Workload Identity user role to the Kubernetes service accounts by setting the `GCP_SA` variable and replacing the other references in `<>` as needed:

```shell
GCP_SA=<Service-Account-You-Created>
APP_SA="serviceAccount:<GCP_PROJECT_ID>.svc.id.goog[<K8S_NAMESPACE>/<HELM_RELEASE_NAME>-lse-app]"
WORKER_SA="serviceAccount:<GCP_PROJECT_ID>.svc.id.goog[<K8S_NAMESPACE>/<HELM_RELEASE_NAME>-lse-rqworker]"

gcloud iam service-accounts add-iam-policy-binding ${GCP_SA} \
  --role roles/iam.workloadIdentityUser \
  --member "${APP_SA}"

gcloud iam service-accounts add-iam-policy-binding ${GCP_SA} \
  --role roles/iam.workloadIdentityUser \
  --member "${WORKER_SA}"
```
Update your `ls-values.yaml` file to include the values for the service account and other configurations. Update `projectID` and `bucket`, and replace `<GCP_SERVICE_ACCOUNT>` with the relevant values for your deployment. Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: gcs
    config:
      gcs:
        projectID: "<YOUR_PROJECT_ID>"
        bucket: "<YOUR_BUCKET_NAME>"
        folder: ""
app:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: "<GCP_SERVICE_ACCOUNT>"
rqworker:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: "<GCP_SERVICE_ACCOUNT>"
```
**Service account key (Kubernetes)**

You can use a service account key that you create, or, if you already have a Kubernetes secret and key, follow the steps below to use those.
Create a service account key from the UI and download the JSON. Follow the steps for Creating and managing service account keys in the Google Cloud Identity and Access Management guide.
After downloading the JSON for the service account key, update or create references to the JSON, your projectID, and your bucket in your ls-values.yaml file.
Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: gcs
    config:
      gcs:
        projectID: "<YOUR_PROJECT_ID>"
        applicationCredentialsJSON: "<YOUR_JSON>"
        bucket: "<YOUR_BUCKET_NAME>"
        folder: ""
```
Alternatively, create a Kubernetes secret from the downloaded service account JSON file, replacing `<PATH_TO_JSON>` with the path to the file:

```shell
kubectl create secret generic <YOUR_SECRET_NAME> --from-file=key_json=<PATH_TO_JSON>
```
Update your `ls-values.yaml` file with your newly created Kubernetes secret:

```yaml
global:
  persistence:
    enabled: true
    type: gcs
    config:
      gcs:
        projectID: "<YOUR_PROJECT_ID>"
        applicationCredentialsJSONExistingSecret: "<YOUR_SECRET_NAME>"
        applicationCredentialsJSONExistingSecretKey: "key_json"
        bucket: "<YOUR_BUCKET_NAME>"
```
**Docker Compose**

Update your `env.list` file. Optionally, you can choose a folder by specifying `STORAGE_GCS_FOLDER` (default is `""`, or omit this argument):

```shell
STORAGE_TYPE=gcs
STORAGE_GCS_BUCKET_NAME="<YOUR_BUCKET_NAME>"
STORAGE_GCS_PROJECT_ID="<YOUR_PROJECT_ID>"
STORAGE_GCS_FOLDER=""
GOOGLE_APPLICATION_CREDENTIALS="/opt/heartex/secrets/key.json"
```
Place the downloaded JSON file from step 1 in the same directory as your env.list file.
Append the following entry to the `app.volumes` section of your `docker-compose.yml` file:

```yaml
- ./service-account-file.json:/opt/heartex/secrets/key.json:ro
```
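Putting the pieces together, the relevant fragment of `docker-compose.yml` might look like the sketch below; the service name `app` is an assumption, so adapt it to the names in your compose file:

```yaml
services:
  app:
    env_file:
      - env.list
    volumes:
      # Mount the service account key read-only at the path referenced
      # by GOOGLE_APPLICATION_CREDENTIALS in env.list.
      - ./service-account-file.json:/opt/heartex/secrets/key.json:ro
```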
Create a Microsoft Azure Storage container to use as persistent storage with Label Studio.
!!! note
    Make sure that you set Stock Keeping Unit (SKU) to Premium_LRS and the kind parameter to BlockBlobStorage. This configuration results in storage that uses solid state drives (SSDs) rather than standard hard disk drives (HDDs). If you set this parameter to an HDD-based storage option, your instance might be too slow and could malfunction.
Retrieve your storage account key and create a container:

```shell
az storage account keys list --account-name=${STORAGE_ACCOUNT}

az storage container create --name <YOUR_CONTAINER_NAME> \
  --account-name <YOUR_STORAGE_ACCOUNT> \
  --account-key "<YOUR_STORAGE_KEY>"
```
!!! note
    If you plan to use the direct file upload feature to store media files such as audio, video, or CSV, complete this step.
Set up CORS access to your container. See Configuring cross-origin resource sharing (CORS) in the Azure User Guide. Use or modify the following example:

```xml
<Cors>
  <CorsRule>
    <AllowedOrigins>*</AllowedOrigins>
    <AllowedMethods>GET,PUT,POST,DELETE,HEAD</AllowedMethods>
    <AllowedHeaders>x-ms-blob-content-type</AllowedHeaders>
    <ExposedHeaders>x-ms-*</ExposedHeaders>
    <MaxAgeInSeconds>3600</MaxAgeInSeconds>
  </CorsRule>
</Cors>
```
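Equivalent rules can also be set from the command line; a sketch using the Azure CLI's `az storage cors add` subcommand, with the exact flags shown here as assumptions to verify against your CLI version:

```shell
# Save the XML rules locally for reference.
cat > cors.xml <<'EOF'
<Cors>
  <CorsRule>
    <AllowedOrigins>*</AllowedOrigins>
    <AllowedMethods>GET,PUT,POST,DELETE,HEAD</AllowedMethods>
    <AllowedHeaders>x-ms-blob-content-type</AllowedHeaders>
    <ExposedHeaders>x-ms-*</ExposedHeaders>
    <MaxAgeInSeconds>3600</MaxAgeInSeconds>
  </CorsRule>
</Cors>
EOF

# Apply the same rules to the blob service with the Azure CLI
# (requires az and your account credentials):
# az storage cors add --services b --methods GET PUT POST DELETE HEAD \
#   --origins '*' --allowed-headers x-ms-blob-content-type \
#   --exposed-headers 'x-ms-*' --max-age 3600 \
#   --account-name <YOUR_STORAGE_ACCOUNT> --account-key "<YOUR_STORAGE_KEY>"
```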
You can connect Label Studio to your Azure container using account keys in Kubernetes or account keys in Docker Compose. Choose the option relevant to your Label Studio deployment.
**Kubernetes**

Update your `ls-values.yaml` file with the `YOUR_CONTAINER_NAME`, `YOUR_STORAGE_ACCOUNT`, and `YOUR_STORAGE_KEY` that you created. Optionally, you can choose a folder by specifying `folder` (default is `""`, or omit this argument):

```yaml
global:
  persistence:
    enabled: true
    type: azure
    config:
      azure:
        storageAccountName: "<YOUR_STORAGE_ACCOUNT>"
        storageAccountKey: "<YOUR_STORAGE_KEY>"
        containerName: "<YOUR_CONTAINER_NAME>"
        folder: ""
```
If you have an existing key, you can use it to create a Kubernetes secret instead:

```shell
kubectl create secret generic <YOUR_SECRET_NAME> --from-literal=storageaccountname=<YOUR_STORAGE_ACCOUNT> --from-literal=storageaccountkey=<YOUR_STORAGE_KEY>
```

Update your `ls-values.yaml` file with your newly created Kubernetes secret:

```yaml
global:
  persistence:
    enabled: true
    type: azure
    config:
      azure:
        storageAccountNameExistingSecret: "<YOUR_SECRET_NAME>"
        storageAccountNameExistingSecretKey: "storageaccountname"
        storageAccountKeyExistingSecret: "<YOUR_SECRET_NAME>"
        storageAccountKeyExistingSecretKey: "storageaccountkey"
        containerName: "<YOUR_CONTAINER_NAME>"
```
**Docker Compose**

Update your `env.list` file with the `YOUR_CONTAINER_NAME`, `YOUR_STORAGE_ACCOUNT`, and `YOUR_STORAGE_KEY` that you created. Optionally, you can choose a folder by specifying `STORAGE_AZURE_FOLDER` (default is `""`, or omit this argument):

```shell
STORAGE_TYPE=azure
STORAGE_AZURE_ACCOUNT_NAME="<YOUR_STORAGE_ACCOUNT>"
STORAGE_AZURE_ACCOUNT_KEY="<YOUR_STORAGE_KEY>"
STORAGE_AZURE_CONTAINER_NAME="<YOUR_CONTAINER_NAME>"
STORAGE_AZURE_FOLDER=""
```