docs/source/guide/storage_databricks.md
<div class="opensource-only">

In Label Studio Enterprise, you can connect to Databricks Unity Catalog (UC) volumes for your source and target storage. For more information, see Databricks Files (UC Volumes) in our Enterprise documentation.
</div>

<div class="enterprise-only">

Connect Label Studio Enterprise to Databricks Unity Catalog (UC) volumes to import files as tasks and export annotations as JSON back to your volumes. This connector uses the Databricks Files API and operates only in proxy mode, because Databricks does not support presigned URLs.
## Prerequisites

- A Databricks workspace URL (Workspace Host), for example `https://adb-12345678901234.1.databricks.com` (or the equivalent Azure domain). See Create a workspace and Get identifiers for workspace objects.
- One of the following credentials, each of which must have permission to access the Files API:
    - A Databricks personal access token (PAT)
    - A Databricks service principal
    - An Azure AD service principal for Azure Databricks

  See Authentication options below.
- A UC volume path under `/Volumes/<catalog>/<schema>/<volume>` with files you want to label.
- Proxying enabled for your organization (Organization > Usage & License > Features).
Before you begin, review the information in Cloud storage for projects and Secure access to cloud storage.
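Before you configure the connection in Label Studio, you can confirm that your credentials and volume path satisfy these prerequisites by listing the volume with the Databricks Files API directly. The sketch below uses only the standard library; the function names are illustrative, not part of Label Studio.

```python
import json
import urllib.request


def files_api_directory_url(workspace_host: str, volume_path: str) -> str:
    """Build the Files API URL that lists a directory in a UC volume.

    volume_path is absolute, e.g. /Volumes/<catalog>/<schema>/<volume>.
    The Files API lists a directory with GET /api/2.0/fs/directories{path}.
    """
    return f"{workspace_host.rstrip('/')}/api/2.0/fs/directories{volume_path}"


def list_volume(workspace_host: str, token: str, volume_path: str) -> list[str]:
    """Return the paths of entries directly under volume_path."""
    req = urllib.request.Request(
        files_api_directory_url(workspace_host, volume_path),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Each entry includes fields such as "path", "is_directory", and "file_size".
    return [entry["path"] for entry in payload.get("contents", [])]
```

If this call succeeds with your PAT or service principal token, the same credentials should work for the storage connection.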
## Authentication options

### Personal access token (PAT)

Traditional authentication using a long-lived token created in the Databricks workspace UI. You can generate tokens from your Databricks workspace under Account > Settings > Developer > Access tokens. See Databricks personal access token authentication.

When configuring storage in Label Studio, you will be asked for your access token.
### Databricks service principal (OAuth M2M)

OAuth-based authentication using a service principal created in the Databricks Account Console. This works on all cloud platforms (AWS, GCP, Azure).

- Token endpoint: `{workspace_host}/oidc/v1/token`
- Credentials: the service principal's `client_id` and a generated secret

See Manage service principals and Authorize service principal access to Databricks with OAuth.

When configuring storage in Label Studio, you will be asked for the following:

- Client ID
- Client Secret
!!! note
    For service principal authentication, Label Studio automatically acquires and refreshes OAuth access tokens (~1 hour lifetime). No manual token rotation is needed.
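To verify a service principal's credentials outside Label Studio, you can request a token from the workspace OAuth endpoint yourself. This is a minimal sketch of the documented client-credentials flow (HTTP Basic auth with the client ID and secret, `scope=all-apis`); the function names are illustrative.

```python
import base64
import json
import urllib.parse
import urllib.request


def m2m_token_request(workspace_host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth client-credentials request for a Databricks service principal."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"{workspace_host.rstrip('/')}/oidc/v1/token",
        # Supplying a body makes this a POST request.
        data=urllib.parse.urlencode(
            {"grant_type": "client_credentials", "scope": "all-apis"}
        ).encode(),
        headers={"Authorization": f"Basic {creds}"},
    )


def fetch_access_token(workspace_host: str, client_id: str, client_secret: str) -> str:
    # The returned token is short-lived (~1 hour). Label Studio refreshes it
    # for you; this call is only for checking the credentials manually.
    req = m2m_token_request(workspace_host, client_id, client_secret)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]
```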
### Azure AD service principal (Azure Databricks only)

OAuth-based authentication using a Microsoft Entra ID (Azure AD) app registration. Available for Azure Databricks only.

- Token endpoint: `https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token`

For more information, see Authorize service principal access to Azure Databricks with OAuth.

When configuring storage in Label Studio, you will be asked for the following:

- Tenant ID
- Client ID
- Client Secret
!!! note
    For service principal authentication, Label Studio automatically acquires and refreshes OAuth access tokens (~1 hour lifetime). No manual token rotation is needed.
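For the Azure variant, the token request goes to the Entra ID endpoint above with the client-credentials grant. A sketch of the request parameters, assuming the well-known Azure Databricks resource application ID for the scope (verify this against the Azure Databricks OAuth documentation):

```python
import urllib.parse


def entra_token_endpoint(tenant_id: str) -> str:
    """Entra ID (Azure AD) v2.0 token endpoint for a given tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def entra_token_form(client_id: str, client_secret: str) -> str:
    """URL-encoded body for the client-credentials grant.

    2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known application ID
    of the Azure Databricks resource; the /.default scope requests a token
    scoped to that resource.
    """
    return urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    })
```

POSTing this form to the endpoint returns a JSON payload with an `access_token` field you can use against the Files API.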
## Set up source storage

From Label Studio, open your project and select Settings > Cloud Storage > Add Source Storage.

Select Databricks Files (UC Volumes) and click Next.

Complete the following fields and then click Test connection:
<div class="noheader rowheader">

| | |
| --- | --- |
| Storage Title | Enter a name for the storage connection to appear in Label Studio. |
| Workspace Host | Enter your workspace URL, for example `https://<workspace-identifier>.cloud.databricks.com`. |
| Authentication Method | Select an authentication method and then enter the required information. See Authentication options above. |
| Catalog | Enter the name of the Unity Catalog catalog that contains your volume. |
| Schema | Enter the name of the schema within the catalog. |
| Volume | Specify your volume path (UC coordinates). You can find this from the Catalog Explorer in Databricks (see screenshot below). |

</div>
Complete the following fields and then click Load preview to ensure you are syncing the correct data:
<div class="noheader rowheader">

| | |
| --- | --- |
| Bucket Prefix | Optionally, enter the directory name within the volume that you would like to use. For example, `data-set-1` or `data-set-1/subfolder-2`. |
| Import Method | Select whether you want to create a task for each file in your container, or whether you would like to use a JSON, JSONL, or Parquet file to define the data for each task. |
| File Name Filter | Specify a regular expression to filter bucket objects. Use `.*` to collect all objects. |
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your container. |

</div>
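The File Name Filter is a regular expression matched against object names. You can sanity-check a pattern locally before saving; the example below is an approximation of the server-side matching, with a hypothetical filter that keeps only image files:

```python
import re

# Hypothetical filter: only .jpg/.jpeg/.png files anywhere under the prefix.
FILE_NAME_FILTER = r".*\.(jpe?g|png)$"

pattern = re.compile(FILE_NAME_FILTER)
names = [
    "data-set-1/images/1.jpg",
    "data-set-1/images/2.PNG",   # case matters unless you add (?i)
    "data-set-1/labels/1.json",
]
matches = [n for n in names if pattern.match(n)]
# matches == ["data-set-1/images/1.jpg"]
```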
If everything looks correct, click Save & Sync to sync immediately, or click Save to save your settings and sync later.
!!! info "Tip"
    You can also use the API to sync import storage.
!!! note "URI scheme"
    To reference Databricks files directly in task JSON (without using source storage), use Label Studio's Databricks URI scheme:

    `dbx://Volumes/<catalog>/<schema>/<volume>/<path>`

    Example:

    `{ "image": "dbx://Volumes/main/default/dataset/images/1.jpg" }`
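When building task JSON programmatically, it can help to validate that each `dbx://` URI decomposes into the UC coordinates of your configured storage. This helper is illustrative only, not part of Label Studio:

```python
def parse_dbx_uri(uri: str) -> dict:
    """Split a dbx:// URI into UC coordinates plus the file path inside the volume."""
    prefix = "dbx://Volumes/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Databricks volume URI: {uri}")
    catalog, schema, volume, *rest = uri[len(prefix):].split("/")
    return {
        "catalog": catalog,
        "schema": schema,
        "volume": volume,
        "path": "/".join(rest),
    }


parse_dbx_uri("dbx://Volumes/main/default/dataset/images/1.jpg")
# {'catalog': 'main', 'schema': 'default', 'volume': 'dataset', 'path': 'images/1.jpg'}
```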
!!! note "Troubleshooting"
    - If your file preview returns zero files, verify the path under `/Volumes/<catalog>/<schema>/<volume>/<prefix?>` and your permissions.
    - Ensure the Workspace Host has no trailing slash and matches your workspace domain.
    - If previews work but media fails to load, confirm that proxy mode is enabled for your organization in Label Studio (Organization > Usage & License > Features) and that network egress allows Label Studio to reach Databricks.
!!! warning "Proxy and security"
    This connector streams data through the Label Studio backend with HTTP Range support. Databricks does not support presigned URLs, so that option is not available for this connector.
## Set up target storage

Repeat the steps from the previous section, but select Add Target Storage. Use the same workspace host, credentials, and volume path (UC coordinates).

For your Bucket Prefix, set an export folder to use (e.g., `exports/${project_id}`) and determine whether you want to allow files to be deleted from target storage.
When file deletion is enabled, if you delete an annotation in Label Studio (via UI or API), Label Studio will also delete the corresponding exported JSON file from your target storage for this storage connection.
Note that this only affects files that were exported by that target storage, not your source media or tasks. Your PAT or SP permissions must also allow deletion.
After adding, click Sync to export annotations as JSON files to your target volume.
!!! info "Tip"
    You can also use the API to sync export storage.
You can also use the API to programmatically create storage connections. See our API documentation.
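As a sketch of what an API-driven sync looks like: Label Studio's storage APIs follow the pattern `POST /api/storages/<type>/<id>/sync` with a `Token` authorization header. The `<type>` slug for the Databricks connector below is a placeholder assumption; check the API documentation for the exact value.

```python
import urllib.request


def sync_storage_request(base_url: str, storage_type: str, storage_id: int, api_key: str) -> urllib.request.Request:
    """Build the sync call for a storage connection.

    "databricks" as the storage_type is only a placeholder; take the real
    slug for this connector from the API documentation.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/storages/{storage_type}/{storage_id}/sync",
        data=b"",  # POST with an empty body
        headers={"Authorization": f"Token {api_key}"},
        method="POST",
    )


# req = sync_storage_request("https://label-studio.example.com", "databricks", 42, "<api-key>")
# urllib.request.urlopen(req)  # triggers the sync
```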
</div>