import FeatureAvailability from '@site/src/components/FeatureAvailability';
DataHub's File Upload and Download capability enables you to enrich your asset and column documentation with supporting files like images, diagrams, and other resources, making your data catalog more informative and easier to understand.
File uploads transform how you document and communicate about your data assets by enabling you to:
Whether you're a data engineer documenting a complex data pipeline, a data analyst explaining column definitions with example screenshots, or a data steward providing reference materials, file uploads make your documentation more comprehensive and accessible.
Currently, images are displayed inline within your documentation. Files of other types can be uploaded and downloaded, with support for additional inline previews (like PDFs and text files) coming in future releases.
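As a rough illustration of the rule above (images render inline, other file types are offered as downloads), the following Python sketch decides how a file would be presented from the MIME type guessed from its filename. The function name `presentation_mode` and the use of Python's `mimetypes` module are illustrative assumptions, not DataHub's actual implementation.

```python
import mimetypes

def presentation_mode(filename: str) -> str:
    """Return "inline" for image files and "download" for everything else.

    Hypothetical helper: mirrors the documented behavior where images are shown
    inline and other types, such as PDFs and text files, are download-only for now.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is not None and content_type.startswith("image/"):
        return "inline"
    return "download"

if __name__ == "__main__":
    for name in ["pipeline-diagram.png", "column-spec.pdf", "notes.txt"]:
        print(name, "->", presentation_mode(name))
```

In this sketch, adding more inline previews would only mean widening the `startswith("image/")` condition.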
You have two convenient ways to add files to your documentation:
When you upload a file, DataHub automatically:

- Creates a `dataHubFile` entity to track metadata about your file (type, size, original filename)

Files are securely stored in S3, and access is controlled based on the user's permissions for the asset where the file was uploaded.
File upload functionality requires configuration to connect DataHub to your S3 storage.
Your DataHub instance must be deployed on AWS for the AWS role authentication to work seamlessly.
Set the following environment variables in your DataHub GMS service:
`DATAHUB_BUCKET_NAME`: the name of your S3 bucket where uploaded files will be stored.

`DATAHUB_BUCKET_NAME=your-datahub-files-bucket`

`DATAHUB_ROLE_ARN`: the ARN of the AWS role configured with permissions to upload and download files from your S3 bucket.

`DATAHUB_ROLE_ARN=arn:aws:iam::123456789012:role/DataHubFileAccessRole`
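Because a mistyped bucket name or role ARN usually only surfaces when someone tries to upload, it can help to validate both variables up front. The sketch below is a hypothetical Python check, not part of DataHub: it reads the two variables from an environment mapping and applies simplified format rules (a basic subset of the S3 bucket naming rules and the general shape of an IAM role ARN). It cannot confirm that the bucket or role actually exists.

```python
import re
from dataclasses import dataclass
from typing import Mapping

# Simplified S3 bucket naming rules: 3-63 characters of lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
# General IAM role ARN shape: arn:<partition>:iam::<12-digit account>:role/<path/name>
_ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")

@dataclass(frozen=True)
class FileStorageConfig:
    bucket_name: str
    role_arn: str

def load_file_storage_config(env: Mapping[str, str]) -> FileStorageConfig:
    """Read and sanity-check the file upload settings, reporting every problem at once."""
    problems = []
    bucket = env.get("DATAHUB_BUCKET_NAME", "").strip()
    role_arn = env.get("DATAHUB_ROLE_ARN", "").strip()
    if not bucket:
        problems.append("DATAHUB_BUCKET_NAME is not set")
    elif not _BUCKET_RE.match(bucket):
        problems.append(f"DATAHUB_BUCKET_NAME {bucket!r} is not a valid S3 bucket name")
    if not role_arn:
        problems.append("DATAHUB_ROLE_ARN is not set")
    elif not _ROLE_ARN_RE.match(role_arn):
        problems.append(f"DATAHUB_ROLE_ARN {role_arn!r} does not look like an IAM role ARN")
    if problems:
        raise ValueError("; ".join(problems))
    return FileStorageConfig(bucket_name=bucket, role_arn=role_arn)
```

In a deployment script you would pass `os.environ` as the mapping; passing a plain dictionary keeps the check easy to test.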
Your AWS role must be properly configured with two key requirements:
The role needs appropriate permissions to interact with your S3 bucket. At minimum, this should include:
- `s3:PutObject` - To upload files
- `s3:GetObject` - To download files
- `s3:DeleteObject` - For garbage collection of unused files

Configure these permissions through an IAM policy attached to the role.
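The minimum permissions listed above can be expressed as an IAM policy document. This Python sketch only builds and prints that JSON so you can paste it into the IAM console or your infrastructure-as-code tooling; the bucket name and the `Sid` value are placeholders, and your organization may require tighter scoping.

```python
import json

def file_access_policy(bucket_name: str) -> dict:
    """Build an IAM permissions policy with the minimum S3 actions for file uploads."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataHubFileAccess",  # placeholder statement ID
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                # Object-level actions apply to keys inside the bucket, hence "/*".
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }

if __name__ == "__main__":
    print(json.dumps(file_access_policy("your-datahub-files-bucket"), indent=2))
```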
Update the role's trust relationship so that your DataHub GMS service can assume it: the trust policy must name, as a trusted principal, the AWS service or role that your DataHub GMS runs under.
Without proper trust relationship configuration, DataHub will not be able to authenticate with AWS to access your S3 bucket.
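A trust policy that lets DataHub GMS assume the file-access role typically looks like the one built below. This is a sketch assuming GMS runs under an IAM role whose ARN you pass in; `arn:aws:iam::123456789012:role/DataHubGmsRole` is a hypothetical example. The resulting JSON would be attached as the trust relationship of the role referenced by `DATAHUB_ROLE_ARN`.

```python
import json

def gms_trust_policy(gms_principal_arn: str) -> dict:
    """Build a trust policy allowing the given IAM principal to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The role or user that DataHub GMS runs as (hypothetical ARN in the example).
                "Principal": {"AWS": gms_principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }

if __name__ == "__main__":
    policy = gms_trust_policy("arn:aws:iam::123456789012:role/DataHubGmsRole")
    print(json.dumps(policy, indent=2))
```

If GMS runs on Amazon EKS with IAM roles for service accounts, the trust policy instead uses a federated OIDC principal and the `sts:AssumeRoleWithWebIdentity` action; the AWS documentation covers that variant.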
To add files to your documentation:
Files are uploaded to S3 as soon as you add them to the editor, even before you save your documentation changes.
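Because files reach S3 before the documentation is saved, an abandoned edit can leave objects that nothing references, which is the kind of unused file the `s3:DeleteObject` permission is there to clean up. The sketch below shows one way such files could be identified: compare uploaded file IDs against the IDs referenced by saved documentation, and skip recent uploads that may still belong to an edit in progress. The `UploadedFile` type, the 24-hour grace period, and the approach as a whole are illustrative assumptions, not a description of DataHub's actual garbage collection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set

@dataclass(frozen=True)
class UploadedFile:
    file_id: str
    uploaded_at: datetime  # timezone-aware upload timestamp

def find_orphaned_files(
    uploaded: Iterable[UploadedFile],
    referenced_ids: Set[str],
    now: datetime,
    grace: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return IDs of files that no saved documentation references, ignoring
    uploads younger than `grace` because they may belong to an unsaved edit."""
    return sorted(
        f.file_id
        for f in uploaded
        if f.file_id not in referenced_ids and now - f.uploaded_at > grace
    )

if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2024, 1, 2, 12, 0, tzinfo=utc)
    files = [
        UploadedFile("abandoned-draft", datetime(2024, 1, 1, 0, 0, tzinfo=utc)),
        UploadedFile("saved-diagram", datetime(2024, 1, 1, 0, 0, tzinfo=utc)),
        UploadedFile("editing-now", datetime(2024, 1, 2, 11, 30, tzinfo=utc)),
    ]
    print(find_orphaned_files(files, {"saved-diagram"}, now))  # ['abandoned-draft']
```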
When users view documentation containing uploaded files:
When a user attempts to download a file, DataHub verifies that they have permission to view the asset or column where the file was originally uploaded. This ensures that file access respects your existing DataHub permission structure.
If a user doesn't have permission to view the associated asset, they won't be able to download the file, even if they have a direct link to it.
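The permission rule above can be sketched as a check that runs on every download request, so holding a direct link is never enough on its own. In this hypothetical Python version, each stored file remembers the URN of the asset or column it was uploaded to, and `can_view` stands in for DataHub's real authorization logic; all names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class StoredFile:
    file_id: str
    source_urn: str  # asset or column the file was originally uploaded to
    s3_key: str      # hypothetical object key in the configured bucket

class FileAccessDenied(Exception):
    """Raised when the requesting user cannot view the file's source entity."""

def authorize_download(
    user: str, stored: StoredFile, can_view: Callable[[str, str], bool]
) -> str:
    """Return the S3 key to serve, but only if `user` can view the source entity.

    The check runs on every request, so a shared or guessed link does not bypass it.
    """
    if not can_view(user, stored.source_urn):
        raise FileAccessDenied(f"{user} cannot view {stored.source_urn}")
    return stored.s3_key

if __name__ == "__main__":
    orders = "urn:li:dataset:(urn:li:dataPlatform:hive,sales.orders,PROD)"
    grants = {"alice": {orders}}  # toy permission table: user -> viewable URNs
    can_view = lambda user, urn: urn in grants.get(user, set())
    diagram = StoredFile("file-1", orders, "uploads/file-1/orders-erd.png")
    print(authorize_download("alice", diagram, can_view))
    try:
        authorize_download("bob", diagram, can_view)
    except FileAccessDenied as err:
        print("denied:", err)
```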
Each uploaded file is represented as a dataHubFile entity in DataHub, which stores:
This metadata enables future features like file search, usage tracking, and cleanup of orphaned files.
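To make the role of this metadata concrete, the following hypothetical Python model records the attributes mentioned earlier (type, size, original filename) and uses them for a simple filename and type search, the kind of capability this metadata is meant to enable. The class and function names are assumptions; this is not the actual `dataHubFile` schema.

```python
import mimetypes
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class FileMetadata:
    original_filename: str
    content_type: str
    size_bytes: int

def describe_upload(filename: str, payload: bytes) -> FileMetadata:
    """Capture type, size, and original filename for an uploaded payload."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return FileMetadata(
        original_filename=filename,
        # Fall back to a generic binary type when the extension is unknown.
        content_type=content_type or "application/octet-stream",
        size_bytes=len(payload),
    )

def search_files(
    files: Iterable[FileMetadata], text: str = "", type_prefix: str = ""
) -> List[FileMetadata]:
    """Case-insensitive filename match combined with a content-type prefix filter."""
    needle = text.lower()
    return [
        f
        for f in files
        if needle in f.original_filename.lower() and f.content_type.startswith(type_prefix)
    ]
```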
When using file uploads in your documentation:
If files fail to upload, check:
- `DATAHUB_BUCKET_NAME` is set correctly and the bucket exists
- `DATAHUB_ROLE_ARN` is valid and points to an existing role
- The role has upload permissions (`s3:PutObject`)

If users cannot download files, verify:
- `s3:GetObject` permissions on the bucket

If you see AWS authentication errors:
Now that you understand how to use file uploads in documentation:
File uploads are designed to make your DataHub documentation more comprehensive and user-friendly, helping your organization build a richer, more informative data catalog.