docs/en/integrations/authenticate_to_azure_storage.md
From v3.0 onwards, StarRocks can integrate with Microsoft Azure Storage (Azure Blob Storage or Azure Data Lake Storage) in the following scenarios:
StarRocks supports the following types of Azure Storage accounts:
In this topic, Hive catalog, file external table, and Broker Load are used as examples to show how StarRocks integrates with Azure Storage by using these types of Azure Storage accounts. For information about the parameters in the examples, see Hive catalog, File external table, and Broker Load.
StarRocks supports using one of the following authentication methods to access Blob Storage:
NOTE
When you load data or directly query files from Blob Storage, you must use the wasb or wasbs protocol to access your data:
- If your storage account allows access over HTTP, use the wasb protocol and write the file path as
wasb://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>.- If your storage account allows access over HTTPS, use the wasbs protocol and write the file path as
wasbs://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>.
Configure azure.blob.storage_account and azure.blob.shared_key as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);
Configure azure.blob.storage_account, azure.blob.shared_key, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);
Configure azure.blob.storage_account, azure.blob.shared_key, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>")
INTO TABLE test_ingestion_2
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.shared_key" = "<blob_storage_account_shared_key>"
);
Configure azure.blob.storage_account, azure.blob.container, and azure.blob.sas_token as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);
Configure azure.blob.storage_account, azure.blob.container, azure.blob.sas_token, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);
Configure azure.blob.storage_account, azure.blob.container, azure.blob.sas_token, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("wasb[s]://<container>@<storage_account>.blob.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.blob.storage_account" = "<blob_storage_account_name>",
"azure.blob.container" = "<blob_container_name>",
"azure.blob.sas_token" = "<blob_storage_account_SAS_token>"
);
StarRocks supports using one of the following authentication methods to access Data Lake Storage Gen1:
NOTE
When you load data or query files from Data Lake Storage Gen1, you must use the adl protocol to access your data and write the file path as
adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>.
Configure azure.adls1.use_managed_service_identity as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls1.use_managed_service_identity" = "true"
);
Configure azure.adls1.use_managed_service_identity and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls1.use_managed_service_identity" = "true"
);
Configure azure.adls1.use_managed_service_identity and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls1.use_managed_service_identity" = "true"
);
Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, and azure.adls1.oauth2_endpoint as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);
Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, azure.adls1.oauth2_endpoint, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);
Configure azure.adls1.oauth2_client_id, azure.adls1.oauth2_credential, azure.adls1.oauth2_endpoint, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("adl://<data_lake_storage_gen1_name>.azuredatalakestore.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls1.oauth2_client_id" = "<application_client_id>",
"azure.adls1.oauth2_credential" = "<application_client_credential>",
"azure.adls1.oauth2_endpoint" = "<OAuth_2.0_authorization_endpoint_v2>"
);
StarRocks supports using one of the following authentication methods to access Data Lake Storage Gen2:
NOTE
When you load data or query files from Data Lake Storage Gen2, you must use the abfs or abfss protocol to access your data:
- If your storage account allows access over HTTP, use the abfs protocol and write the file path as
abfs://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>.- If your storage account allows access over HTTPS, use the abfss protocol and write the file path as
abfss://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>.
Before you start, you need to make the following preparations:
Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, and azure.adls2.oauth2_client_id as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);
Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, azure.adls2.oauth2_client_id, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);
Configure azure.adls2.oauth2_use_managed_identity, azure.adls2.oauth2_tenant_id, azure.adls2.oauth2_client_id, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("adfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.oauth2_use_managed_identity" = "true",
"azure.adls2.oauth2_tenant_id" = "<service_principal_tenant_id>",
"azure.adls2.oauth2_client_id" = "<service_client_id>"
);
Configure azure.adls2.storage_account and azure.adls2.shared_key as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);
Configure azure.adls2.storage_account, azure.adls2.shared_key, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);
Configure azure.adls2.storage_account, azure.adls2.shared_key, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("adfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.storage_account" = "<storage_account_name>",
"azure.adls2.shared_key" = "<shared_key>"
);
Before you start, you need to create a service principal, create a role assignment to assign a role to the service principal, and then add the role assignment to your storage account. As such, you can make sure that this service principal can successfully access the data in your storage account.
Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, and azure.adls2.oauth2_client_endpoint as follows in the CREATE EXTERNAL CATALOG statement:
CREATE EXTERNAL CATALOG hive_catalog_azure
PROPERTIES
(
"type" = "hive",
"hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083",
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);
Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, azure.adls2.oauth2_client_endpoint, and the file path (path) as follows in the CREATE EXTERNAL TABLE statement:
CREATE EXTERNAL TABLE external_table_azure
(
id varchar(65500),
attributes map<varchar(100), varchar(2000)>
)
ENGINE=FILE
PROPERTIES
(
"path" = "abfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>",
"format" = "ORC",
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);
Configure azure.adls2.oauth2_client_id, azure.adls2.oauth2_client_secret, azure.adls2.oauth2_client_endpoint, and the file path (DATA INFILE) as follows in the LOAD LABEL statement:
LOAD LABEL test_db.label000
(
DATA INFILE("adfs[s]://<container>@<storage_account>.dfs.core.windows.net/<path>/<file_name>")
INTO TABLE target_table
FORMAT AS "parquet"
)
WITH BROKER
(
"azure.adls2.oauth2_client_id" = "<service_client_id>",
"azure.adls2.oauth2_client_secret" = "<service_principal_client_secret>",
"azure.adls2.oauth2_client_endpoint" = "<service_principal_client_endpoint>"
);