docs/integrations/destinations/azure-blob-storage.md
This destination writes data to Azure Blob Storage.
The Airbyte Azure Blob Storage destination allows you to sync data to Azure Blob Storage. Each stream is written to its own blob under the container,
as `<stream_namespace>/<stream_name>/yyyy_mm_dd_<unix_epoch>_<part_number>.<file_extension>`.
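As a rough illustration of the naming pattern above, the following Python sketch assembles a blob path from its parts. The helper `build_blob_path` and its parameters are hypothetical and not part of the connector. The sketch assumes the epoch component is in milliseconds and that the date is taken from the supplied timestamp; the connector's actual choices may differ.

```python
from datetime import datetime, timezone


def build_blob_path(namespace, stream_name, written_at, part_number, file_extension):
    """Assemble a blob path following the documented naming pattern.

    Hypothetical helper for illustration only; not part of the connector.
    """
    date_part = written_at.strftime("%Y_%m_%d")
    # Assumption: the epoch component is expressed in milliseconds.
    epoch_part = int(written_at.timestamp() * 1000)
    file_name = f"{date_part}_{epoch_part}_{part_number}.{file_extension}"
    # Skip the namespace segment when the stream has none.
    segments = [s for s in (namespace, stream_name, file_name) if s]
    return "/".join(segments)


example = build_blob_path(
    "public",
    "users",
    datetime(2025, 1, 31, 20, 30, 29, tzinfo=timezone.utc),
    0,
    "csv",
)
print(example)
```

Dropping an empty namespace segment is a design choice of this sketch, so that streams without a namespace do not produce a path with a leading slash.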
| Sync mode | Supported? |
|---|---|
| Full Refresh - Overwrite | Yes |
| Full Refresh - Append | Yes |
| Full Refresh - Overwrite + Deduped | No |
| Incremental Sync - Append | Yes |
| Incremental Sync - Append + Deduped | No |
| Parameter | Type | Notes |
|---|---|---|
| Azure Blob Storage Endpoint Domain Name | string | The Azure Blob Storage endpoint domain name. Leave the default value (or leave it empty if you run the container from the command line) to use the Microsoft native endpoint. |
| Azure Blob Storage Container Name | string | The name of the Azure Blob Storage container. If the container does not exist, it is created automatically. If left empty, a container named `airbytecontainer` plus a timestamp is created automatically. |
| Azure Blob Storage Account Name | string | The name of the Azure Blob Storage account. |
| Azure Blob Storage Account Key | string | Azure Blob Storage account key. If this is set, the Shared Access Signature, Azure Tenant ID, Azure Client ID, and Azure Client Secret fields must not be set. Example: abcdefghijklmnopqrstuvwxyz/0123456789+ABCDEFGHIJKLMNOPQRSTUVWXYZ/0123456789%++sampleKey==. |
| Shared Access Signature | string | Azure Blob Storage shared access signature (SAS). If this is set, the Azure Blob Storage Account Key, Azure Tenant ID, Azure Client ID, and Azure Client Secret fields must not be set. Example: sv=2025-01-01&ss=b&srt=co&sp=abcdefghijk&se=2026-01-31T07:00:00Z&st=2025-01-31T20:30:29Z&spr=https&sig=YWJjZGVmZ2hpamthYmNkZWZnaGlqa2FiY2RlZmdoaWp%3D. |
| Azure Tenant ID | string | Azure Active Directory (Entra ID) tenant ID. Required for Entra ID authentication. If this is set, Azure Client ID and Azure Client Secret must also be set. Example: 12345678-1234-1234-1234-123456789012. |
| Azure Client ID | string | Azure Active Directory (Entra ID) client ID. Required for Entra ID authentication. If this is set, Azure Tenant ID and Azure Client Secret must also be set. Example: 87654321-4321-4321-4321-210987654321. |
| Azure Client Secret | string | Azure Active Directory (Entra ID) client secret. Required for Entra ID authentication. If this is set, Azure Tenant ID and Azure Client ID must also be set. |
| Azure Blob Storage Target Blob Size (MB) | integer | How large each blob should be, in megabytes. Example: 500. After a blob exceeds this size, the connector will start writing to a new blob, and increment the part number. |
| Format | object | Format specific configuration. See below for details. |
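
The authentication fields in the table above exclude one another: the account key, the shared access signature, and the three Entra ID fields cannot be combined. The following Python sketch shows one way to check a configuration before using it. The dictionary keys are hypothetical stand-ins for the field names in the table. The requirement that exactly one method is present is an assumption of this sketch, not something the table states.

```python
def resolve_auth_method(config):
    """Return which authentication method a config selects, or raise ValueError.

    The dictionary keys are hypothetical stand-ins for the documented fields.
    """
    entra_keys = ("azure_tenant_id", "azure_client_id", "azure_client_secret")
    entra_set = [k for k in entra_keys if config.get(k)]
    has_key = bool(config.get("account_key"))
    has_sas = bool(config.get("shared_access_signature"))

    # The three Entra ID fields must be provided together.
    if entra_set and len(entra_set) != len(entra_keys):
        missing = sorted(set(entra_keys) - set(entra_set))
        raise ValueError(f"Entra ID authentication also requires: {', '.join(missing)}")

    chosen = [
        name
        for name, present in (
            ("account_key", has_key),
            ("shared_access_signature", has_sas),
            ("entra_id", bool(entra_set)),
        )
        if present
    ]

    # Assumption: exactly one authentication method must be configured.
    if len(chosen) != 1:
        raise ValueError(f"Expected exactly one authentication method, found: {chosen or 'none'}")
    return chosen[0]
```

For example, `resolve_auth_method({"account_key": "..."})` returns `"account_key"`, while a configuration that sets both an account key and a shared access signature raises `ValueError`.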
Like most other Airbyte destination connectors, the output contains your data, along with some metadata fields.
If you select the "root level flattening" option, your data will be promoted to additional columns; if you select "no flattening", your data
will be left as a JSON blob inside the _airbyte_data column.
For example, given the following JSON object from a source:
{
"user_id": 123,
"name": {
"first": "John",
"last": "Doe"
}
}
With no flattening, the output CSV is:
| _airbyte_raw_id | _airbyte_extracted_at | _airbyte_generation_id | _airbyte_meta | _airbyte_data |
|---|---|---|---|---|
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | {"changes":[], "sync_id": 10111 } | { "user_id": 123, "name": { "first": "John", "last": "Doe" } } |
With root level flattening, the output CSV is:
| _airbyte_raw_id | _airbyte_extracted_at | _airbyte_generation_id | _airbyte_meta | user_id | name.first | name.last |
|---|---|---|---|---|---|---|
| 26d73cde-7eb1-4e1e-b7db-a4c03b4cf206 | 1622135805000 | 11 | {"changes":[], "sync_id": 10111 } | 123 | John | Doe |
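
To make the two CSV layouts concrete, here is a minimal Python sketch that renders one record either way. The functions `flatten` and `to_csv` are hypothetical and only mirror the example tables above. In particular, expanding nested objects into dotted column names such as `name.first` follows the example table; how the connector treats deeper nesting or arrays is not described here.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. name.first."""
    flat = {}
    for key, value in obj.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_csv(record, meta, flattening):
    """Render one record as CSV text with a header row.

    `meta` holds the _airbyte_* metadata columns; `flattening` is True for
    root level flattening and False for no flattening.
    """
    row = dict(meta)
    if flattening:
        # Promote the source fields to their own columns.
        row.update(flatten(record))
    else:
        # Keep the source fields as a JSON string in a single column.
        row["_airbyte_data"] = json.dumps(record)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


record = {"user_id": 123, "name": {"first": "John", "last": "Doe"}}
meta = {"_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206"}
print(to_csv(record, meta, flattening=True))
print(to_csv(record, meta, flattening=False))
```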
JSON Lines is a text format with one JSON object per line. As with the CSV format, this connector will write your data along
with some metadata fields. You can enable "root level flattening" to promote your data to the root
of the JSON object, or use "no flattening" to leave your data inside the _airbyte_data object.
For example, given the following two JSON objects from a source:
{
"user_id": 123,
"name": {
"first": "John",
"last": "Doe"
}
}
{
"user_id": 456,
"name": {
"first": "Jane",
"last": "Roe"
}
}
With no flattening, the output JSONL is:
{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "_airbyte_data": { "user_id": 123, "name": { "first": "John", "last": "Doe" } } }
{ "_airbyte_raw_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "_airbyte_data": { "user_id": 456, "name": { "first": "Jane", "last": "Roe" } } }
With root level flattening, the output JSONL is:
{ "_airbyte_raw_id": "26d73cde-7eb1-4e1e-b7db-a4c03b4cf206", "_airbyte_extracted_at": "1622135805000", "_airbyte_generation_id": "11", "_airbyte_meta": { "changes": [], "sync_id": 10111 }, "user_id": 123, "name": { "first": "John", "last": "Doe" } }
{ "_airbyte_raw_id": "0a61de1b-9cdd-4455-a739-93572c9a5f20", "_airbyte_extracted_at": "1631948170000", "_airbyte_generation_id": "12", "_airbyte_meta": { "changes": [], "sync_id": 10112 }, "user_id": 456, "name": { "first": "Jane", "last": "Roe" } }
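
A consumer reading these files may want the source fields regardless of which layout was chosen. The following Python sketch shows one possible approach. The helper `extract_data` is hypothetical, and it assumes that no source field name starts with `_airbyte_`.

```python
import json

METADATA_PREFIX = "_airbyte_"


def extract_data(line):
    """Return the source fields from one JSONL output line, for either layout."""
    record = json.loads(line)
    if "_airbyte_data" in record:
        # No flattening: the payload is nested under _airbyte_data.
        return record["_airbyte_data"]
    # Root level flattening: everything that is not metadata is payload.
    # Assumption: source field names never start with "_airbyte_".
    return {k: v for k, v in record.items() if not k.startswith(METADATA_PREFIX)}


nested = '{"_airbyte_raw_id": "a", "_airbyte_data": {"user_id": 123, "name": {"first": "John", "last": "Doe"}}}'
flattened = '{"_airbyte_raw_id": "a", "user_id": 123, "name": {"first": "John", "last": "Doe"}}'
print(extract_data(nested) == extract_data(flattened))
```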
If no container name is provided, a container named `airbytecontainer` with a timestamp suffix will be created.

This destination supports namespaces. The namespace is used as part of the output path structure.
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 1.1.6 | 2026-01-26 | 72355 | Fix sync failures for sources with empty schemas by upgrading CDK to 0.2.1 |
| 1.1.5 | 2026-01-20 | 72301 | Upgrade CDK to 0.2.0 |
| 1.1.4 | 2025-11-05 | 69127 | Upgrade to Bulk CDK 0.1.61. |
| 1.1.3 | 2025-10-21 | 67153 | Implement new proto schema implementation |
| 1.1.2 | 2025-10-06 | 67078 | Remove memory limit for sync jobs to improve performance and resource utilization. |
| 1.1.1 | 2025-09-10 | 66139 | Fix inconsistent field name casing and improve tooltip clarity. Field names now use consistent title casing and tooltips reference exact field names. |
| 1.1.0 | 2025-09-05 | 65933 | Add support for Azure Entra ID (Service Principal) authentication. You can now authenticate using Azure AD tenant ID, client ID, and client secret. |
| 1.0.4 | 2025-08-07 | 64556 | Promoting release candidate 1.0.4-rc.1 to a main version. |
| 1.0.4-rc.1 | 2025-08-05 | 59710 | Release Azure blob destination on latest CDK |
| 1.0.3 | 2025-05-07 | 59710 | CDK backpressure bugfix |
| 1.0.2 | 2025-04-14 | 57563 | Fix signature spec example |
| 1.0.1 | 2025-04-09 | 57541 | Fix metadata to actually certify. |
| 1.0.0 | 2025-04-03 | 56391 | Bring into compliance with modern connector standards; certify connector. |
| 0.2.5 | 2025-03-21 | 55906 | Upgrade to airbyte/java-connector-base:2.0.1 to be M4 compatible. |
| 0.2.4 | 2025-01-10 | 51507 | Use a non root base image |
| 0.2.3 | 2024-12-18 | 49910 | Use a base image: airbyte/java-connector-base:1.0.0 |
| 0.2.2 | 2024-06-12 | 38061 | File Extensions added for the output files |
| 0.2.1 | 2023-09-13 | 30412 | Switch noisy logging to debug |
| 0.2.0 | 2023-01-18 | 21467 | Support spilling of objects exceeding configured size threshold |
| 0.1.6 | 2022-08-08 | 15318 | Support per-stream state |
| 0.1.5 | 2022-06-16 | 13852 | Updated stacktrace format for any trace message errors |
| 0.1.4 | 2022-05-17 | 12820 | Improved 'check' operation performance |
| 0.1.3 | 2022-02-14 | 10256 | Add -XX:+ExitOnOutOfMemoryError JVM option |
| 0.1.2 | 2022-01-20 | 9682 | Each data synchronization for each stream is written to a new blob to the folder with stream name. |
| 0.1.1 | 2021-12-29 | 9190 | Added BufferedOutputStream wrapper to blob output stream to improve performance and fix issues with 50,000 block limit. Also disabled autoflush on PrintWriter. |
| 0.1.0 | 2021-08-30 | 5332 | Initial release with JSONL and CSV output. |