docs/release_notes/upgrading_to_destinations_v2.md
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import {SnowflakeMigrationGenerator, BigQueryMigrationGenerator, RedshiftMigrationGenerator, PostgresMigrationGenerator} from './destinations_v2.js'
Airbyte Destinations V2 provides you with:
_airbyte_meta: Airbyte will now populate typing errors in the _airbyte_meta column instead of failing your sync. You can query these results to audit misformatted or unexpected data.airbyte_internal schema: Airbyte will now generate all raw tables in the airbyte_internal schema. We no longer clutter your destination schema with raw data tables.To see more details and examples on the contents of the Destinations V2 release, see this guide. The remainder of this page will walk you through upgrading connectors from legacy normalization to Destinations V2.
Destinations V2 were in preview for Snowflake and BigQuery during August 2023, and launched on August 29th, 2023. Other destinations will be transitioned to Destinations V2 in early 2024.
The upgrade to Destinations V2 is handled by moving your connections to use updated versions of Airbyte destinations. Existing normalization options, both Raw data (JSON) and Normalized tabular data, will be unsupported starting the upgrade deadline.
As a Cloud user, existing connections using legacy normalization will be paused on the upgrade deadline. As an Open Source user, you may choose to upgrade at your convenience. However, destination connector versions prior to Destinations V2 will no longer be supported starting the upgrade deadline.
Note that Destinations V2 also removes the option to only replicate raw data. The vast majority of Airbyte users prefer typed final tables, and our future feature development will rely on this implementation. Learn more below.
The following table details the delivered data modified by Destinations V2:
| Current Normalization Setting | Source Type | Impacted Data (Breaking Changes) |
|---|---|---|
| Raw JSON | All | _airbyte metadata columns, raw table location |
| Normalized tabular data | API Source | Unnested tables, _airbyte metadata columns, SCD tables |
| Normalized tabular data | Tabular Source (database, file, etc.) | _airbyte metadata columns, SCD tables |
Whenever possible, we've taken this opportunity to use the best data type for storing JSON for your querying convenience. For example, destination-bigquery now loads JSON blobs as type JSON in BigQuery (introduced last year), instead of type string.
The quickest path to upgrading is to click upgrade on any out-of-date connection in the UI. The advanced options later in this document will allow you to test out the upgrade in more detail if you choose.
:::caution
[Airbyte Open Source Only] You should upgrade to 0.50.24+ of the Airbyte Platform before updating to Destinations V2. Failure to do so may cause upgraded connections to fail.
:::
:::caution Upgrade Warning
If any of the above concerns are applicable to your existing setup, we recommend Upgrading Connections One by One with Dual-Writing for a more controlled upgrade process :::
After upgrading the out-of-date destination to a Destinations V2 compatible version, the following will occur at the next sync for each connection sending data to the updated destination:
airbyte_internal schema.Due to the amount of operations to be completed, the first sync after upgrading to Destination V2 will be longer than normal. Once your first sync has completed successfully, you may need to make changes to downstream models (dbt, sql, etc.) transforming data. See this walkthrough of top changes to expect for more details.
Pre-existing raw tables, SCD tables and "unnested" tables will always be left untouched. You can delete these at your convenience, but these tables will no longer be kept up-to-date by Airbyte syncs. Each destination version is managed separately, so if you have multiple destinations, they all need to be upgraded one by one.
Versions are tied to the destination. When you update the destination, all connections tied to that destination will be sending data in the Destinations V2 format. For upgrade paths that will minimize disruption to existing dashboards, see:
Dual writing is a method employed during upgrades where new incoming data is written simultaneously to both the old and new systems, facilitating a smooth transition between versions. We recommend this approach for connections where you are especially worried about breaking changes or downtime in downstream systems.
Default Schema, which you should update to a new value to avoid collisions.These steps allow you to dual-write for connections incrementally syncing data without re-syncing historical data you've already replicated:
Settings tab. Open the Advanced settings to see the connection state (which manages incremental syncs). Copy the state to your clipboard.Update State. This will ensure historical data is not replicated again.You may want to verify the format of updated data for a single connection. To do this:
Manual replication frequency.When you are done testing, you can disable or delete this testing connection, and upgrade your pre-existing connections in place or upgrade one-by-one with dual writing.
If you have written downstream transformations directly from the output of raw tables, or use the "Raw JSON" normalization setting, you should know that:
airbyte_ab_id to airbyte_raw_id, and airbyte_emitted_at to airbyte_extracted_at).airbyte_internal schema in your destination.airbyte_internal schema. Although existing downstream dashboards will go stale, they will not be broken.We may make further changes to raw tables in the future, as these tables are intended to be a staging ground for Airbyte to optimize the performance of your syncs. We cannot guarantee the same level of stability as for final tables in your destination schema, nor will features like error handling be implemented in the raw tables.
As a user previously not running Normalization, Upgrading to Destinations V2 will increase the compute costs in your destination data warehouse. This is because Destinations V2 will now be performing the operations to generate a final typed table. Some destinations may provide an option to disable this - check your connectors's settings.
For each CDC-supported source connector, we recommend the following:
| CDC Source | Recommendation | Notes |
|---|---|---|
| Postgres | Upgrade connection in place | You can optionally dual write, but this requires resyncing historical data from the source. You must create a new Postgres source with a different replication slot than your existing source to preserve the integrity of your existing connection. |
| MySQL | All above upgrade paths supported | You can upgrade the connection in place, or dual write. When dual writing, Airbyte can leverage the state of an existing, active connection to ensure historical data is not re-replicated from MySQL. |
For each destination connector, Destinations V2 is effective as of the following versions:
| Destination Connector | Safe Rollback Version | Destinations V2 Compatible | Upgrade Deadline |
|---|---|---|---|
| BigQuery | 1.10.2 | 2.0.6+ | November 7, 2023 |
| Snowflake | 2.1.7 | 3.1.0+ | November 7, 2023 |
| Redshift | 0.8.0 | 2.0.0+ | March 15, 2024 |
| Postgres | 0.6.3 | 2.0.0+ | May 31, 2024 |
| MySQL | 0.2.0 | [coming soon] 2.0.0+ | [coming soon] early 2024 |
Note that legacy normalization will be deprecated for ClickHouse, DuckDB, MSSQL, TiDB, and Oracle DB in early 2024. If you wish to add Destinations V2 capability to these destinations, please reference our implementation guide (coming soon).
If you upgrade to Destinations V2 and start encountering issues, as an Open Source user you can optionally roll back. If you are running an outdated Airbyte Platform version (prior to v0.50.24), this may occur more frequently by accidentally upgrading to Destinations V2. However:
To roll back, follow these steps:
If you are an Airbyte Cloud customer, and encounter errors while upgrading from a V1 to a V2 destination, please reach out to support. We do not always recommend doing a full clear of your entire connection, depending on the type of error.
In addition to the changes which apply for all destinations described above, there are some per-destination fixes and updates included in Destinations V2:
Previously, we had used TEXT, which made querying sub-properties more difficult. In certain cases, numbers within sub-properties with long decimal values will need to be converted to float representations due to a quirk of Bigquery. Learn more here.
Snowflake will implicitly uppercase column names if they are not quoted. Airbyte needs to quote the column names because a variety of sources have column/field names which contain special characters that require quoting in Snowflake. However, when you quote a column name in Snowflake, it also preserves lowercase naming. During the Snowflake V2 beta, most customers found this behavior unexpected and expected column selection to be case-insensitive for columns without special characters. As a result of this feedback, we decided to explicitly uppercase column names in the final tables, which does mean that columns which previous required quoting, now also require you to convert to the upper case version.
For example:
-- Snowflake will implicitly uppercase column names which are not quoted
-- These three queries are equivalent
SELECT my_column from my_table;
SELECT MY_COLUMN from MY_TABLE;
SELECT "MY_COLUMN" from MY_TABLE;
-- However, this query is different, and requires a lowercase column name
SELECT "my_column" from my_table;
-- Because we are explicitly upper-casing column names, column names containing special characters (like a space)
-- should now also be uppercase
-- Before v2
SELECT "my column" from my_table;
-- After v2
SELECT "MY COLUMN" from my_table;
Postgres will implicitly lower case column names with mixed case characters when using unquoted identifiers. Based on feedback, we chose to replace any special characters like spaces with underscores and use quoted identifiers to preserve mixed case column names.
This section is targeted towards analysts updating downstream models after you've successfully upgraded to Destinations V2.
See here for a breakdown of changes. Your models will often require updates for the following changes:
_airbyte_emitted_at_ and _airbyte_extracted_at are exactly the same, only the column name changed. You can replace all instances of _airbyte_emitted_at with _airbyte_extracted_at._airbyte_ab_id and _airbyte_raw_id are exactly the same, only the column name changed. You can replace all instances of _airbyte_ab_id with _airbyte_raw_id._airbyte_normalized_at is no longer in the final table. We now recommend using _airbyte_extracted_at instead.You'll get data type errors in downstream models where previously string columns are now JSON. In BigQuery, nested JSON values originating from API sources were previously delivered in type string. These are now delivered in type JSON.
Example: In dbt, you may now get errors with functions such as regexp_replace. You can attempt prepending these with json_extract_array(...) or to_json_string(...) where appropriate.
Unnested tables (e.g. public.users_address) do not get deleted during the migration, and are no longer updated. Your downstream models will not throw errors until you drop these tables. Until then, dashboards reliant on these tables will be stale.