docs/integrations/sources/gcs.md
This page contains the setup guide and reference information for the Google Cloud Storage (GCS) source connector.
</HideInUI>

:::info
Cloud storage may incur egress costs. Egress refers to data that is transferred out of the cloud storage system, such as when you download files or access them from a different location. For more information, see the Google Cloud Storage pricing guide.
:::
First, select an existing project or create a new one in the Google Cloud Console:
- Click `Create service account`.
- Provide the service account key as `service_account` in the UI.

Use the service account ID from above to grant read access to your target bucket. Click here for more details.
- Enter the service account credentials in the `Service Account Information` field.
- Enter the bucket name in the `Bucket` field.
- To replicate all files, use `**` as the pattern. For more precise pattern matching options, refer to the Path Patterns section below.
- The schema defaults to `{}`, in which case the connector will automatically infer the schema from the file(s) you are replicating. For details on providing a custom schema, refer to the User Schema section.
- If you set a start date, use the format `YYYY-MM-DDTHH:mm:ssZ`. Leaving this field blank will replicate data from all files that have not been excluded by the Path Pattern and Path Prefix.

The Google Cloud Storage (GCS) source connector uses a signed URL to work with files when the source is authenticated with Service Account Information, and `gs://{blob.bucket.name}/{blob.name}` when the source is authenticated via Google (OAuth).
This matters because file URLs are stored in the connection state.
If you change the authorization type while using Incremental sync, the next sync will not use the old state and will reread the provided files in Full Refresh mode (like an initial sync). Subsequent syncs will be Incremental as expected.
(tl;dr -> path pattern syntax using wcmatch.glob. GLOBSTAR and SPLIT flags are enabled.)
This connector can sync multiple files by using glob-style patterns, rather than requiring a specific path for every file. This enables:
- Matching many files with a single pattern, e.g. `**` would indicate every file in the folder.

You must provide a path pattern. You can also provide multiple patterns separated by `|` for more complex directory layouts.
Each path pattern is a reference from the root of the folder, so don't include the root folder name itself in the pattern(s).
Some example patterns:
- `**` : match everything.
- `**/*.csv` : match all files with specific extension.
- `myFolder/**/*.csv` : match all csv files anywhere under myFolder.
- `*/**` : match everything at least one folder deep.
- `*/*/*/**` : match everything at least three folders deep.
- `**/file.*|**/file` : match every file called "file" with any extension (or no extension).
- `x/*/y/*` : match all files that sit in sub-folder x -> any folder -> folder y.
- `**/prefix*.csv` : match all csv files with specific prefix.
- `**/prefix*.parquet` : match all parquet files with specific prefix.

Let's look at a specific example, matching the following folder layout (`MyFolder` is the folder specified in the connector config as the root folder, which the patterns are relative to):
MyFolder
  -> log_files
  -> some_table_files
       -> part1.csv
       -> part2.csv
  -> images
  -> more_table_files
       -> part3.csv
  -> extras
       -> misc
            -> another_part1.csv
We want to pick up part1.csv, part2.csv and part3.csv (excluding another_part1.csv for now). We could do this a few different ways:
- Use the single pattern `**/part*.csv`.
- Use the dual pattern `some_table_files/*.csv|more_table_files/*.csv` to pick up relevant files only from those exact folders.
- Use the single pattern `*table_files/*.csv`. This could however cause problems in the future if new unexpected folders started being created.
- Wildcards can also be recursive: the pattern `extras/**/*.csv` would pick up any csv files nested in folders below "extras", such as "extras/misc/another_part1.csv".

As you can probably tell, there are many ways to achieve the same goal with path patterns. We recommend using a pattern that ensures clarity and is robust against future additions to the directory structure.
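The patterns discussed above can be checked against the example layout with a short script. The sketch below is only an approximation that uses the Python standard library: `glob_to_regex`, `matches` and `select` are hypothetical helper names, and the translation handles only `*`, `**` and the `|` separator. It is not a substitute for `wcmatch.glob`, which supports more syntax and may differ in edge cases such as hidden files.

```python
import re

# Files from the example layout, as paths relative to MyFolder.
FILES = [
    "some_table_files/part1.csv",
    "some_table_files/part2.csv",
    "more_table_files/part3.csv",
    "extras/misc/another_part1.csv",
]


def glob_to_regex(pattern: str) -> str:
    """Translate one glob into a regex string (only * and ** are handled)."""
    segments = pattern.split("/")
    pieces = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "**":
            # Globstar: the whole remaining path if last, else zero or more folders.
            pieces.append(".*" if is_last else "(?:[^/]+/)*")
        else:
            # A single * matches any run of characters inside one path segment.
            body = "".join("[^/]*" if ch == "*" else re.escape(ch) for ch in segment)
            pieces.append(body if is_last else body + "/")
    return "".join(pieces)


def matches(path: str, patterns: str) -> bool:
    """Return True if the path matches any of the |-separated patterns."""
    return any(re.fullmatch(glob_to_regex(p), path) for p in patterns.split("|"))


def select(patterns: str) -> list[str]:
    """Return the example files selected by the given pattern string."""
    return [path for path in FILES if matches(path, patterns)]


for patterns in (
    "**/part*.csv",
    "some_table_files/*.csv|more_table_files/*.csv",
    "*table_files/*.csv",
    "extras/**/*.csv",
):
    print(patterns, "->", select(patterns))
```

The first three patterns select `part1.csv`, `part2.csv` and `part3.csv`, while `extras/**/*.csv` selects only `extras/misc/another_part1.csv`. Running candidate patterns against a listing of your own bucket in this way is a cheap check before saving the connector configuration.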
When using the Avro, Jsonl, CSV or Parquet format, you can provide a schema to use for the output stream. Note that this doesn't apply to the experimental Document file type format.
Providing a schema allows for more control over the output of this stream. Without a provided schema, columns and datatypes will be inferred from the first created file in the bucket matching your path pattern and suffix. This will probably be fine in most cases, but there may be situations where you want to enforce a schema instead, e.g.:
- ... `_ab_additional_properties` map.
- ... `_ab_additional_properties` map.

Or any other reason! The schema must be provided as valid JSON, as a map of `{"column": "datatype"}` where each datatype is one of:
For example:
- `{"id": "integer", "location": "string", "longitude": "number", "latitude": "number"}`
- `{"username": "string", "friends": "array", "information": "object"}`

Since CSV files are effectively plain text, providing specific reader options is often required for correct parsing of the files. These settings are applied when a CSV is created or exported so please ensure that this process happens consistently over time.
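To see why reader options matter, consider a quoted field that contains an escaped double quote. The sketch below uses Python's standard `csv` module purely as an illustration. It assumes `,` as the delimiter and `"` as the quote character, `parse_csv` is a hypothetical helper name, and none of this is the connector's own parsing code.

```python
import csv
import io

# Two-line sample: a header row and one record whose quoted field
# contains a backslash-escaped double quote (34\").
SAMPLE = 'Product,Description,Price\nJeans,"Navy Blue, Bootcut, 34\\"",49.99\n'


def parse_csv(text, escape_char=None):
    """Parse CSV text into a list of rows using the given escape character."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=",",
        quotechar='"',
        escapechar=escape_char,
    )
    return list(reader)


rows = parse_csv(SAMPLE, escape_char="\\")
print(rows[1])
# ['Jeans', 'Navy Blue, Bootcut, 34"', '49.99']
```

With the backslash configured as the escape character, the record splits into three fields and the escaped quote survives as a literal `"` in the description. With no escape character configured, the same record is not split this way, which is why the option needs to match how the file was exported.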
- `User Provided` assumes the CSV does not have a header row and uses the headers provided. `Autogenerated` assumes the CSV does not have a header row, and the CDK will generate headers of the form `f{i}`, where `i` is the index starting from 0. Otherwise, the default behavior is to use the header from the CSV file. To autogenerate or provide column names for a CSV that has headers, set a value for the "Skip rows before header" option to ignore the header row.
- To use a tab as the delimiter, set this value to `\t`. By default, this value is set to `,`.
- The default encoding is `utf8`.
- A commonly used escape character is the backslash (`\`). For example, given the following data:

  Product,Description,Price
  Jeans,"Navy Blue, Bootcut, 34\"",49.99

  The backslash (`\`) is used directly before the second double quote (`"`) to indicate that it is not the closing quote for the field, but rather a literal double quote character that should be included in the value (in this example, denoting the size of the jeans in inches: `34"`).

  Leaving this field blank (default option) will disallow escaping.
- The default quote character is `"`.

- `fast` extracts text directly from the document, which doesn't work for all files. `ocr_only` is more reliable, but slower. `hi_res` is the most reliable, but requires an API key and a hosted instance of unstructured and can't be used with local mode. See the unstructured.io documentation for more details.
- Local processing supports the `fast` and `ocr` modes. This is the default option.
- Processing via API uses the `hi_res` mode. This option is useful for increased performance and accuracy, but requires an API key and a hosted instance of unstructured.

The Google Cloud Storage (GCS) source connector supports the following sync modes:
| Feature | Supported? (Yes/No) | Notes |
|---|---|---|
| Full Refresh Sync | Yes | |
| Incremental Sync | Yes | |
The Google Cloud Storage (GCS) source connector supports the following file formats:
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 0.10.9 | 2026-03-19 | 74779 | Fix ZIP file detection for files with compound extensions (e.g. .csv.zip) |
| 0.10.8 | 2026-03-18 | 74781 | Fix records quadratic duplication |
| 0.10.7 | 2026-03-03 | 70287 | Update dependencies |
| 0.10.6 | 2026-02-13 | 73332 | Fix zip file extraction failing with DeliverRawFiles has no attribute delivery_type error |
| 0.10.5 | 2025-11-25 | 69913 | Update dependencies |
| 0.10.4 | 2025-11-18 | 69426 | Update dependencies |
| 0.10.3 | 2025-11-11 | 69270 | Update dependencies |
| 0.10.2 | 2025-11-04 | 69159 | Update dependencies |
| 0.10.1 | 2025-10-29 | 69054 | Update dependencies |
| 0.10.0 | 2025-10-27 | 68619 | Update dependencies |
| 0.9.2 | 2025-10-21 | 68330 | Update dependencies |
| 0.9.1 | 2025-10-14 | 68032 | Update dependencies |
| 0.9.0 | 2025-10-07 | 67340 | Promoting release candidate 0.9.0-rc.1 to a main version. |
| 0.9.0-rc.1 | 2025-10-06 | 66671 | Update to latest airbyte cdk |
| 0.8.31 | 2025-09-30 | 66303 | Update dependencies |
| 0.8.30 | 2025-09-09 | 66088 | Update dependencies |
| 0.8.29 | 2025-08-23 | 65389 | Update dependencies |
| 0.8.28 | 2025-08-16 | 64980 | Update dependencies |
| 0.8.27 | 2025-08-09 | 64627 | Update dependencies |
| 0.8.26 | 2025-08-02 | 64367 | Update dependencies |
| 0.8.25 | 2025-07-26 | 63951 | Update dependencies |
| 0.8.24 | 2025-07-19 | 63564 | Update dependencies |
| 0.8.23 | 2025-07-12 | 62985 | Update dependencies |
| 0.8.22 | 2025-07-05 | 62822 | Update dependencies |
| 0.8.21 | 2025-06-28 | 61274 | Update dependencies |
| 0.8.20 | 2025-05-27 | 60868 | Update dependencies |
| 0.8.19 | 2025-05-24 | 60392 | Update dependencies |
| 0.8.18 | 2025-05-10 | 60012 | Update dependencies |
| 0.8.17 | 2025-05-03 | 59443 | Update dependencies |
| 0.8.16 | 2025-04-26 | 58915 | Update dependencies |
| 0.8.15 | 2025-04-19 | 58312 | Update dependencies |
| 0.8.14 | 2025-04-12 | 57772 | Update dependencies |
| 0.8.13 | 2025-04-05 | 57213 | Update dependencies |
| 0.8.12 | 2025-03-29 | 56520 | Update dependencies |
| 0.8.11 | 2025-03-22 | 55956 | Update dependencies |
| 0.8.10 | 2025-03-08 | 55314 | Update dependencies |
| 0.8.9 | 2025-03-01 | 54973 | Update dependencies |
| 0.8.8 | 2025-02-25 | 54677 | Fix io.UnsupportedOperation: underlying stream is not seekable |
| 0.8.7 | 2025-02-22 | 54458 | Update dependencies |
| 0.8.6 | 2025-02-15 | 53712 | Update dependencies |
| 0.8.5 | 2025-02-08 | 53365 | Update dependencies |
| 0.8.4 | 2025-02-01 | 52379 | Update dependencies |
| 0.8.3 | 2025-01-18 | 49174 | Update dependencies |
| 0.8.2 | 2024-11-25 | 48647 | Starting with this version, the Docker image is now rootless. Please note that this and future versions will not be compatible with Airbyte versions earlier than 0.64 |
| 0.8.1 | 2024-10-28 | 45923 | Update logging |
| 0.8.0 | 2024-10-28 | 45414 | Add support for OAuth authentication |
| 0.7.4 | 2024-10-12 | 46858 | Update dependencies |
| 0.7.3 | 2024-10-05 | 46458 | Update dependencies |
| 0.7.2 | 2024-09-28 | 46178 | Update dependencies |
| 0.7.1 | 2024-09-24 | 45850 | Add integration tests |
| 0.7.0 | 2024-09-24 | 45671 | Add .zip files support |
| 0.6.9 | 2024-09-21 | 45798 | Update dependencies |
| 0.6.8 | 2024-09-19 | 45092 | Update CDK v5; Fix OSError not raised in stream_reader.open_file |
| 0.6.7 | 2024-09-14 | 45492 | Update dependencies |
| 0.6.6 | 2024-09-07 | 45232 | Update dependencies |
| 0.6.5 | 2024-08-31 | 45010 | Update dependencies |
| 0.6.4 | 2024-08-27 | 44796 | Fix empty list of globs when prefix empty |
| 0.6.3 | 2024-08-26 | 44781 | Set file signature URL expiration limit default to max |
| 0.6.2 | 2024-08-24 | 44733 | Update dependencies |
| 0.6.1 | 2024-08-17 | 44285 | Update dependencies |
| 0.6.0 | 2024-08-15 | 44015 | Add support for all FileBasedSpec file types |
| 0.5.0 | 2024-08-14 | 44070 | Update CDK v4 and Python 3.10 dependencies |
| 0.4.15 | 2024-08-12 | 43733 | Update dependencies |
| 0.4.14 | 2024-08-10 | 43512 | Update dependencies |
| 0.4.13 | 2024-08-03 | 43236 | Update dependencies |
| 0.4.12 | 2024-07-27 | 42693 | Update dependencies |
| 0.4.11 | 2024-07-20 | 42312 | Update dependencies |
| 0.4.10 | 2024-07-13 | 41865 | Update dependencies |
| 0.4.9 | 2024-07-10 | 41430 | Update dependencies |
| 0.4.8 | 2024-07-09 | 41148 | Update dependencies |
| 0.4.7 | 2024-07-06 | 41015 | Update dependencies |
| 0.4.6 | 2024-06-26 | 40540 | Update dependencies |
| 0.4.5 | 2024-06-25 | 40391 | Update dependencies |
| 0.4.4 | 2024-06-24 | 40234 | Update dependencies |
| 0.4.3 | 2024-06-22 | 40089 | Update dependencies |
| 0.4.2 | 2024-06-06 | 39255 | [autopull] Upgrade base image to v1.2.2 |
| 0.4.1 | 2024-05-29 | 38696 | Avoid error on empty stream when running discover |
| 0.4.0 | 2024-03-21 | 36373 | Add Gzip and Bzip compression support. Manage dependencies with Poetry. |
| 0.3.7 | 2024-02-06 | 34936 | Bump CDK version to avoid missing SyncMode errors |
| 0.3.6 | 2024-01-30 | 34681 | Unpin CDK version to make compatible with the Concurrent CDK |
| 0.3.5 | 2024-01-30 | 34661 | Pin CDK version until upgrade for compatibility with the Concurrent CDK |
| 0.3.4 | 2024-01-11 | 34158 | Fix issue in stream reader for document file type parser |
| 0.3.3 | 2023-12-06 | 33187 | Bump CDK version to hide source-defined primary key |
| 0.3.2 | 2023-11-16 | 32608 | Improve document file type parser |
| 0.3.1 | 2023-11-13 | 32357 | Improve spec schema |
| 0.3.0 | 2023-10-11 | 31212 | Migrated to file based CDK |
| 0.2.0 | 2023-06-26 | 27725 | License Update: Elv2 |
| 0.1.0 | 2023-02-16 | 23186 | New Source: GCS |