docs/integrations/sources/google-drive.md
This page contains the setup guide and reference information for the Google Drive source connector.
:::info
The Google Drive source connector pulls data from a single folder in Google Drive. Subfolders are recursively included in the sync, so all files in the specified folder and all of its subfolders are considered.
:::
The Google Drive source connector supports authentication via either OAuth or Service Account Key Authentication.
<!-- env:cloud -->For Airbyte Cloud users, we highly recommend using OAuth, as it significantly simplifies the setup process and allows you to authenticate directly from the Airbyte UI.
<!-- /env:cloud --> <!-- env:oss -->For Airbyte Open Source users, we recommend using Service Account Key Authentication. Follow the steps below to create a service account, generate a key, and enable the Google Drive API.
:::note
If you prefer to use OAuth for authentication with Airbyte Open Source, you can follow Google's OAuth instructions to create an authentication app. Be sure to set the scopes to https://www.googleapis.com/auth/drive.readonly. You will need to obtain your client ID, client secret, and refresh token for the connector setup.
:::
If your folder is viewable by anyone with its link, no further action is needed. If not, give your service account access to the folder. Check out this video for how to do this.
<!-- /env:oss -->To set up Google Drive as a source in Airbyte Cloud:
(Recommended) Select Service Account Key Authentication from the dropdown and enter your Google Cloud service account key in JSON format:
{ "type": "service_account", "project_id": "YOUR_PROJECT_ID", "private_key_id": "YOUR_PRIVATE_KEY", ... }
To authenticate your Google account via OAuth, select Authenticate via Google (OAuth) from the dropdown and enter your Google application's client ID, client secret, and refresh token.
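Before pasting a service account key into the connector form, it can help to confirm that the JSON is well formed and really is a service account key. The following is a minimal sketch, not part of the connector: the function name `check_service_account_key` is hypothetical, and it only checks the three fields shown in the example key above.

```python
import json


def check_service_account_key(raw: str) -> dict:
    """Parse a service account key and check the fields shown in this guide.

    Hypothetical helper for illustration only; the connector performs its
    own validation when you test the connection.
    """
    key = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(key, dict):
        raise ValueError("The key must be a JSON object")
    if key.get("type") != "service_account":
        raise ValueError('Expected "type" to be "service_account"')
    # Only the fields visible in the example above are checked here.
    missing = [name for name in ("project_id", "private_key_id") if not key.get(name)]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    return key


if __name__ == "__main__":
    sample = '{"type": "service_account", "project_id": "YOUR_PROJECT_ID", "private_key_id": "YOUR_PRIVATE_KEY"}'
    print(check_service_account_key(sample)["project_id"])
```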
YYYY-MM-DDTHH:mm:ssZ. Leaving this field blank will replicate data from all files that have not been excluded by the Path Pattern and Path Prefix.

The Google Drive source connector supports the following sync modes:
| Feature | Supported? |
|---|---|
| Full Refresh Sync | Yes |
| Incremental Sync | Yes |
| Replicate Incremental Deletes | No |
| Replicate Multiple Files (pattern matching) | Yes |
| Replicate Multiple Streams (distinct tables) | Yes |
| Namespaces | No |
(tl;dr -> path pattern syntax using wcmatch.glob. GLOBSTAR and SPLIT flags are enabled.)
This connector can sync multiple files by using glob-style patterns, rather than requiring a specific path for every file. This enables:
`**` would indicate every file in the folder.

You must provide a path pattern. You can also provide multiple patterns separated by `|` for more complex directory layouts.
:::tip
When your folder contains multiple file types, use glob patterns to select only the files that match your configured format. For example, if your folder contains both CSV files and PDFs, and you've configured the connector to parse CSV files, use a pattern like **/*.csv to ensure only CSV files are processed. Without this filtering, the connector will attempt to parse all matched files as the configured format, which can cause parsing errors for incompatible file types.
:::
Each path pattern is a reference from the root of the folder, so don't include the root folder name itself in the pattern(s).
Some example patterns:
- `**` : match everything. (Warning: see the tip above regarding using this glob with folders containing multiple file types.)
- `**/*.csv` : match all files with specific extension.
- `myFolder/**/*.csv` : match all csv files anywhere under myFolder.
- `*/**` : match everything at least one folder deep. (Warning: see the tip above regarding using this glob with folders containing multiple file types.)
- `*/*/*/**` : match everything at least three folders deep. (Warning: see the tip above regarding using this glob with folders containing multiple file types.)
- `**/file.*|**/file` : match every file called "file" with any extension (or no extension).
- `x/*/y/*` : match all files that sit in sub-folder x -> any folder -> folder y. (Warning: see the tip above regarding using this glob with folders containing multiple file types.)
- `**/prefix*.csv` : match all csv files with specific prefix.
- `**/prefix*.parquet` : match all parquet files with specific prefix.

Let's look at a specific example, matching the following folder layout (MyFolder is the folder specified in the connector config as the root folder, which the patterns are relative to):
MyFolder
  -> log_files
  -> some_table_files
    -> part1.csv
    -> part2.csv
  -> images
  -> more_table_files
    -> part3.csv
  -> extras
    -> misc
      -> another_part1.csv
We want to pick up part1.csv, part2.csv and part3.csv (excluding another_part1.csv for now). We could do this a few different ways:
- `**/part*.csv`.
- `some_table_files/*.csv|more_table_files/*.csv` to pick up relevant files only from those exact folders.
- `*table_files/*.csv`. This could however cause problems in the future if new unexpected folders started being created.
- `extras/**/*.csv` would pick up any csv files nested in folders below "extras", such as "extras/misc/another_part1.csv".

As you can probably tell, there are many ways to achieve the same goal with path patterns. We recommend using a pattern that ensures clarity and is robust against future additions to the directory structure.
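To try patterns against a folder layout before configuring the connector, the sketch below approximates the matching rules in plain Python. It is a simplified stand-in written for this guide, not the connector's implementation, which uses wcmatch.glob with the GLOBSTAR and SPLIT flags. The helper names `glob_to_regex` and `select_files` are hypothetical, and only `**`, `*`, `?`, and `|` are handled.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate one glob pattern into a regex (simplified approximation)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")  # zero or more whole folders
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including nested folders
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within a single path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # exactly one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def select_files(paths, path_pattern):
    """Return the paths matched by any of the '|'-separated patterns."""
    regexes = [glob_to_regex(p) for p in path_pattern.split("|")]
    return [p for p in paths if any(r.fullmatch(p) for r in regexes)]


# Paths are relative to the root folder (MyFolder), as in the example layout.
FILES = [
    "some_table_files/part1.csv",
    "some_table_files/part2.csv",
    "more_table_files/part3.csv",
    "extras/misc/another_part1.csv",
]

if __name__ == "__main__":
    for pattern in (
        "**/part*.csv",
        "some_table_files/*.csv|more_table_files/*.csv",
        "*table_files/*.csv",
        "extras/**/*.csv",
    ):
        print(pattern, "->", select_files(FILES, pattern))
```

With this layout, the first three patterns select part1.csv, part2.csv and part3.csv, and the last one selects only extras/misc/another_part1.csv. Edge cases such as hidden files or character classes may behave differently in wcmatch, so treat the output as a rough check.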
When using the Avro, JSONL, CSV or Parquet format, you can provide a schema to use for the output stream. Note that this doesn't apply to the experimental Document file type format.
Providing a schema allows for more control over the output of this stream. Without a provided schema, columns and datatypes will be inferred from the first created file in the folder matching your path pattern and suffix. This will probably be fine in most cases, but there may be situations where you want to enforce a schema instead, e.g.:
_ab_additional_properties map.
_ab_additional_properties map.

Or any other reason! The schema must be provided as valid JSON as a map of {"column": "datatype"} where each datatype is one of:
For example:
{"id": "integer", "location": "string", "longitude": "number", "latitude": "number"}

{"username": "string", "friends": "array", "information": "object"}

Since CSV files are effectively plain text, providing specific reader options is often required for correct parsing of the files. These settings are applied when a CSV is created or exported so please ensure that this process happens consistently over time.
User Provided assumes the CSV does not have a header row and uses the headers provided. Autogenerated assumes the CSV does not have a header row, and the CDK generates headers of the form f{i}, where i is the column index starting from 0. Otherwise, the default behavior is to use the header from the CSV file. To autogenerate or provide column names for a CSV that has a header row, set a value for the "Skip rows before header" option so that the header row is ignored.

\t. By default, this value is set to ,.

utf8.

\). For example, given the following data:

Product,Description,Price
Jeans,"Navy Blue, Bootcut, 34\"",49.99
The backslash (\) is used directly before the second double quote (") to indicate that it is not the closing quote for the field, but rather a literal double quote character that should be included in the value (in this example, denoting the size of the jeans in inches: 34").
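The effect of the escape character can be reproduced with Python's standard csv module. This is only an illustration of the parsing rule using the sample data above. It does not claim to mirror the connector's internal reader, and the helper name `parse_csv` is hypothetical.

```python
import csv

# The sample data from above, one string per line.
LINES = [
    "Product,Description,Price",
    r'Jeans,"Navy Blue, Bootcut, 34\"",49.99',
]


def parse_csv(lines, escape_char=None):
    """Parse CSV lines with a comma delimiter and double-quote quoting."""
    reader = csv.reader(lines, delimiter=",", quotechar='"', escapechar=escape_char)
    return list(reader)


if __name__ == "__main__":
    header, row = parse_csv(LINES, escape_char="\\")
    # The escaped quote is kept as a literal character in the Description value.
    print(dict(zip(header, row)))
```

With the backslash configured as the escape character, the row is split into three fields and the Description value ends in a literal double quote.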
Leaving this field blank (default option) will disallow escaping.

".

Apache Parquet is a column-oriented data storage format of the Apache Hadoop ecosystem. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. At the moment, partitioned Parquet datasets are unsupported. The following settings are available:
The Avro parser uses the Fastavro library. The following settings are available:
There are currently no options for JSONL parsing.
:::warning
The Document file type format is currently an experimental feature and not subject to SLAs. Use at your own risk.
:::
The Document file type format is a special format that allows you to extract text from Markdown, TXT, PDF, Word, PowerPoint and Google documents. If selected, the connector will extract text from the documents and output it as a single field named content. The document_key field will hold a unique identifier for the processed file, which can be used as a primary key. The content of the document will contain Markdown formatting converted from the original file format. Each file matching the defined glob pattern must be a Markdown (md), PDF (pdf) or Docx (docx) file.
One record will be emitted for each document. Keep in mind that large files can emit large records that might not fit into every destination as each destination has different limitations for string fields.
Before parsing each document, the connector exports Google Document files to Docx format internally. Google Sheets, Google Slides, and drawings are internally exported and parsed by the connector as PDFs.
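The export rules above can be summarized as a lookup from Google Workspace MIME type to export format. The sketch below is illustrative only. The MIME type strings are the standard Google Drive and Office values, but the names `EXPORT_FORMATS` and `export_mime_type` are hypothetical and not taken from the connector.

```python
# Target formats named in the text above.
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
PDF = "application/pdf"

# Google Workspace file types and the format they are exported to before parsing.
EXPORT_FORMATS = {
    "application/vnd.google-apps.document": DOCX,      # Google Docs -> Docx
    "application/vnd.google-apps.spreadsheet": PDF,    # Google Sheets -> PDF
    "application/vnd.google-apps.presentation": PDF,   # Google Slides -> PDF
    "application/vnd.google-apps.drawing": PDF,        # Drawings -> PDF
}


def export_mime_type(source_mime_type: str):
    """Return the export format for a Google Workspace file, or None.

    None means the file is not a Google Workspace type covered by the rules
    above, so in this sketch it would be handled in its own format.
    """
    return EXPORT_FORMATS.get(source_mime_type)


if __name__ == "__main__":
    print(export_mime_type("application/vnd.google-apps.document"))
```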
This connector utilizes the open source Unstructured library to perform OCR and text extraction from PDFs and MS Word files, as well as from embedded tables and images. You can read more about the parsing logic in the Unstructured docs and you can learn about other Unstructured tools and services at www.unstructured.io.
:::info
The raw file replication feature has the following requirements and limitations:
- v1.2.0 or later
- 1GB per file
- v1.4.0 or later
:::
Copy raw files without parsing their contents. Bits are copied into the destination exactly as they appeared in the source. Recommended for use with unstructured text data, non-text and compressed files.
Format options will not be taken into account. Instead, files will be transferred to the file-based destination without parsing underlying data.
</FieldAnchor>If enabled, sends subdirectory folder structure along with source file names to the destination. Otherwise, files will be synced by their names only. This option is ignored when file-based replication is not enabled.
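The difference between the two settings can be pictured as follows. This is a rough sketch of the naming behavior described above, not the connector's code. The function `destination_name` and its `preserve_subdirectories` parameter are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath


def destination_name(source_path: str, preserve_subdirectories: bool) -> str:
    """Return the name a raw file would be sent under.

    source_path is relative to the root folder configured in the connector.
    """
    path = PurePosixPath(source_path)
    # With the option enabled, the folder structure travels with the file name.
    # Otherwise only the file name itself is kept.
    return str(path) if preserve_subdirectories else path.name


if __name__ == "__main__":
    print(destination_name("extras/misc/another_part1.csv", True))
    print(destination_name("extras/misc/another_part1.csv", False))
```

One consequence of syncing by name only is that files with the same name in different subfolders are no longer distinguishable by path, so enabling the option is the safer choice for nested layouts.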
This mode lets you sync Google Drive file permissions (ACLs) and identities (users and groups) from your Google Workspace. The Identities stream is enabled by default.
To use these features, ensure you have the correct permissions and have enabled the required Google APIs.
Make sure the following APIs are enabled in your Google Cloud project:
When setting up this connector, ensure that the following scopes are authorized in the Google consent screen:
If you are syncing identities (users and groups) from a different domain than the one associated with your user account, you must specify the domain field in the connector configuration.
This stream syncs file permissions (Access Control Lists) for files in your Google Drive. You should set up a stream name and globs.
By default, this stream is enabled and retrieves information about users and groups in your Google Workspace. This helps you map file permissions (ACLs) to actual users and groups.
| Version | Date | Pull Request | Subject |
|---|---|---|---|
| 0.5.12 | 2026-03-17 | 74923 | Update dependencies |
| 0.5.11 | 2026-03-03 | 73119 | Update dependencies |
| 0.5.10 | 2026-02-03 | 72377 | Update dependencies |
| 0.5.9 | 2026-01-20 | 71890 | Update dependencies |
| 0.5.8 | 2026-01-14 | 71735 | Update dependencies |
| 0.5.7 | 2025-12-18 | 70532 | Update dependencies |
| 0.5.6 | 2025-12-02 | 70289 | Update dependencies |
| 0.5.5 | 2025-11-25 | 70090 | Update dependencies |
| 0.5.4 | 2025-11-18 | 69386 | Update dependencies |
| 0.5.3 | 2025-11-11 | 69272 | Update dependencies |
| 0.5.2 | 2025-11-04 | 69158 | Update dependencies |
| 0.5.1 | 2025-10-29 | 69053 | Update dependencies |
| 0.5.0 | 2025-10-27 | 68618 | Update dependencies |
| 0.4.10 | 2025-10-21 | 68321 | Update dependencies |
| 0.4.9 | 2025-10-14 | 68038 | Update dependencies |
| 0.4.8 | 2025-10-07 | 67259 | Update dependencies |
| 0.4.7 | 2025-09-30 | 66169 | Update dependencies |
| 0.4.6 | 2025-09-10 | 66009 | Update to CDK v7 |
| 0.4.5 | 2025-09-09 | 66108 | Update dependencies |
| 0.4.4 | 2025-08-23 | 61127 | Update dependencies |
| 0.4.3 | 2025-08-21 | 65081 | Certify connector |
| 0.4.2 | 2025-05-24 | 60621 | Update dependencies |
| 0.4.1 | 2025-05-10 | 58227 | Update dependencies |
| 0.4.0 | 2025-05-06 | 59690 | Promoting release candidate 0.4.0-rc.1 to a main version. |
| 0.4.0-rc.1 | 2025-04-30 | 57496 | Adapt file-transfer records to latest protocol, requires platform >= 1.7.0, destination-s3 >= 1.8.0 |
| 0.3.4 | 2025-04-12 | 57675 | Update dependencies |
| 0.3.3 | 2025-04-05 | 57072 | Update dependencies |
| 0.3.2 | 2025-03-29 | 56665 | Update dependencies |
| 0.3.1 | 2025-03-22 | 55938 | Update dependencies |
| 0.3.0 | 2025-03-11 | 55689 | Refactor to use new Stream Permissions Reader |
| 0.2.4 | 2025-03-08 | 55349 | Update dependencies |
| 0.2.3 | 2025-03-01 | 54955 | Update dependencies |
| 0.2.2 | 2025-02-22 | 54416 | Update dependencies |
| 0.2.1 | 2025-02-15 | 53774 | Update dependencies |
| 0.2.0 | 2025-02-14 | 52099 | Introduce ACLs and Permissions streams |
| 0.1.2 | 2025-02-08 | 53320 | Update dependencies |
| 0.1.1 | 2025-02-01 | 43895 | Update dependencies |
| 0.1.0 | 2025-01-27 | 52572 | Promoting release candidate 0.1.0-rc.1 to a main version. |
| 0.1.0-rc.1 | 2025-01-20 | 51585 | Bump cdk to enable universal file transfer |
| 0.0.12 | 2024-06-06 | 39291 | [autopull] Upgrade base image to v1.2.2 |
| 0.0.11 | 2024-05-29 | 38698 | Avoid error on empty stream when running discover |
| 0.0.10 | 2024-03-28 | 36581 | Manage dependencies with Poetry |
| 0.0.9 | 2024-02-06 | 34936 | Bump CDK version to avoid missing SyncMode errors |
| 0.0.8 | 2024-01-30 | 34681 | Unpin CDK version to make compatible with the Concurrent CDK |
| 0.0.7 | 2024-01-30 | 34661 | Pin CDK version until upgrade for compatibility with the Concurrent CDK |
| 0.0.6 | 2023-12-16 | 33414 | Prepare for airbyte-lib |
| 0.0.5 | 2023-12-14 | 33411 | Bump CDK version to auto-set primary key for document file streams and support raw txt files |
| 0.0.4 | 2023-12-06 | 33187 | Bump CDK version to hide source-defined primary key |
| 0.0.3 | 2023-11-16 | 31458 | Improve folder id input and update document file type parser |
| 0.0.2 | 2023-11-02 | 31458 | Allow syncs on shared drives |
| 0.0.1 | 2023-11-02 | 31458 | Initial Google Drive source |