docs/src/content/docs/connectors/google_drive.mdx
The google_drive connector provides utilities for reading files from Google Drive using a service account.
from cocoindex.connectors import google_drive
:::note[Dependencies]
This connector requires additional dependencies. Install with:
pip install cocoindex[google_drive]
:::
The connector provides two ways to read from Google Drive:
- GoogleDriveSource: high-level source class with async iteration
- list_files(): lower-level function returning a sync iterator

Both require a Google service account with access to the target Drive folders.
:::note[Google Workspace CLI]
gws is an optional, unofficial Google Workspace CLI. It is actively developed and subject to change, but can be useful for exploring or validating Drive API access before configuring CocoIndex's service-account flow. For example:
gws auth setup
gws auth login
gws drive files list
In headless or agent workflows, gws can also read credentials from GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE. CocoIndex still expects the service account JSON path in service_account_credential_path; use the gws credentials setting for gws commands themselves.
:::
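Because both entry points depend on a service account JSON file, it can help to sanity-check that file before passing its path as service_account_credential_path. The following is a minimal sketch, not part of CocoIndex: check_service_account_file is a hypothetical helper, and the list of required fields is an assumption based on the usual layout of Google service account key files.

```python
import json
from pathlib import Path

# Fields that a Google service account key file normally contains
# (assumed minimal set for this sketch).
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_service_account_file(path: str) -> list:
    """Return a list of problems found in the key file (empty if none).

    Hypothetical helper for illustration; not part of CocoIndex.
    """
    file = Path(path)
    if not file.is_file():
        return [f"file not found: {path}"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every missing field rather than stopping at the first one.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    # A present but different "type" usually means a user credential file.
    if data.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

An empty result only means the file looks structurally plausible. It does not confirm that the service account has been granted access to the target Drive folders.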
GoogleDriveSource is the primary source class for iterating over Google Drive files.
class GoogleDriveSource(
    *,
    service_account_credential_path: str,
    root_folder_ids: Sequence[str],
    mime_types: Sequence[str] | None = None,
)
Parameters:
- service_account_credential_path: Path to the service account JSON credential file.
- root_folder_ids: List of Google Drive folder IDs to scan. Subfolders are traversed recursively.
- mime_types: Optional list of MIME types to include. If None, all file types are included.

GoogleDriveSource provides async iteration via files(), yielding DriveFile objects (implementing the FileLike base class):
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
)
async for file in source.files():
    text = await file.read_text()
    ...
items() yields (str, DriveFile) pairs, where the key is the file's name path. This is useful with mount_each():
async for key, file in source.items():
    content = await file.read()
Use mime_types to restrict which files are returned:
source = google_drive.GoogleDriveSource(
    service_account_credential_path="./credentials.json",
    root_folder_ids=["1abc...xyz"],
    mime_types=["application/pdf", "text/plain"],
)
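The filter semantics described for mime_types (None includes everything, otherwise only listed types pass) can be sketched in plain Python. This is an illustration only: is_included is a hypothetical helper, and exact string matching is an assumption about how the connector compares MIME types.

```python
from typing import Optional, Sequence


def is_included(mime_type: str, mime_types: Optional[Sequence[str]]) -> bool:
    """Hypothetical helper mirroring the documented mime_types semantics."""
    # None means "no filter": every file type is included.
    if mime_types is None:
        return True
    # Otherwise only the listed MIME types pass (exact match assumed).
    return mime_type in mime_types
```

With the filter from the example above, "application/pdf" passes and "image/png" does not.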
Google Workspace files (Docs, Sheets, Slides) are automatically exported:
| Google Workspace type | Exported as |
|---|---|
| Google Docs | Plain text |
| Google Sheets | CSV |
| Google Slides | Plain text |
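The table above can be written as a lookup from Drive MIME types to export formats. The Google Workspace MIME type strings are the ones the Drive API uses for native files. How the connector stores this mapping internally is not documented here, so EXPORT_MIME_TYPES and export_mime_type are hypothetical names for a sketch.

```python
from typing import Optional

# Drive API MIME types for native Google Workspace files, mapped to the
# export formats listed in the table (plain text or CSV).
EXPORT_MIME_TYPES = {
    "application/vnd.google-apps.document": "text/plain",
    "application/vnd.google-apps.spreadsheet": "text/csv",
    "application/vnd.google-apps.presentation": "text/plain",
}


def export_mime_type(drive_mime_type: str) -> Optional[str]:
    """Return the export format for a Workspace file, or None otherwise.

    Hypothetical helper for illustration; None stands for "not a Workspace
    file in the table, so no export step applies".
    """
    return EXPORT_MIME_TYPES.get(drive_mime_type)
```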
list_files() is a lower-level function that returns a sync iterator for listing files:
def list_files(spec: GoogleDriveSourceSpec) -> Iterator[DriveFile]
Parameters:
- spec: A GoogleDriveSourceSpec with the same fields as the GoogleDriveSource constructor parameters.

Returns: A sync iterator of DriveFile objects.
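Since list_files() returns a sync iterator, calling it directly from async code would presumably block the event loop while files are listed. One common pattern is to drain the iterator in a worker thread. The sketch below uses a stand-in generator, fake_list_files, in place of list_files(spec), so it runs without CocoIndex or credentials; both function names are hypothetical.

```python
import asyncio
from typing import Iterator, List


def fake_list_files() -> Iterator[str]:
    """Stand-in for a blocking sync iterator such as list_files(spec)."""
    yield from ["a.txt", "b.pdf"]


async def collect_names() -> List[str]:
    # Drain the sync iterator in a worker thread so the event loop
    # stays free for other tasks while the listing runs.
    return await asyncio.to_thread(lambda: list(fake_list_files()))
```

Calling asyncio.run(collect_names()) returns the collected items as a list. For most async pipelines, GoogleDriveSource with its native async iteration is likely the simpler choice.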
DriveFile implements FileLike with Google Drive-specific behavior:
- file_path: A DriveFilePath where resolve() returns the Google Drive file ID.
- read() / read_text(): Downloads file content via the Google Drive API. Partial reads (size parameter) are not supported.

import cocoindex as coco
from cocoindex.connectors import google_drive
from cocoindex.resources.file import FileLike

@coco.fn(memo=True)
async def process_file(file: FileLike) -> None:
    text = await file.read_text()
    # ... process the file content ...

@coco.fn
async def app_main(credential_path: str, folder_ids: list[str]) -> None:
    source = google_drive.GoogleDriveSource(
        service_account_credential_path=credential_path,
        root_folder_ids=folder_ids,
    )
    with coco.component_subpath("file"):
        async for key, file in source.items():
            await coco.mount(
                coco.component_subpath(key),
                process_file,
                file,
            )

app = coco.App(
    "GoogleDriveIngestion",
    app_main,
    credential_path="./credentials.json",
    folder_ids=["1abc...xyz"],
)