Use the Important Capabilities table above as the source of truth for supported features and whether additional configuration is required.
The connector captures comprehensive lineage relationships including cross-platform lineage to external data sources:
**Core Vertex AI Lineage**

**Cross-Platform Lineage (external data sources)**
The connector links Vertex AI resources to external datasets when they are referenced in job configurations or ML Metadata artifacts. Supported platforms:

- `gcs`
- `bigquery`
- `s3`
- `abs`
- `snowflake`

Use `platform_instance_map` to configure platform instances and environments for external platforms, ensuring the emitted URNs match those produced by the native connectors so lineage connects properly.
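To see why the URNs must line up, it helps to look at how an external URI becomes a dataset URN. The helper below is a hypothetical sketch of that mapping for GCS, including the platform-instance prefix; the connector's actual logic lives in the DataHub codebase and may differ in detail:

```python
def gcs_uri_to_dataset_urn(uri, env="PROD", platform_instance=None):
    """Illustrative mapping from a gs:// URI to a DataHub dataset URN.

    Hypothetical helper for explanation only -- not the connector's real code.
    """
    path = uri[len("gs://"):] if uri.startswith("gs://") else uri
    name = f"{platform_instance}.{path}" if platform_instance else path
    return f"urn:li:dataset:(urn:li:dataPlatform:gcs,{name},{env})"

print(gcs_uri_to_dataset_urn("gs://your-bucket/data/train.csv", platform_instance="prod-gcs"))
# → urn:li:dataset:(urn:li:dataPlatform:gcs,prod-gcs.your-bucket/data/train.csv,PROD)
```

If the native GCS connector was configured with a different `platform_instance` or `env`, it would emit a different URN for the same file, and the lineage edge would not connect.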
The connector extracts lineage and metrics from CustomJob training jobs using the Vertex AI ML Metadata API. These features are controlled by the following configuration options:
- `use_ml_metadata_for_lineage` (default: `true`) — Extracts lineage from ML Metadata for CustomJob and other non-AutoML training jobs
- `extract_execution_metrics` (default: `true`) — Extracts hyperparameters and metrics from ML Metadata Executions
- `include_evaluations` (default: `true`) — Ingests model evaluations and evaluation metrics

For CustomJob lineage to work, your training scripts must log artifacts to Vertex AI ML Metadata. This happens automatically when using the Vertex AI Experiments SDK, or you can log manually:
```python
from google.cloud import aiplatform

# Placeholder values -- substitute your own job name and model location.
job_name = "my-custom-job"
model_uri = "gs://your-bucket/models/my-model"

aiplatform.init(project="your-project", location="us-central1")

dataset_artifact = aiplatform.Artifact.create(
    schema_title="system.Dataset",
    uri="gs://your-bucket/data/train.csv",
    display_name="training-dataset",
)

with aiplatform.start_execution(
    schema_title="system.ContainerExecution",
    display_name=f"training-job-{job_name}",
) as execution:
    execution.assign_input_artifacts([dataset_artifact])

    # ... training logic ...

    model_artifact = aiplatform.Artifact.create(
        schema_title="system.Model",
        uri=model_uri,
        display_name="trained-model",
    )
    execution.assign_output_artifacts([model_artifact])
```
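On the ingestion side, the three options above can be toggled in the recipe. A minimal sketch, with placeholder project and region values:

```yaml
source:
  type: vertexai
  config:
    project_id: my-project
    region: us-central1
    use_ml_metadata_for_lineage: true
    extract_execution_metrics: true
    include_evaluations: false # skip evaluation ingestion
```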
To ensure external datasets are linked with the correct platform instances and environments (so URNs match those from native connectors), configure `platform_instance_map`:
```yaml
source:
  type: vertexai
  config:
    project_id: my-project
    platform_instance_map:
      gcs:
        platform_instance: prod-gcs
        env: PROD
      bigquery:
        platform_instance: prod-bq
        env: PROD
      s3:
        platform_instance: prod-s3
        env: PROD
      snowflake:
        platform_instance: prod-snowflake
        env: PROD
        convert_urns_to_lowercase: true # Required - Snowflake defaults to lowercase URNs
      abs:
        platform_instance: prod-abs
        env: PROD
```
Platform-specific notes:

- Snowflake: set `convert_urns_to_lowercase: true` to match the Snowflake connector's default behavior. If your Snowflake connector is configured with `convert_urns_to_lowercase: false`, use the same setting here so the URNs stay aligned.

Module behavior is constrained by source APIs, permissions, and the metadata exposed by the platform. Refer to the capability notes for unsupported or conditional features.
If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.