The `lookml` module ingests metadata from Looker into DataHub. It is intended for production ingestion workflows; module-specific capabilities are documented below.
You have three options for controlling where your LookML ingestion runs: the DataHub UI, a GitHub Action, or the `datahub` CLI.
Read on to learn more about these options.
To ingest LookML metadata through the UI, you must first set up a GitHub deploy key using the deploy key instructions in this document. Once that is complete, follow the on-screen instructions to set up a LookML source on the Ingestion page. The following video shows how to ingest LookML metadata through the UI and where to find the relevant information in your Looker account.
<div style={{ position: "relative", paddingBottom: "56.25%", height: 0 }}> <iframe src="https://www.loom.com/embed/c66dd625de7f48b39005e0eb9c345f5a" frameBorder={0} webkitallowfullscreen="" mozallowfullscreen="" allowFullScreen="" style={{ position: "absolute", top: 0, left: 0, width: "100%", height: "100%" }} /> </div>

You can set up ingestion using a GitHub Action to push metadata whenever your main Looker GitHub repo changes. The following sample GitHub Action file can be modified to emit LookML metadata whenever there is a change to your repository. This ensures that metadata is always fresh and up to date.
Drop this file into the `.github/workflows` directory inside your Looker GitHub repo.
For this workflow to run, set up the following secrets in your GitHub repository: `DATAHUB_GMS_URL`, `DATAHUB_GMS_TOKEN`, `LOOKER_BASE_URL`, `LOOKER_CLIENT_ID`, and `LOOKER_CLIENT_SECRET`.
```yml
name: lookml metadata upload
on:
  # Note that this action only runs on pushes to your main branch. If you want to also
  # run on pull requests, we'd recommend running datahub ingest with the `--dry-run` flag.
  push:
    branches:
      - main
  release:
    types: [published, edited]
  workflow_dispatch:
jobs:
  lookml-metadata-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Run LookML ingestion
        run: |
          pip install 'acryl-datahub[lookml,datahub-rest]'
          cat << EOF > lookml_ingestion.yml
          # LookML ingestion configuration.
          # This is a full ingestion recipe, and supports all config options that the LookML source supports.
          source:
            type: "lookml"
            config:
              base_folder: ${{ github.workspace }}
              parse_table_names_from_sql: true
              git_info:
                repo: ${{ github.repository }}
                branch: ${{ github.ref }}
              # Options
              #connection_to_platform_map:
              #  connection-name:
              #    platform: platform-name (e.g. snowflake)
              #    default_db: default-db-name (e.g. DEMO_PIPELINE)
              api:
                client_id: ${LOOKER_CLIENT_ID}
                client_secret: ${LOOKER_CLIENT_SECRET}
                base_url: ${LOOKER_BASE_URL}
              # Enable API-based lineage extraction (required for field splitting features)
              use_api_for_view_lineage: true
              # Optional: Large view handling configuration
              # field_threshold_for_splitting: 100
              # allow_partial_lineage_results: true
              # enable_individual_field_fallback: true
              # max_workers_for_parallel_processing: 10
          sink:
            type: datahub-rest
            config:
              server: ${DATAHUB_GMS_URL}
              token: ${DATAHUB_GMS_TOKEN}
          EOF
          datahub ingest -c lookml_ingestion.yml
        env:
          DATAHUB_GMS_URL: ${{ secrets.DATAHUB_GMS_URL }}
          DATAHUB_GMS_TOKEN: ${{ secrets.DATAHUB_GMS_TOKEN }}
          LOOKER_BASE_URL: ${{ secrets.LOOKER_BASE_URL }}
          LOOKER_CLIENT_ID: ${{ secrets.LOOKER_CLIENT_ID }}
          LOOKER_CLIENT_SECRET: ${{ secrets.LOOKER_CLIENT_SECRET }}
```
If you want to ingest LookML using the `datahub` CLI directly, read on for instructions and configuration details.

To use LookML ingestion through the UI, or to automate the GitHub checkout through the CLI, you must set up a GitHub deploy key for your Looker GitHub repository. Read GitHub's documentation on managing deploy keys to learn how to set them up for your Looker git repo.
Three steps:

1. Generate an SSH key pair without a passphrase (creates `looker_datahub_deploy_key` and `looker_datahub_deploy_key.pub`).
2. Add the public key to the Looker git repo as a read-only deploy key (guide).
3. Save the private key file contents for the GitHub Deploy Key field in UI-based ingestion.
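
The key generation step can be sketched as follows. This is a minimal example that assumes OpenSSH's `ssh-keygen` is installed; the RSA key type and the 4096-bit size are assumptions rather than requirements stated here, while the file name matches the one used in the steps above.

```shell
# Generate a key pair with an empty passphrase (-N "").
# Assumption: RSA with 4096 bits; adjust to your organization's key policy.
ssh-keygen -q -t rsa -b 4096 -N "" -f looker_datahub_deploy_key

# Public key: add this to the Looker git repo as a read-only deploy key.
cat looker_datahub_deploy_key.pub

# Private key: its contents go into the GitHub Deploy Key field.
# Keep this file secret and do not commit it.
ls -l looker_datahub_deploy_key
```

Run this in a directory outside your Looker repo so the private key cannot be committed by accident.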
Connection mapping enables accurate lineage to upstream warehouses by mapping Looker connection names to platforms and databases.
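Two configuration options:

- Provide Looker admin API credentials, so that connection details do not have to be entered by hand.
- Manually populate the `connection_to_platform_map` and `project_name` fields (see starter recipe).

To use API credentials, create a client ID and secret following the Looker authentication docs. Ensure the API key has Admin privileges.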
Without admin API credentials, manually populate connection_to_platform_map and project_name in your recipe.
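
For the manual route, a recipe might look like the sketch below, written to disk with a heredoc in the same way as the GitHub Action above. The connection name `my_snowflake_connection`, the project name `my_looker_project`, the `base_folder` path, and the output file name are hypothetical placeholders. The `platform` and `default_db` keys follow the commented-out example in the GitHub Action recipe. The quoted heredoc delimiter keeps the `${...}` references literal in the file, which assumes the `datahub` CLI resolves them from the environment when it loads the recipe.

```shell
# Write a recipe that maps Looker connections by hand instead of via the API.
# All names marked "hypothetical" are placeholders to replace with your own.
cat << 'EOF' > lookml_manual_mapping.yml
source:
  type: "lookml"
  config:
    base_folder: /path/to/your/lookml/repo  # hypothetical path
    project_name: my_looker_project         # hypothetical project name
    connection_to_platform_map:
      my_snowflake_connection:              # hypothetical Looker connection name
        platform: snowflake
        default_db: DEMO_PIPELINE
sink:
  type: datahub-rest
  config:
    server: ${DATAHUB_GMS_URL}
    token: ${DATAHUB_GMS_TOKEN}
EOF
```

With `DATAHUB_GMS_URL` and `DATAHUB_GMS_TOKEN` exported in your shell, the recipe can then be passed to `datahub ingest -c lookml_manual_mapping.yml`.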