docs/cli-commands/search.md
The datahub search command provides a powerful interface to search across all DataHub entities directly from the command line. It supports both keyword search (default) and semantic search with flexible filtering, multiple output formats, and comprehensive discovery features.
The search command is a command group with multiple subcommands:
query (default) - Execute search queries across DataHub entitiesdiagnose - Diagnose search configuration and availability# Basic keyword search (query is the default subcommand)
datahub search "users"
datahub search query "users" # Explicit subcommand
# Search with filters
datahub search "*" --filter platform=snowflake --filter entity_type=dataset
# Semantic search
datahub search --semantic "financial reports"
# Get results in table format
datahub search "dashboard" --table
# Get just URNs for piping
datahub search "dataset" --urns-only
# Preview query without executing
datahub search "users" --filter platform=snowflake --dry-run
# Custom field projection (reduce response size)
datahub search "users" --projection "urn type platform { name }"
# Diagnose search configuration
datahub search diagnose
datahub search diagnose --format json
Keyword Search (Default): Traditional text-based search using DataHub's search index.
datahub search "users"
datahub search "customer_data"
Semantic Search: AI-powered search that understands meaning and context.
datahub search --semantic "financial reports"
datahub search --semantic "user analytics dashboards"
Simple Filters: Easy key=value syntax with implicit AND logic.
# Single filter
datahub search "*" --filter platform=snowflake
# Multiple filters (AND)
datahub search "*" --filter platform=snowflake --filter entity_type=dataset
# OR logic with comma-separated values
datahub search "*" --filter platform=snowflake,bigquery
Complex Filters: JSON-based filters with AND/OR/NOT logic.
# AND logic
datahub search "*" --filters '{"and": [{"platform": ["snowflake"]}, {"env": ["PROD"]}]}'
# OR logic
datahub search "*" --filters '{"or": [{"entity_type": ["chart"]}, {"entity_type": ["dashboard"]}]}'
# NOT logic
datahub search "*" --filters '{"not": {"platform": ["mysql"]}}'
# Complex nested logic
datahub search "*" --filters '{
"and": [
{"platform": ["snowflake", "bigquery"]},
{"env": ["PROD"]},
{"not": {"tag": ["urn:li:tag:Deprecated"]}}
]
}'
JSON (Default): Structured output with full metadata.
datahub search "users" --format json
# or simply
datahub search "users"
Table: Human-readable table format.
datahub search "users" --table
# or
datahub search "users" --format table
URNs: List of URNs only, perfect for piping.
datahub search "dataset" --urns-only
# or
datahub search "dataset" --format urns
List Available Filters: See all filter types and their descriptions.
datahub search --list-filters
Output includes:
Describe Specific Filter: Get detailed information about a filter.
datahub search --describe-filter platform
datahub search --describe-filter entity_type
datahub search --describe-filter custom
Facets-Only Mode: Get aggregation counts without search results.
datahub search "*" --facets-only
# With filters
datahub search "*" --filter entity_type=dataset --facets-only --format table
Useful for:
# Get first 20 results
datahub search "*" --limit 20
# Get next page (results 20-40)
datahub search "*" --limit 20 --offset 20
# Maximum limit is 50
datahub search "*" --limit 50
# Sort by field
datahub search "*" --sort-by lastModified
# Sort ascending
datahub search "*" --sort-by name --sort-order asc
# Sort descending (default)
datahub search "*" --sort-by name --sort-order desc
Apply saved DataHub Views to searches:
datahub search "*" --view "urn:li:dataHubView:12345678"
Preview the compiled GraphQL query and variables without connecting to DataHub:
# See what query would be sent
datahub search "users" --dry-run
# Dry run with filters and sorting
datahub search "*" --filter platform=snowflake --sort-by lastModified --dry-run
# Dry run with semantic search
datahub search --semantic "financial data" --dry-run
# Dry run with projection
datahub search "users" --projection "urn type" --dry-run
Output is a JSON object containing operation_name, graphql_field, variables, and optionally projection and query (when using --projection).
Control which GraphQL fields are returned per entity to reduce response size:
# Return only URN and type
datahub search "users" --projection "urn type"
# Return URN with platform info
datahub search "users" --projection "urn platform { ...PlatformFields }"
# Load projection from a file
datahub search "users" --projection @my_fields.gql
# Braces are optional — these are equivalent
datahub search "users" --projection "{ urn type }"
datahub search "users" --projection "urn type"
When --projection is omitted, the full default selection set from search_queries.gql is used. Projections have a 5000-character limit and must not contain mutations, introspection queries, variables, or directives.
Print best-practice guidance for AI agents consuming the search CLI:
# Print agent context and exit
datahub search --agent-context
When stdout is not a TTY (e.g., piped to an AI agent), --help automatically appends the agent context.
datahub search [OPTIONS] COMMAND [ARGS]
The search command is a command group. The default subcommand is query, so datahub search "text" is equivalent to datahub search query "text".
query (default)Execute search queries across DataHub entities.
datahub search [QUERY] [OPTIONS]
datahub search query [QUERY] [OPTIONS]
Options:
| Option | Type | Description |
|---|---|---|
QUERY | string | Search query string (default: "*" for all entities) |
--semantic | flag | Use semantic search instead of keyword search |
-f, --filter | string | Simple filter: key=value (repeatable, comma for OR) |
--filters | string | Complex filters as JSON string (AND/OR/NOT logic) |
-n, --limit | int | Number of results to return (default: 10, max: 50) |
--offset | int | Starting position for pagination (default: 0) |
--sort-by | string | Field name to sort by |
--sort-order | choice | Sort order: asc or desc (default: desc) |
--format | choice | Output format: json, table, or urns (default: json) |
--table | flag | Shortcut for --format table |
--urns-only | flag | Shortcut for --format urns |
--list-filters | flag | List all available filter fields |
--describe-filter | string | Describe a specific filter field |
--facets-only | flag | Return only facets, no search results |
--view | string | DataHub View URN to apply |
--dry-run | flag | Show compiled GraphQL query and variables without executing |
--projection | string | Custom GraphQL selection set for entity (inline or @file) |
--agent-context | flag | Print agent skill context (best practices for AI agents) |
diagnoseDiagnose search configuration and availability. Checks semantic search configuration, backend connectivity, embedding provider, model details, and enabled entity types.
datahub search diagnose [OPTIONS]
Options:
| Option | Type | Description |
|---|---|---|
--format | choice | Output format: text or json (default: text) |
Examples:
# Text output (default)
datahub search diagnose
# JSON output
datahub search diagnose --format json
# Search for everything
datahub search "*"
# Search for specific term
datahub search "users"
# Search with limit
datahub search "dashboard" --limit 20
# Find financial datasets
datahub search --semantic "financial reports and revenue data"
# Find user analytics
datahub search --semantic "user behavior analysis dashboards"
# Find ML models
datahub search --semantic "machine learning models for prediction"
# Platform filter
datahub search "*" --filter platform=snowflake
# Entity type filter
datahub search "*" --filter entity_type=dataset
# Multiple filters (AND)
datahub search "*" --filter platform=snowflake --filter env=PROD
# OR with comma separation
datahub search "*" --filter platform=snowflake,bigquery
# Combine different filters
datahub search "customer" --filter platform=mysql --filter entity_type=dataset,dashboard
# Find Snowflake PROD datasets
datahub search "*" --filters '{
"and": [
{"platform": ["snowflake"]},
{"entity_type": ["dataset"]},
{"env": ["PROD"]}
]
}'
# Find charts OR dashboards in Looker
datahub search "*" --filters '{
"and": [
{"platform": ["looker"]},
{"or": [
{"entity_type": ["chart"]},
{"entity_type": ["dashboard"]}
]}
]
}'
# Exclude deprecated datasets
datahub search "*" --filters '{
"and": [
{"entity_type": ["dataset"]},
{"not": {"tag": ["urn:li:tag:Deprecated"]}}
]
}'
# Custom field filter
datahub search "*" --filters '{
"custom": {
"field": "name",
"condition": "CONTAIN",
"values": ["customer"]
}
}'
# JSON (default) - full metadata
datahub search "users" --format json
# Table - human readable
datahub search "users" --table
# URNs only - for piping to other commands
datahub search "*" --filter entity_type=dataset --urns-only | while read urn; do
datahub get --urn "$urn" --aspect ownership
done
# List all available filters
datahub search --list-filters
# Learn about a specific filter
datahub search --describe-filter platform
datahub search --describe-filter entity_type
datahub search --describe-filter custom
# Explore data distribution with facets
datahub search "*" --facets-only --format table
datahub search "*" --filter entity_type=dataset --facets-only
# Diagnose search configuration
datahub search diagnose
datahub search diagnose --format json
# First page (10 results)
datahub search "dataset"
# Second page
datahub search "dataset" --limit 10 --offset 10
# Large page
datahub search "*" --limit 50 --offset 100
# Sort by last modified (newest first)
datahub search "*" --sort-by lastModified --sort-order desc
# Sort by name (alphabetically)
datahub search "*" --sort-by name --sort-order asc
# Quick version with simple filters
datahub search "*" --filter platform=snowflake,bigquery,postgres --filter entity_type=dataset --table
# Complex version with AND/OR
datahub search "*" --filters '{
"and": [
{"entity_type": ["dataset"]},
{"or": [
{"platform": ["snowflake"]},
{"platform": ["bigquery"]},
{"platform": ["postgres"]}
]}
]
}' --table
datahub search "*" --filters '{
"and": [
{"env": ["PROD"]},
{"tag": ["urn:li:tag:PII", "urn:li:tag:Sensitive"]}
]
}' --table
datahub search "*" --filters '{
"and": [
{"entity_type": ["dataset"]},
{"owner": ["urn:li:corpGroup:data-eng"]}
]
}' --table
# Find all PROD datasets and check their ownership
datahub search "*" \
--filter entity_type=dataset \
--filter env=PROD \
--urns-only | \
while read urn; do
echo "Checking $urn"
datahub get --urn "$urn" --aspect ownership
done
# Find datasets and export to file
datahub search "*" \
--filter platform=snowflake \
--format json > snowflake_datasets.json
# Find datasets about customers
datahub search --semantic "customer information and profiles"
# Find ML models for forecasting
datahub search --semantic "predictive models and forecasting algorithms"
# Find compliance-related documentation
datahub search --semantic "GDPR compliance and privacy documentation"
{
"count": 2,
"facets": [
{
"aggregations": [
{ "count": 10, "value": "DATASET" },
{ "count": 5, "value": "DASHBOARD" }
],
"displayName": "Type",
"field": "_entityType"
}
],
"searchResults": [
{
"entity": {
"platform": { "name": "snowflake" },
"properties": {
"description": "User data table",
"name": "users"
},
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.users,PROD)"
},
"matchedFields": [{ "name": "name", "value": "users" }]
}
],
"start": 0,
"total": 15
}
+--------------------------------------------------------------+------------+-----------+-------------------------+
| URN | Name | Platform | Description |
+==============================================================+============+===========+=========================+
| urn:li:dataset:(urn:li:dataPlatform:snowflake,db.users,PROD)| users | snowflake | User data table |
+--------------------------------------------------------------+------------+-----------+-------------------------+
| urn:li:dashboard:(looker,dashboards.19) | User Stats | looker | User analytics dash... |
+--------------------------------------------------------------+------------+-----------+-------------------------+
Showing 1-2 of 15 results
urn:li:dataset:(urn:li:dataPlatform:snowflake,db.users,PROD)
urn:li:dashboard:(looker,dashboards.19)
urn:li:chart:(looker,chart.123)
#!/bin/bash
# Find and tag all PROD datasets in Snowflake
DATASETS=$(datahub search "*" \
--filter platform=snowflake \
--filter entity_type=dataset \
--filter env=PROD \
--urns-only)
for urn in $DATASETS; do
echo "Tagging $urn"
datahub put --urn "$urn" --aspect globalTags -d '{
"tags": [{"tag": "urn:li:tag:Production"}]
}'
done
#!/bin/bash
# Find datasets without ownership
DATASETS=$(datahub search "*" \
--filter entity_type=dataset \
--limit 50 \
--format json)
echo "$DATASETS" | jq -r '.searchResults[] |
select(.entity.ownership == null) |
.entity.urn'
#!/bin/bash
# Monitor new datasets added today
COUNT=$(datahub search "*" \
--filter entity_type=dataset \
--format json | jq '.total')
echo "Total datasets: $COUNT"
The CLI provides clear error messages for common issues:
# Semantic search not enabled
datahub search --semantic "test"
# Error: Semantic search is not enabled on this DataHub instance.
# Use keyword search (default) or contact your DataHub administrator.
# Invalid filter format
datahub search "*" --filter invalid_format
# Error: Invalid filter: invalid_format. Expected key=value format.
# Conflicting filter options
datahub search "*" --filter platform=snowflake --filters '{"env": ["PROD"]}'
# Error: Cannot use both --filter and --filters.
# Use --filter for simple filters or --filters for complex logic.
# Invalid limit
datahub search "*" --limit 0
# Error: --limit must be at least 1
# Limit too high
datahub search "*" --limit 100
# Warning: limit capped at 50
The diagnose subcommand helps troubleshoot search configuration issues, particularly for semantic search:
# Check semantic search configuration
datahub search diagnose
# Get detailed JSON output
datahub search diagnose --format json
The diagnostic output includes:
datahub search "*" and add filters progressively--list-filters and --describe-filter to learn available options--table: Use table format for quick exploration before JSON processing--facets-only to understand data distribution before filtering--urns-only with other datahub commands for bulk operationstotal field in JSON output to determine if pagination is neededDATAHUB_GMS_URL and DATAHUB_GMS_TOKEN for easier access"Semantic search is not enabled": Your DataHub instance doesn't have semantic search configured. Use keyword search instead or contact your administrator.
"No results found": Try broadening your search query or removing filters. Use datahub search "*" to verify connectivity.
"Invalid filter": Check filter syntax. Use --list-filters to see available filters and --describe-filter <name> for details.
Slow searches: Use more specific filters to narrow results. Check network latency to DataHub server.
Permission denied: Ensure you've run datahub init and have valid credentials configured.
# Command help
datahub search --help
# Subcommand help
datahub search query --help
datahub search diagnose --help
# List available filters
datahub search --list-filters
# Learn about specific filter
datahub search --describe-filter platform
# Diagnose search configuration
datahub search diagnose
# Check DataHub connection
datahub check plugins