docs/api/tutorials/sdk/search_client.md
DataHub's Python SDK makes it easy to search and discover metadata across your data ecosystem. Whether you're exploring unknown datasets, filtering by environment, or building advanced search tools, this guide walks you through how to do it all programmatically.
With the Search SDK, you can:
- Use AND / OR / NOT logic for advanced queries

To use the DataHub SDK, you'll need to install acryl-datahub and set up a connection to your DataHub instance. Follow the installation guide to get started.
Connect to your DataHub instance:
from datahub.sdk import DataHubClient
client = DataHubClient(server="<your_server>", token="<your_token>")
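Typical values for `server` are `http://localhost:8080` for a local instance or `https://<your_datahub_url>/gms` for a hosted one.

DataHub offers two primary search approaches: query-based search and filter-based search.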
:::note Combining Query and Filters
Query and filters can be used together for more precise searches. Check out this example for more details.
:::
Query-based search allows you to search using simple keywords. This matches across common fields like name, description, and column names. This is useful for exploration when you're unsure of the exact asset you're looking for.
For example, the script below searches for any assets that have "sales" in their metadata.
{{ inline /metadata-ingestion/examples/library/search_with_query.py show_path_as_comment }}
Example output:
[
DatasetUrn("urn:li:dataset:(urn:li:dataPlatform:snowflake,sales_revenue_2023,PROD)"),
DatasetUrn("urn:li:dataset:(urn:li:dataPlatform:snowflake,sales_forecast,PROD)")
]
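The dataset URNs in this output each encode three pieces of information: the platform, the dataset name, and the environment. As a rough illustration, the sketch below splits the string form of a dataset URN into those parts using only the standard library. `parse_dataset_urn` and `DatasetParts` are hypothetical names, not part of the SDK, and the code assumes the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout seen above.

```python
from typing import NamedTuple


class DatasetParts(NamedTuple):
    platform: str
    name: str
    env: str


def parse_dataset_urn(urn: str) -> DatasetParts:
    """Split a dataset URN string into platform, name, and environment.

    Hypothetical helper for illustration; assumes the layout
    urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>).
    """
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn!r}")
    inner = urn[len(prefix):-1]
    # Split off the platform at the first comma and the env at the last one,
    # so a name that itself contains commas stays in one piece.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    platform = platform_urn.rsplit(":", 1)[-1]
    return DatasetParts(platform, name, env)


parts = parse_dataset_urn(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,sales_revenue_2023,PROD)"
)
print(parts.platform, parts.name, parts.env)
```

Splitting like this is handy when you only need to group or display results and do not want to fetch each entity.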
Filter-based search allows you to scope results by platform, environment, entity type, and other structured fields. This is useful when you want to narrow down results to specific asset types or metadata fields.
For example, the script below searches for entities on the Snowflake platform.
{{ inline /metadata-ingestion/examples/library/search_with_filter.py show_path_as_comment }}
You can combine query and filters to refine search results further. For example, search for anything containing "forecast" that is either a chart or a Snowflake dataset.
{{ inline /metadata-ingestion/examples/library/search_with_query_and_filter.py show_path_as_comment }}
For more details on available filters, see the filter options.
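A combined search like the one above can return more than one entity type, such as charts alongside datasets. A minimal sketch of bucketing results by type follows. It works on URN strings (for example, `str(urn)` for each result), `group_by_entity_type` is a hypothetical helper rather than an SDK function, and the sample URNs are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_entity_type(urns: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket URN strings by the entity type in urn:li:<type>:<id>."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for urn in urns:
        parts = urn.split(":", 3)
        if len(parts) < 4 or parts[:2] != ["urn", "li"]:
            raise ValueError(f"unexpected URN format: {urn!r}")
        groups[parts[2]].append(urn)
    return dict(groups)


# Illustrative values only; real results come from the search call.
sample = [
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,sales_forecast,PROD)",
    "urn:li:chart:(looker,forecast_overview)",
]
grouped = group_by_entity_type(sample)
print(sorted(grouped))
```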
Here are some common examples of advanced queries using filters and logical operations:
{{ inline /metadata-ingestion/examples/library/search_filter_by_entity_type.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_by_platform.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_by_env.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_by_domain.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_by_entity_subtype.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_by_custom_property.py show_path_as_comment }}
You can combine filters using logical operations like and_, or_, and not_ to build advanced queries. Check the Logical Operator Options for more details.
{{ inline /metadata-ingestion/examples/library/search_filter_combined_operation.py show_path_as_comment }}
{{ inline /metadata-ingestion/examples/library/search_filter_not.py show_path_as_comment }}
Use F.custom_filter() to target specific fields such as urn, name, or description. Check the Supported Conditions for Custom Filter for the full list of allowed condition values.
{{ inline /metadata-ingestion/examples/library/search_filter_custom.py show_path_as_comment }}
:::note Searchable Fields
With F.custom_filter(), any field annotated with @Searchable in an entity's PDL files can be used for filtering. For example, you can filter datajob entities by name, description, or env, since these fields are annotated with @Searchable in DataJobInfo.pdl.
:::
For a full reference, see the search SDK reference.
The following filter options are available in the SDK:
| Filter Type | Example Code |
|---|---|
| Platform | F.platform("snowflake") |
| Environment | F.env("PROD") |
| Entity Type | F.entity_type("dataset") |
| Domain | F.domain("urn:li:domain:xyz") |
| Subtype | F.entity_subtype("ML Experiment") |
| Deletion Status | F.soft_deleted("NOT_SOFT_DELETED") |
| Custom Property | F.has_custom_property("department", "sales") |
The following logical operators can be used to combine filters:
| Operator | Example Code | Description |
|---|---|---|
| AND | F.and_(...) | Return entities matching all specified conditions. |
| OR | F.or_(...) | Return entities matching at least one condition. |
| NOT | F.not_(...) | Exclude entities that match a given condition. |
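To make the semantics in this table concrete, here is a small local model of the three operators written as plain Python predicates over dictionaries. It does not call DataHub and is not how the SDK implements filters: `and_`, `or_`, `not_`, and `field_is` below are stand-ins defined in the snippet itself, and the asset records are invented.

```python
from typing import Callable, Dict

Asset = Dict[str, str]
Predicate = Callable[[Asset], bool]


def field_is(field: str, value: str) -> Predicate:
    """Match assets whose field equals value."""
    return lambda asset: asset.get(field) == value


def and_(*preds: Predicate) -> Predicate:
    """Match only when every predicate matches."""
    return lambda asset: all(p(asset) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    """Match when at least one predicate matches."""
    return lambda asset: any(p(asset) for p in preds)


def not_(pred: Predicate) -> Predicate:
    """Match when the wrapped predicate does not."""
    return lambda asset: not pred(asset)


# Invented records standing in for search results.
assets = [
    {"name": "sales_forecast", "type": "dataset", "platform": "snowflake", "env": "PROD"},
    {"name": "sales_raw", "type": "dataset", "platform": "snowflake", "env": "DEV"},
    {"name": "forecast_chart", "type": "chart", "platform": "looker", "env": "PROD"},
]

# Charts, or Snowflake assets that are not in DEV.
wanted = or_(
    field_is("type", "chart"),
    and_(field_is("platform", "snowflake"), not_(field_is("env", "DEV"))),
)
matches = [a["name"] for a in assets if wanted(a)]
print(matches)
```

Nesting works the same way in this model as the table describes: the inner `and_` must hold in full, while the outer `or_` needs only one branch to match.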
Use F.custom_filter() to apply conditions on specific fields such as urn, name, or description. The supported condition values are:
| Condition | Description |
|---|---|
| EQUAL | Exact match for string fields. |
| CONTAIN | Contains substring in string fields. |
| START_WITH | Begins with a specific substring. |
| END_WITH | Ends with a specific substring. |
| GREATER_THAN | For numeric or timestamp fields, checks if the value is greater than the specified value. |
| LESS_THAN | For numeric or timestamp fields, checks if the value is less than the specified value. |
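The conditions above behave much like familiar string and numeric comparisons. The sketch below models them locally so the differences are easy to see. It is an illustration only: `matches_condition` and `CONDITIONS` are hypothetical names, and the real matching happens on the DataHub server and may differ in details such as case sensitivity or tokenization.

```python
from typing import Callable, Dict, Union

Value = Union[str, int, float]

# Local stand-ins for each condition name listed in the table.
CONDITIONS: Dict[str, Callable[[Value, Value], bool]] = {
    "EQUAL": lambda actual, expected: actual == expected,
    "CONTAIN": lambda actual, expected: str(expected) in str(actual),
    "START_WITH": lambda actual, expected: str(actual).startswith(str(expected)),
    "END_WITH": lambda actual, expected: str(actual).endswith(str(expected)),
    "GREATER_THAN": lambda actual, expected: actual > expected,
    "LESS_THAN": lambda actual, expected: actual < expected,
}


def matches_condition(actual: Value, condition: str, expected: Value) -> bool:
    """Evaluate one condition against a local value (illustrative model)."""
    try:
        check = CONDITIONS[condition]
    except KeyError:
        raise ValueError(f"unsupported condition: {condition}") from None
    return check(actual, expected)


name = "sales_forecast"
for condition, expected in [
    ("EQUAL", "sales"),
    ("CONTAIN", "s_f"),
    ("START_WITH", "sales"),
    ("END_WITH", "forecast"),
]:
    print(condition, matches_condition(name, condition, expected))
```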
How do I handle authentication?
Generate a Personal Access Token from your DataHub instance settings and pass it into the DataHubClient. Check out the Personal Access Token Guide.
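One way to keep the token out of source code is to read it from the environment before constructing the client. The sketch below shows that idea with the standard library only. The variable names `DATAHUB_SERVER` and `DATAHUB_TOKEN` and the helper `load_connection_settings` are assumptions chosen for illustration, not names defined by this guide.

```python
import os
from typing import Dict, Mapping


def load_connection_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the server URL and token from environment variables.

    The variable names are illustrative assumptions; use whatever your
    deployment defines.
    """
    server = env.get("DATAHUB_SERVER", "http://localhost:8080")
    token = env.get("DATAHUB_TOKEN")
    if not token:
        raise RuntimeError("DATAHUB_TOKEN is not set")
    return {"server": server, "token": token}


# With the SDK installed, the settings could then be passed along as:
#   client = DataHubClient(**load_connection_settings())
settings = load_connection_settings({"DATAHUB_TOKEN": "example-token"})
print(settings["server"])
```

Failing fast on a missing token gives a clearer error than an authentication failure on the first search call.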
Can I combine query and filters?
Yes. Supply a query together with a filter in the same search to get more precise results.