metadata-ingestion/examples/library/README.md
This directory contains examples demonstrating how to use the DataHub Python SDK and metadata emission APIs.
Each example is a standalone Python script that demonstrates a specific use case.
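To ensure examples are maintainable and correct, follow this pattern when writing new examples.
Examples should have two main components: a pure, testable function that builds the metadata, and a main() function that accepts an optional emitter (or client) so tests can inject their own: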
from typing import Optional

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter


def create_entity_metadata(...) -> MetadataChangeProposalWrapper:
    """
    Create metadata for an entity.

    This function is pure and testable - it doesn't have side effects.

    Args:
        ... (all required parameters)

    Returns:
        MetadataChangeProposalWrapper containing the metadata
    """
    # Build and return the MCP
    return MetadataChangeProposalWrapper(...)


def main(emitter: Optional[DatahubRestEmitter] = None) -> None:
    """
    Main function demonstrating the example use case.

    Args:
        emitter: Optional emitter for testing. If not provided, creates a new one.
    """
    emitter = emitter or DatahubRestEmitter(gms_server="http://localhost:8080")

    # Use the testable function
    mcp = create_entity_metadata(...)

    # Emit the metadata
    emitter.emit(mcp)
    print("Successfully created entity")


if __name__ == "__main__":
    main()
When using the DataHub SDK (DataHubClient):
from typing import Optional

from datahub.sdk import DataHubClient


def perform_operation(client: DataHubClient, ...) -> ...:
    """
    Perform an operation using the DataHub client.

    Args:
        client: DataHub client to use
        ...: Other parameters

    Returns:
        Result of the operation
    """
    # Perform the operation
    return result


def main(client: Optional[DataHubClient] = None) -> None:
    """
    Main function demonstrating the example use case.

    Args:
        client: Optional client for testing. If not provided, creates one from env.
    """
    client = client or DataHubClient.from_env()

    result = perform_operation(client, ...)
    print(f"Operation result: {result}")


if __name__ == "__main__":
    main()
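Because the operation receives its client as a parameter, it can be exercised without a DataHub server. The sketch below uses `unittest.mock.MagicMock` from the standard library in place of a real client. The `is_deprecated` function, the `client.entities.get` call path, and the `deprecated` attribute are hypothetical stand-ins chosen for illustration; check the SDK for the actual calls an example needs.

```python
from unittest.mock import MagicMock


def is_deprecated(client, urn: str) -> bool:
    """Hypothetical operation: look up an entity and read a flag from it."""
    entity = client.entities.get(urn)  # assumed call path, for illustration only
    return bool(entity.deprecated)  # assumed attribute, for illustration only


# Configure the mock so the lookup returns an entity flagged as deprecated.
fake_client = MagicMock()
fake_client.entities.get.return_value.deprecated = True

urn = "urn:li:dataset:(urn:li:dataPlatform:hive,example.table,PROD)"

# The operation reads its answer from whatever the injected client returns.
assert is_deprecated(fake_client, urn) is True

# The mock also records how it was called, so the lookup can be verified.
fake_client.entities.get.assert_called_once_with(urn)
```

Passing the same mock to `main(client=...)` skips `DataHubClient.from_env()`, since a `MagicMock` instance is truthy.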
As standalone scripts:
python examples/library/notebook_create.py
In tests:
from examples.library.notebook_create import create_notebook_metadata

# Unit test
mcp = create_notebook_metadata(...)
assert mcp.entityUrn == "..."

# Integration test
from examples.library.notebook_create import main

main(emitter=test_emitter)  # Inject test emitter
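The `test_emitter` injected above can be any object that exposes an `emit` method. One option is a hand-written recording fake, sketched below. `RecordingEmitter`, the dict payload, the placeholder URN, and the local `create_entity_metadata` and `main` stand-ins are all hypothetical; they exist only so the block runs without a DataHub installation. A real test would import the functions from the example module instead.

```python
from typing import Any, Dict, List, Optional


class RecordingEmitter:
    """Hypothetical test double: stores emitted items instead of sending them."""

    def __init__(self) -> None:
        self.emitted: List[Any] = []

    def emit(self, item: Any) -> None:
        self.emitted.append(item)


def create_entity_metadata(urn: str, description: str) -> Dict[str, str]:
    """Stand-in for an example's pure builder; a real one returns an MCP."""
    return {"entityUrn": urn, "description": description}


def main(emitter: Optional[RecordingEmitter] = None) -> None:
    """Stand-in for an example's main(); mirrors the injection pattern."""
    emitter = emitter or RecordingEmitter()
    emitter.emit(create_entity_metadata("urn:li:notebook:(example,demo)", "Demo"))


# Unit-level check: the builder is pure, so it needs no emitter at all.
payload = create_entity_metadata("urn:li:notebook:(example,demo)", "Demo")
assert payload["entityUrn"] == "urn:li:notebook:(example,demo)"

# Integration-style check: inject the fake and inspect what main() emitted.
test_emitter = RecordingEmitter()
main(emitter=test_emitter)
assert test_emitter.emitted == [payload]
```

A recording fake keeps the assertions on the emitted payload itself, which tends to give clearer failures than asserting on mock call arguments.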
Examples are tested at two levels:
Unit tests are located in tests/unit/test_library_examples.py.
Integration tests are located in tests/integration/library_examples/ and require a running DataHub instance.
# Run all unit tests for the examples
pytest tests/unit/test_library_examples.py
# Run specific unit tests
pytest tests/unit/test_library_examples.py::test_create_notebook_metadata
# Run integration tests (requires running DataHub)
pytest tests/integration/library_examples/ -m integration
# Run all tests
pytest tests/unit/test_library_examples.py tests/integration/library_examples/
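Since the integration tests need a running DataHub instance, it can help to check reachability before invoking them. The sketch below does this with the standard library only. The `gms_is_reachable` helper is hypothetical, and the `/health` path is an assumption about the server that should be verified against your deployment. The default URL is the one used in the emitter template earlier in this document.

```python
import http.client
import urllib.error
import urllib.request


def gms_is_reachable(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET against the server answers with a 2xx status.

    The "/health" path is an assumption; adjust it for your deployment.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error status,
        # malformed response, or malformed URL.
        return False


if __name__ == "__main__":
    if gms_is_reachable():
        print("DataHub is reachable; integration tests can run.")
    else:
        print("DataHub is not reachable; run the unit tests only.")
```

The helper returns False rather than raising, so a wrapper script can use the result to choose between the unit-only and full test commands.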
- notebook_create.py - Create a notebook entity
- data_platform_create.py - Create a custom data platform
- glossary_term_create.py - Create glossary terms
- dataset_add_term.py - Add glossary terms to datasets
- dataset_add_owner.py - Add ownership information
- notebook_add_tags.py - Add tags to notebooks
- dataset_query_deprecation.py - Check if a dataset is deprecated
- search_with_query.py - Search for entities
- lineage_column_get.py - Query column-level lineage