metadata-ingestion/docs/sources/rdf/rdf_pre.md
The rdf module ingests RDF/OWL ontologies into DataHub as glossary terms, glossary nodes, and term relationships. It supports multiple RDF formats and dialects.
In order to ingest metadata from RDF files, you will need:
The source supports multiple RDF serialization formats:
The format is auto-detected from the file extension, or you can specify it explicitly using the format parameter.
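As a minimal sketch of overriding auto-detection, the recipe below sets the format parameter explicitly. The file name `glossary.rdf` and the value `turtle` are assumptions for illustration; check the list of supported formats for the exact accepted names.

```yaml
source:
  type: rdf
  config:
    source: glossary.rdf   # hypothetical file whose extension does not match its content
    format: turtle         # assumed value; forces the parser instead of relying on the extension
```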
The source parameter accepts multiple input types:
- `source: path/to/glossary.ttl`
- `source: path/to/rdf_files/` (processes all RDF files, recursively if `recursive: true`)
- `source: https://example.com/ontology.ttl`
- `source: file1.ttl, file2.ttl, file3.ttl`
- `source: path/to/**/*.ttl`

The source supports different RDF dialects for specialized processing:

- `default` - Standard RDF processing (BCBS239-style)
- `fibo` - FIBO (Financial Industry Business Ontology) dialect
- `generic` - Generic RDF processing

The dialect is auto-detected based on the RDF content, or you can force a specific dialect using the dialect parameter.
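The input and dialect options above can be combined in one recipe. The following sketch assumes that `recursive` and `dialect` sit alongside `source` under `config`, as the other parameters in this page do; the directory path is the placeholder used above.

```yaml
source:
  type: rdf
  config:
    source: path/to/rdf_files/   # a directory of RDF files
    recursive: true              # also process files in subdirectories
    dialect: fibo                # force the FIBO dialect instead of auto-detection
```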
You can use SPARQL CONSTRUCT queries to filter the RDF graph before ingestion. This is useful for filtering by namespace, applying complex filtering logic, or reducing the size of large RDF graphs.
```yaml
source:
  type: rdf
  config:
    source: large_ontology.ttl
    sparql_filter: |
      CONSTRUCT { ?s ?p ?o }
      WHERE {
        ?s ?p ?o .
        FILTER(STRSTARTS(STR(?s), "https://example.org/module1/"))
      }
```
Only CONSTRUCT queries are supported. The filter is applied before entity extraction.
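For more complex filtering logic, conditions can be combined inside a single `FILTER`. The sketch below keeps triples whose subject falls under either of two namespaces; the `module2` namespace is hypothetical and only there to illustrate the `||` operator.

```yaml
source:
  type: rdf
  config:
    source: large_ontology.ttl
    sparql_filter: |
      CONSTRUCT { ?s ?p ?o }
      WHERE {
        ?s ?p ?o .
        # Keep subjects from either namespace (module2 is a made-up example)
        FILTER(
          STRSTARTS(STR(?s), "https://example.org/module1/") ||
          STRSTARTS(STR(?s), "https://example.org/module2/")
        )
      }
```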
You can control which entity types are ingested using export_only or skip_export:
```yaml
source:
  type: rdf
  config:
    source: glossary.ttl
    export_only:
      - glossary # Only ingest glossary terms
```
Available entity types: glossary (or glossary_terms), relationship (or relationships).
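The inverse selection can be sketched with skip_export. This example assumes skip_export takes the same list form as export_only and excludes the listed entity types while ingesting everything else.

```yaml
source:
  type: rdf
  config:
    source: glossary.ttl
    skip_export:
      - relationship # Assumed behavior: ingest everything except term relationships
```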