metadata-ingestion/src/datahub/ingestion/source/rdf/docs/README.md
RDF is a lightweight RDF ontology ingestion system for DataHub. This documentation provides comprehensive guides for understanding how RDF concepts are mapped to DataHub entities.
Complete technical specification - Precise mappings, algorithms, and implementation details:
Purpose: Precise technical specifications that ensure functionality isn't lost during refactoring.
Example RDF files can be found in the test fixtures directory: tests/unit/rdf/
Glossary Terms are identified by:
- Label: rdfs:label OR skos:prefLabel (≥3 chars)
- Type: owl:Class, owl:NamedIndividual, skos:Concept, or custom class instances
- Excluded: owl:Ontology declarations

RDF glossaries are mapped to DataHub's glossary system through:
- Grouping structures (skos:ConceptScheme, skos:Collection)
- Relationships: hierarchical (skos:broader), associative (skos:related), and external reference links

Term Properties:
- skos:prefLabel → rdfs:label
- skos:definition → rdfs:comment

RDF IRIs are transformed to DataHub URNs using:
- /domain/subdomain/concept
- skos:prefLabel
- skos:definition
- skos:broader relationships

RDF uses a fully modular, pluggable entity architecture:
- dependencies in ENTITY_METADATA
- build_post_processing_mcps() hooks

Processing Flow:
- processing_order first)

See Entity Plugin Contract for details on adding new entity types.
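The dependency-driven processing flow can be illustrated with a small topological sort. This is a minimal sketch, not the actual implementation: only the names ENTITY_METADATA, dependencies, and processing_order come from this document. The dictionary layout, the entity type names, the resolve_processing_order helper, and the use of processing_order as a tie-breaker among entities whose dependencies are already satisfied are all assumptions made for illustration.

```python
import heapq

# Hypothetical registry: layout and entity type names are assumptions.
ENTITY_METADATA = {
    "glossary_term": {"dependencies": [], "processing_order": 1},
    "relationship": {"dependencies": ["glossary_term"], "processing_order": 2},
    "domain": {"dependencies": ["glossary_term"], "processing_order": 3},
}


def resolve_processing_order(metadata):
    """Return entity types so that each one comes after its dependencies."""
    remaining = {name: set(meta["dependencies"]) for name, meta in metadata.items()}

    # Reject dependencies that point at unregistered entity types.
    unknown = {dep for deps in remaining.values() for dep in deps} - set(metadata)
    if unknown:
        raise ValueError(f"Unknown dependencies: {sorted(unknown)}")

    # Entities with no unmet dependencies are ready; the heap yields
    # the lowest processing_order first (assumed tie-breaker).
    ready = [
        (metadata[name]["processing_order"], name)
        for name, deps in remaining.items()
        if not deps
    ]
    heapq.heapify(ready)

    ordered = []
    while ready:
        _, name = heapq.heappop(ready)
        ordered.append(name)
        del remaining[name]
        # Mark this entity as satisfied for everything that depends on it.
        for other, deps in remaining.items():
            if name in deps:
                deps.remove(name)
                if not deps:
                    heapq.heappush(
                        ready, (metadata[other]["processing_order"], other)
                    )

    if remaining:
        raise ValueError(f"Dependency cycle among: {sorted(remaining)}")
    return ordered


if __name__ == "__main__":
    print(resolve_processing_order(ENTITY_METADATA))
```

Raising on cycles and unknown dependencies keeps a misconfigured plugin from being silently skipped; whether the real registry behaves this way is not stated in this document.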
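The glossary term identification rules listed earlier in this README (a label of at least 3 characters, a recognized type, and no owl:Ontology declaration) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the module's code: triples are modeled as simple tuples of prefixed names rather than a real RDF graph, the find_glossary_terms helper and the ex: identifiers are hypothetical, and custom class instances are not handled.

```python
# Types and label predicates taken from the identification rules in this README.
TERM_TYPES = {"owl:Class", "owl:NamedIndividual", "skos:Concept"}
LABEL_PREDICATES = {"rdfs:label", "skos:prefLabel"}
MIN_LABEL_LENGTH = 3


def find_glossary_terms(triples):
    """Return subjects that satisfy the label, type, and exclusion rules.

    `triples` is an iterable of (subject, predicate, object) string tuples.
    This representation is an assumption made to keep the sketch self-contained.
    """
    labels = {}
    types = {}
    for subject, predicate, obj in triples:
        if predicate in LABEL_PREDICATES:
            labels.setdefault(subject, []).append(obj)
        elif predicate == "rdf:type":
            types.setdefault(subject, set()).add(obj)

    terms = []
    for subject in sorted(labels):
        subject_types = types.get(subject, set())
        if "owl:Ontology" in subject_types:
            continue  # ontology declarations are not terms
        if not subject_types & TERM_TYPES:
            continue  # custom class instances are out of scope for this sketch
        if any(len(label.strip()) >= MIN_LABEL_LENGTH for label in labels[subject]):
            terms.append(subject)
    return terms


# Hypothetical sample data for illustration only.
SAMPLE_TRIPLES = [
    ("ex:CreditRisk", "rdf:type", "skos:Concept"),
    ("ex:CreditRisk", "skos:prefLabel", "Credit Risk"),
    ("ex:FinanceOntology", "rdf:type", "owl:Ontology"),
    ("ex:FinanceOntology", "rdfs:label", "Finance Ontology"),
    ("ex:Ab", "rdf:type", "owl:Class"),
    ("ex:Ab", "rdfs:label", "ab"),
]

if __name__ == "__main__":
    print(find_glossary_terms(SAMPLE_TRIPLES))
```

In the sample data, only ex:CreditRisk passes all three checks: ex:FinanceOntology is an ontology declaration and the label of ex:Ab is shorter than three characters.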
Comprehensive business requirements document covering the background, motivation, problem statement, solution proposal, business justification, market opportunity, and success criteria for RDF. Essential reading for understanding the "why" behind RDF.
Complete guide for adding new entity types to RDF. Follow this contract to create pluggable entity modules that are automatically discovered and registered.
For questions about RDF:
- examples/ directory
- src/rdf/
- --help for command options