⚠️ This is an incubating project in draft status. APIs and functionality may change significantly between releases.
SchemaTron is a schema translation toolkit that converts between various schema formats and DataHub's native schema representation. It currently supports Apache Avro schema translation, with a focus on complex schema structures such as unions, arrays, maps, and nested records.
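To make "translation" concrete: DataHub describes each field's type with a `SchemaFieldDataType` whose variants include `StringType`, `NumberType`, `BooleanType`, `BytesType`, `FixedType`, `EnumType`, `NullType`, `ArrayType`, `MapType`, `UnionType` and `RecordType`. The hypothetical `AvroTypeMapping` class below is a minimal sketch of how Avro type names could map onto those variant names. It is not SchemaTron's converter, the mapping itself is an assumption, and it ignores Avro logical types such as `timestamp-millis`.

```java
import java.util.Map;

/**
 * Hypothetical sketch: maps Avro type names to DataHub SchemaFieldDataType
 * variant names. Not SchemaTron's actual converter.
 */
final class AvroTypeMapping {

    // Assumed mapping; Avro logical types (e.g. timestamp-millis) are ignored.
    private static final Map<String, String> AVRO_TO_DATAHUB = Map.ofEntries(
            Map.entry("null", "NullType"),
            Map.entry("boolean", "BooleanType"),
            Map.entry("int", "NumberType"),
            Map.entry("long", "NumberType"),
            Map.entry("float", "NumberType"),
            Map.entry("double", "NumberType"),
            Map.entry("bytes", "BytesType"),
            Map.entry("string", "StringType"),
            Map.entry("fixed", "FixedType"),
            Map.entry("enum", "EnumType"),
            Map.entry("array", "ArrayType"),
            Map.entry("map", "MapType"),
            Map.entry("union", "UnionType"),
            Map.entry("record", "RecordType"));

    private AvroTypeMapping() {}

    /** Returns the DataHub type name for an Avro type name; throws if the name is unknown. */
    static String toDataHubType(String avroType) {
        String mapped = AVRO_TO_DATAHUB.get(avroType);
        if (mapped == null) {
            throw new IllegalArgumentException("Unsupported Avro type: " + avroType);
        }
        return mapped;
    }
}
```

Under this mapping, `AvroTypeMapping.toDataHubType("long")` returns `"NumberType"`; all four Avro numeric primitives collapse into a single DataHub numeric type.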
The CLI module provides a command-line interface for converting schemas and emitting them to DataHub. For example, to convert the bundled `FlatUser.avsc` test schema:
```shell
# Execute from this directory
../../../gradlew :metadata-integration:java:datahub-schematron:cli:run --args="-i cli/src/test/resources/FlatUser.avsc"
```
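The CLI flags listed below combine into a single run configuration. As a minimal sketch that assumes nothing about the CLI's real parser, the hypothetical `CliOptions` record resolves those flags using their documented defaults. Treating `--input` as required is an assumption of this sketch; restricting `--sink` to `rest` or `file` follows the documented values.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of how the documented CLI flags and defaults fit together.
 * Not SchemaTron's actual option parser.
 */
record CliOptions(String input, String platform, String server, String token,
                  String sink, String outputFile) {

    static CliOptions parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            // Normalize short and long flag spellings to one key.
            String key = switch (args[i]) {
                case "-i", "--input" -> "input";
                case "-p", "--platform" -> "platform";
                case "-s", "--server" -> "server";
                case "-t", "--token" -> "token";
                case "--sink" -> "sink";
                case "--output-file" -> "output-file";
                default -> throw new IllegalArgumentException("Unknown flag: " + args[i]);
            };
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + args[i]);
            }
            flags.put(key, args[++i]); // consume the flag's value
        }
        String input = flags.get("input");
        if (input == null) {
            // Assumption of this sketch: an input schema path is required.
            throw new IllegalArgumentException("--input is required");
        }
        String sink = flags.getOrDefault("sink", "rest");
        if (!sink.equals("rest") && !sink.equals("file")) {
            throw new IllegalArgumentException("--sink must be 'rest' or 'file'");
        }
        return new CliOptions(
                input,
                flags.getOrDefault("platform", "avro"),
                flags.getOrDefault("server", "http://localhost:8080"),
                flags.get("token"), // optional; null when not supplied
                sink,
                flags.getOrDefault("output-file", "metadata.json"));
    }
}
```

For instance, parsing `-i cli/src/test/resources/FlatUser.avsc --sink file` with this sketch yields platform `avro`, server `http://localhost:8080`, and output file `metadata.json`.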
The CLI accepts the following options:

- `-i`, `--input`: Input schema file or directory path
- `-p`, `--platform`: Data platform name (default: `"avro"`)
- `-s`, `--server`: DataHub server URL (default: `"http://localhost:8080"`)
- `-t`, `--token`: DataHub access token
- `--sink`: Output sink, either `"rest"` or `"file"` (default: `"rest"`)
- `--output-file`: Output file path when using the file sink (default: `"metadata.json"`)

The core library contains the translation logic and models for schema conversion. Features include:
- Support for complex Avro schema structures
- Comprehensive path handling for schema fields
- DataHub-compatible metadata generation
- Schema fingerprinting and versioning
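Field-path handling for nested structures can be illustrated with a small, self-contained sketch. It uses a hypothetical, simplified Avro type model (`AvroRecord`, `AvroArray`, `AvroMap`, `AvroUnion`, `AvroPrimitive`; this is not Apache Avro's `Schema` API) and emits one path per field, in a format loosely modeled on DataHub's v2 field paths (`[version=2.0].[type=...]`). The exact rules SchemaTron applies may differ, and this sketch does not guard against recursive schemas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified Avro type model; NOT Apache Avro's Schema API.
interface AvroType {}
record AvroPrimitive(String name) implements AvroType {}
record AvroField(String name, AvroType type) {}
record AvroRecord(String name, List<AvroField> fields) implements AvroType {}
record AvroArray(AvroType items) implements AvroType {}
record AvroMap(AvroType values) implements AvroType {}
record AvroUnion(List<AvroType> branches) implements AvroType {}

final class FieldPaths {

    private FieldPaths() {}

    // Type token(s) for one type; arrays and maps also name their element/value type.
    static String typeToken(AvroType t) {
        if (t instanceof AvroPrimitive p) return "[type=" + p.name() + "]";
        if (t instanceof AvroRecord r) return "[type=" + r.name() + "]";
        if (t instanceof AvroArray a) return "[type=array]." + typeToken(a.items());
        if (t instanceof AvroMap m) return "[type=map]." + typeToken(m.values());
        if (t instanceof AvroUnion) return "[type=union]";
        throw new IllegalArgumentException("Unknown type: " + t);
    }

    // Emits one path per field of `record`, then recurses into nested structures.
    static void walk(String prefix, AvroRecord record, List<String> out) {
        for (AvroField f : record.fields()) {
            String path = prefix + "." + typeToken(f.type()) + "." + f.name();
            out.add(path);
            descend(path, f.type(), out);
        }
    }

    // Finds records nested in a field's type: directly, in arrays/maps, or in union branches.
    static void descend(String path, AvroType t, List<String> out) {
        if (t instanceof AvroRecord r) {
            walk(path, r, out);
        } else if (t instanceof AvroArray a) {
            descend(path, a.items(), out);
        } else if (t instanceof AvroMap m) {
            descend(path, m.values(), out);
        } else if (t instanceof AvroUnion u) {
            // Each branch gets its own type token so fields of different branches stay distinct.
            for (AvroType branch : u.branches()) {
                descend(path + "." + typeToken(branch), branch, out);
            }
        }
        // Primitives have no nested fields, so nothing is emitted for them.
    }

    static List<String> fieldPaths(AvroRecord root) {
        List<String> out = new ArrayList<>();
        walk("[version=2.0].[type=" + root.name() + "]", root, out);
        return out;
    }
}
```

For a `User` record with a `string` field `name`, an `array<string>` field `tags`, and a `["null", Email]` union field `contact` (where `Email` has a `string` field `addr`), `fieldPaths` returns `[version=2.0].[type=User].[type=string].name`, `[version=2.0].[type=User].[type=array].[type=string].tags`, `[version=2.0].[type=User].[type=union].contact`, and `[version=2.0].[type=User].[type=union].contact.[type=Email].[type=string].addr`. Tagging each union branch with its own type token keeps same-named fields from different record branches distinguishable.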
The library can handle sophisticated schema structures including:
The project includes extensive test coverage through:
Test resources include example schemas demonstrating various Avro schema features and edge cases.
As this is an incubating project, we welcome contributions and feedback on: