metadata-ingestion/docs/sources/openapi/openapi_post.md
The source uses a multi-step approach to extract schemas for API endpoints:

1. **OpenAPI Specification (Primary)** - The source first attempts to extract schemas directly from the OpenAPI specification file.
2. **Example Data (Secondary)** - If schemas aren't fully defined, the source looks for example data in the specification.
3. **Live API Calls (Optional)** - If `enable_api_calls_for_schema_extraction=True` and credentials are provided, the source will make GET requests to endpoints when the specification does not provide a usable schema.
:::note
API calls are only made for GET methods. POST, PUT, and PATCH methods rely solely on schema definitions in the OpenAPI specification.
:::
:::tip
Most schemas are extracted from the OpenAPI specification itself. API calls are primarily used as a fallback when the specification is incomplete.
:::
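The fallback chain above can be sketched as follows. This is an illustrative sketch with hypothetical helper names, not the actual DataHub implementation:

```python
def infer_schema_from_example(example):
    """Naive type inference from a JSON example (illustration only)."""
    if isinstance(example, dict):
        return {k: infer_schema_from_example(v) for k, v in example.items()}
    return type(example).__name__


def extract_schema(spec_schema, example_data, live_get=None):
    """Return a schema: spec first, then example data, then an optional live GET."""
    if spec_schema:                  # 1. OpenAPI specification (primary)
        return spec_schema
    if example_data:                 # 2. Example data in the spec (secondary)
        return infer_schema_from_example(example_data)
    if live_get is not None:         # 3. Optional live API call (fallback)
        return infer_schema_from_example(live_get())
    return None
```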
When multiple HTTP methods are available for an endpoint, the source extracts metadata from a single method, chosen by a fixed priority order.
The description, tags, and schema metadata all come from the same priority method to ensure consistency.
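The single-method selection can be sketched as below. The actual priority order is defined in the source code; the order assumed here (GET before POST, PUT, PATCH) is an illustration only:

```python
# Assumed priority order for illustration -- check the source for the real one.
PRIORITY = ["get", "post", "put", "patch"]


def pick_priority_method(methods):
    """Given the methods defined for one endpoint, return the single method
    whose description, tags, and schema are all used (for consistency)."""
    for m in PRIORITY:
        if m in methods:
            return m
    return None
```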
All ingested endpoints are organized in DataHub's browse interface using browse paths based on their endpoint path structure. This makes it easy to navigate and discover related endpoints.
For example:
- `/pet/findByStatus` appears under the **pet** browse path
- `/pet/{petId}` appears under the **pet** browse path
- `/store/order/{orderId}` appears under **store → order**

Endpoints are grouped by their path segments, making it easy to find all endpoints related to a particular resource or feature.
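The grouping behaves like a directory tree: the endpoint itself is the leaf, and its parent path segments form the browse path. A minimal sketch of this derivation (not the source's exact logic):

```python
def browse_path(endpoint_path):
    """Return the parent segments of an endpoint path; these form the
    browse path, with the endpoint itself as the leaf. Illustration only."""
    segments = endpoint_path.strip("/").split("/")
    return segments[:-1]
```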
Bearer token authentication:

```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    bearer_token: "your-bearer-token-here"
```
Token authentication:

```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    token: "your-token-here"
```
Username/password authentication:

```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    username: your_username
    password: your_password
```
The source can retrieve a token dynamically by making a request to a token endpoint. This is useful when tokens expire and need to be refreshed.
```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    get_token:
      request_type: get # or "post"
      url_complement: api/auth/login?username={username}&password={password}
    username: your_username
    password: your_password
```
:::note
When using `get_token` with `request_type: get`, the username and password are sent in the URL query parameters, which is less secure. Use `request_type: post` when possible.
:::
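To make the security concern concrete, here is a hypothetical sketch of how a GET-style token URL is assembled from the recipe fields above; the credentials end up in plain text in the URL (and thus in access logs):

```python
def build_token_url(base_url, url_complement, username, password):
    """Fill the {username}/{password} placeholders in url_complement and
    append it to the base URL. Illustration only, not the source's code."""
    path = url_complement.format(username=username, password=password)
    return f"{base_url.rstrip('/')}/{path}"
```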
For endpoints with path parameters where the source cannot automatically determine example values, you can provide them manually using `forced_examples`:
```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    forced_examples:
      /pet/{petId}: [1]
      /store/order/{orderId}: [1]
      /user/{username}: ["user1"]
```
The source will use these values to construct URLs for API calls when needed.
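The substitution of forced example values into templated paths can be sketched like this (a hypothetical helper for illustration, not the source's actual code):

```python
import re


def fill_path(template, values):
    """Substitute forced example values into {param} placeholders,
    left to right. Illustration only."""
    it = iter(values)
    return re.sub(r"\{[^{}]+\}", lambda m: str(next(it)), template)
```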
You can exclude specific endpoints from ingestion:
```yaml
source:
  type: openapi
  config:
    name: my_api
    url: https://api.example.com
    swagger_file: openapi.json
    ignore_endpoints:
      - /health
      - /metrics
      - /internal/debug
```
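The effect of the exclusion list can be sketched as a simple filter over the discovered endpoint paths (exact-match sketch for illustration; the real matching behavior is defined by the source):

```python
def filter_endpoints(endpoints, ignore_endpoints):
    """Drop endpoints that appear in ignore_endpoints. Illustration only."""
    ignored = set(ignore_endpoints)
    return [ep for ep in endpoints if ep not in ignored]
```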
Schema extraction from the specification only, with live API calls disabled:

```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    enable_api_calls_for_schema_extraction: false

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
With authentication and live API calls enabled:

```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json
    bearer_token: "${BEARER_TOKEN}"
    enable_api_calls_for_schema_extraction: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
A complete recipe combining the options above:

```yaml
source:
  type: openapi
  config:
    name: petstore_api
    url: https://petstore.swagger.io
    swagger_file: /v2/swagger.json

    # Authentication
    bearer_token: "${BEARER_TOKEN}"

    # Optional: Enable/disable API calls
    enable_api_calls_for_schema_extraction: true

    # Optional: Ignore specific endpoints
    ignore_endpoints:
      - /user/logout

    # Optional: Provide example values for parameterized endpoints
    forced_examples:
      /pet/{petId}: [1]
      /store/order/{orderId}: [1]
      /user/{username}: ["user1"]

    # Optional: Proxy configuration
    proxies:
      http: "http://proxy.example.com:8080"
      https: "https://proxy.example.com:8080"

    # Optional: SSL verification
    verify_ssl: true

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
If schemas aren't being extracted:

- Set `enable_api_calls_for_schema_extraction: true` and provide credentials. When API calls are enabled, valid credentials must be provided.
- If ingestion fails, validate credentials, permissions, connectivity, and scope filters first. Then review ingestion logs for source-specific errors and adjust configuration accordingly.
If endpoints aren't appearing in DataHub:
If you see authentication errors: