docs/source/contributor-guide/api-health.md
DataFusion is used extensively as a library in other applications and has a large public API. We try to keep the API well maintained and minimize breaking changes to avoid issues for downstream users.
An item is part of the public Rust API if it appears on the docs.rs page.
Breaking changes require users to modify their code for it to compile and run, and are listed as "Major Changes" in the SemVer Compatibility Section of the Cargo Book. Common examples include:

- Adding new required parameters to a function (`foo(a: i32, b: i32)` -> `foo(a: i32, b: i32, c: i32)`)
- Removing a `pub` function
- Adding a new function to a `trait` without a default implementation

Examples of non-breaking changes include:

- Marking a function as deprecated (`#[deprecated]`)
- Adding a new function to a `trait` with a default implementation

DataFusion is also used as a SQL engine, so changes to SQL semantics (the results returned for a given query) are a form of breaking change. Even with no Rust API change, altering the behavior of an existing SQL construct can silently break downstream applications, dashboards, and tests.
We apply the same caution to SQL semantics changes as to Rust API changes: the benefit must be weighed against the cost of breaking downstream users.
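
As an illustration of the trait examples among the Rust API changes above, the following minimal sketch shows why a new trait function with a default implementation is non-breaking. The `Describe` trait and `Table` type are hypothetical names invented for this example and are not DataFusion APIs.

```rust
// Hypothetical trait, used only for illustration; it is not a DataFusion API.
pub trait Describe {
    fn name(&self) -> String;

    // Imagine this method was added in a later release. Because it has a
    // default body, existing implementors keep compiling without changes.
    fn description(&self) -> String {
        format!("<no description for {}>", self.name())
    }

    // By contrast, adding a method without a body, for example
    //     fn id(&self) -> u64;
    // would force every downstream `impl Describe` to be updated,
    // which makes it a breaking change.
}

// A downstream type written before `description` existed.
pub struct Table;

impl Describe for Table {
    fn name(&self) -> String {
        "table".to_string()
    }
}
```

`Table` implements only `name`, yet callers can still invoke `description` and receive the default behavior.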
When possible, we prefer to avoid making breaking API changes. One common way to avoid such changes is to deprecate the old API, as described in the Deprecation Guidelines section below.
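
One way this can look in practice is sketched below: rather than adding a parameter to an existing function, a new function is introduced next to it and the old one is kept as a deprecated wrapper. The names `foo` and `foo_with_c`, the function bodies, and the version `44.0.0` are assumptions made for illustration only.

```rust
// Hypothetical functions for illustration; they are not part of DataFusion.

/// New API: takes the additional parameter `c`.
pub fn foo_with_c(a: i32, b: i32, c: i32) -> i32 {
    a + b + c
}

/// Old API: kept with its original signature so existing callers still compile.
/// The `since` version here is an assumed value for the example.
#[deprecated(since = "44.0.0", note = "Use `foo_with_c` instead")]
pub fn foo(a: i32, b: i32) -> i32 {
    // Delegate to the new function with a value that preserves the old behavior.
    foo_with_c(a, b, 0)
}
```

Existing callers of `foo` get a compiler warning that names the replacement, but their code continues to build and behave as before.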
If you do want to propose a breaking API change, we must weigh the benefits of the change against its cost (the impact on downstream users). It is often frustrating for downstream users to change their applications, and even more so if they gain no improved capabilities in return.
Examples of good reasons for a breaking API or SQL change:
Examples of potentially weak reasons:
When making breaking Rust API changes, please:

- Add the `api-change` label so the change is highlighted in the release notes.

For breaking SQL changes, also describe the previous and new behavior in the PR description, ideally including example queries and results where appropriate. This makes review easier and helps downstream users discover the affected semantics.
When a change requires DataFusion users to modify their code as part of an upgrade, please consider documenting it in the version-specific Upgrade Guide.
When deprecating a method:

- Mark the API with `#[deprecated]` and specify the exact DataFusion version in which it was deprecated.

The deprecated version is the next version to be released, that is, the first version that will contain the deprecation. For example, if the current version listed in `Cargo.toml` is 43.0.0, then the next version will be 44.0.0.
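
The version arithmetic in this example can be sketched as a small helper. `next_major_version` is a hypothetical function written for this guide, not an existing DataFusion utility, and it assumes the version string starts with a numeric major component.

```rust
/// Hypothetical helper: given the version currently listed in Cargo.toml,
/// return the next major version, which is the value to put in `since`.
/// Returns `None` if the major component cannot be parsed.
pub fn next_major_version(current: &str) -> Option<String> {
    // Take the text before the first '.', e.g. "43" from "43.0.0".
    let major = current.split('.').next()?.trim().parse::<u64>().ok()?;
    // Bump the major component and reset minor and patch to zero.
    Some(format!("{}.0.0", major.checked_add(1)?))
}
```

With the current version `43.0.0`, the helper yields `44.0.0`, matching the example above.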
To mark the API as deprecated, use the #[deprecated(since = "...", note = "...")] attribute.
For example:
#[deprecated(since = "41.0.0", note = "Use new API instead")]
pub fn api_to_deprecated(a: usize, b: usize) {}
Deprecated methods will remain in the codebase for a period of 6 major versions or 6 months, whichever is longer, to provide users ample time to transition away from them.
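
The "whichever is longer" rule means both thresholds must be met before a deprecated API is removed. The sketch below is one possible reading of that rule; `may_remove` and its parameters are hypothetical, and counting elapsed major versions as the difference between the current and the deprecating major version is an assumption.

```rust
/// Hypothetical check for whether a deprecated API may be removed.
/// Assumption: "6 major versions" is measured as the difference between the
/// current major version and the major version recorded in `since`.
pub fn may_remove(
    deprecated_in_major: u32,
    current_major: u32,
    months_since_deprecation: u32,
) -> bool {
    const MIN_MAJOR_VERSIONS: u32 = 6;
    const MIN_MONTHS: u32 = 6;

    let versions_elapsed = current_major.saturating_sub(deprecated_in_major);

    // "Whichever is longer" means neither threshold alone is sufficient.
    versions_elapsed >= MIN_MAJOR_VERSIONS && months_since_deprecation >= MIN_MONTHS
}
```

For example, under these assumptions an API deprecated in 44.0.0 would not be removable at 50.0.0 if only four months had passed since the deprecation was released.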
Please refer to DataFusion releases to plan API migration ahead of time.