contrib/time_series_expectations/docs/working-notes.md
Expectations for detecting trends, seasonality, outliers, etc. in time series data
Author: Abe Gong (abegong)
This is a rough draft of a future README for this package. Think of it as the contributor version of a design doc.
WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP
WIP WIP
WIP WARNING: this README is aspirational. Do not believe it yet. WIP
WIP WIP
WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP WIP
This package contains...
time_series_expectations has optional dependencies on several time series packages:
Each of these packages has its own dependencies, strengths, and weaknesses. Please visit your favorite search engine and/or these external links for more information.
You can install any of those packages as extras, like this: pip install "time_series_expectations[prophet,statsmodels]"
Technically, pip install time_series_expectations is also allowed. But if you don't specify a time series dependency at all, very few of the Expectations in this package will work.
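Because the time series dependencies are optional, it can help to check which ones are actually importable before reaching for a particular Expectation. The following is a minimal sketch, not part of the package: the helper name `available_backends` is hypothetical, and it assumes each extra named in the install command (`prophet`, `statsmodels`) installs a module of the same name.

```python
from importlib.util import find_spec

# Extras named in the install command above.
# Assumption: each extra provides an importable top-level module of the same name.
OPTIONAL_BACKENDS = ["prophet", "statsmodels"]


def available_backends(candidates=OPTIONAL_BACKENDS):
    """Return the candidate top-level modules that can be imported in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Time series backends available:", ", ".join(found))
    else:
        print("No time series backend installed; most Expectations will not work.")
```

The check only looks up the module; it does not import it, so it stays cheap even for heavy packages.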
This module contains four new DataAssistants, which serve as the primary entry points for the package.
Warning: These docs assume that you have completed the Great Expectations onboarding tutorial, and are familiar with basic concepts including Expectations, DataAssistants, and DataSources.
freshness_expectation_suite = context.assistants.freshness.profile_data_asset(
    context.data_sources.my_db.assets.my_table
)

volume_expectation_suite = context.assistants.volume.profile_data_asset(
    context.data_sources.my_db.assets.my_table
)

time_series_expectation_suite = context.assistants.time_series_by_batch.profile_data_asset(
    context.data_sources.my_db.assets.my_table,
    columns=["foo", "bar", "baz"],
    metrics=["mean", "median", "sum", "percent_null"],
)

time_series_expectation_suite = context.assistants.time_series_by_batch.profile_data_asset(
    context.data_sources.my_db.assets.my_table,
    columns=["foo", "bar", "baz"],
    metric="percent_matching_regex",
    kwargs={
        # Raw string, so that \d reaches the regex engine unchanged.
        "regex": r"\d+",
    },
)

time_series_expectation_suite = context.assistants.time_series_by_row.profile_data_asset(
    context.data_sources.my_db.assets.my_table,
    date_column="created_at",
    columns=["foo", "bar", "baz"],
    metrics=["mean", "median", "sum", "percent_null"],
)
This module introduces several new Expectations, inheriting from some new abstract base classes.
You can learn more about the Expectations in the Expectation gallery, here.
The most important ABCs are BatchAggregateStatisticTimeSeriesExpectation, ColumnAggregateTimeSeriesExpectation, and ColumnPairTimeSeriesExpectation. They allow time series models to be applied to data in a variety of shapes and formats. Please see the class docstrings for a more detailed explanation.
The full class hierarchy is:
*BatchExpectation* (ABC)
    BatchAggregateStatisticExpectation (ABC)
        ExpectBatchAggregateStatisticToBeBetween (ABC)
            expect_batch_update_time_to_be_between
            expect_batch_volume_to_be_between
        BatchAggregateStatisticTimeSeriesExpectation (ABC)
            ExpectBatchAggregateStatisticToMatchProphetDateModel (ABC)
                expect_batch_volume_to_match_prophet_date_model
            ExpectBatchAggregateStatisticToMatchProphetTimestampModel (ABC)
                expect_batch_volume_to_match_prophet_timestamp_model
            ExpectBatchAggregateStatisticToMatchArimaModel (ABC)
                expect_batch_volume_to_match_arima_model
*ColumnAggregateExpectation* (ABC)
    ColumnAggregateTimeSeriesExpectation (ABC)
        expect_column_{property}_to_match_{model}_model
            properties:
                max
                min
                stdev
                sum
                percent_missingness
                percent_matching_regex
                ...
            models:
                prophet_date
                prophet_timestamp
                arima
*ColumnPairMapExpectation* (ABC)
    ColumnPairTimeSeriesExpectation (ABC)
        expect_column_pair_values_to_match_prophet_date_model
        expect_column_pair_values_to_match_prophet_timestamp_model
        expect_column_pair_values_to_match_arima_model
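The expect_column_{property}_to_match_{model}_model template in the hierarchy stands for one Expectation per property/model combination. As a rough illustration of how that naming scheme expands, here is a small sketch; the function name `column_time_series_expectation_names` is hypothetical, and the property list only covers the entries spelled out above (the "..." means the real list is longer).

```python
from itertools import product

# Copied from the hierarchy above; the "..." entry there means this list is not exhaustive.
PROPERTIES = ["max", "min", "stdev", "sum", "percent_missingness", "percent_matching_regex"]
MODELS = ["prophet_date", "prophet_timestamp", "arima"]


def column_time_series_expectation_names(properties=PROPERTIES, models=MODELS):
    """Expand expect_column_{property}_to_match_{model}_model for every combination."""
    return [
        f"expect_column_{prop}_to_match_{model}_model"
        for prop, model in product(properties, models)
    ]


names = column_time_series_expectation_names()
print(len(names))  # 6 properties x 3 models
print(names[0])
```

This only generates names; whether every combination is actually implemented is a separate question that these notes do not settle.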
Formatting conventions:
generate_time_series_df()
See the script that creates examples and the API docs for additional examples.
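These notes do not show the signature of generate_time_series_df(), so the following is only a stand-in to convey the idea of a synthetic series with a trend, a weekly cycle, noise, and optional outliers. Everything here is an assumption made for illustration: the function name `generate_time_series_rows`, all of its parameters, and the choice to return plain rows instead of a DataFrame. The column names "ds" and "y" follow the convention Prophet expects for its input.

```python
import math
import random
from datetime import date, timedelta


def generate_time_series_rows(
    start=date(2022, 1, 1),
    periods=28,
    trend_per_day=0.5,
    weekly_amplitude=3.0,
    noise_scale=1.0,
    outlier_every=0,
    outlier_size=20.0,
    seed=0,
):
    """Return rows of {"ds": date, "y": float}: linear trend + weekly cycle + noise."""
    rng = random.Random(seed)  # local RNG, so the same seed gives the same series
    rows = []
    for i in range(periods):
        trend = trend_per_day * i
        seasonal = weekly_amplitude * math.sin(2 * math.pi * i / 7)
        noise = rng.gauss(0, noise_scale)
        y = trend + seasonal + noise
        # Optionally inject a spike on every `outlier_every`-th row (0 disables this).
        if outlier_every and (i + 1) % outlier_every == 0:
            y += outlier_size
        rows.append({"ds": start + timedelta(days=i), "y": y})
    return rows


rows = generate_time_series_rows(periods=14, outlier_every=7)
```

If pandas is installed, `pandas.DataFrame(rows)` turns the result into a two-column frame with one row per day.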
Things like these datasets would be nifty:
* PyPI downloads
* GitHub stars
* stock market data
* https://steamcharts.com/
Related to "Add SQL implementation for expect_column_pair_values_to_match_prophet_date_model and other row-level metrics."
https://github.com/great-expectations/great_expectations/pull/3485/files