contrib/time_series_expectations/README.md
Expectations for detecting trends, seasonality, outliers, etc. in time series data
Author: Abe Gong (abegong)
Warning: This package is experimental and a work in progress.
This package currently contains 3 new Expectations. These are examples of the primary patterns that future time series Expectations will follow.
expect_batch_row_count_to_match_prophet_date_model
expect_column_max_to_match_prophet_date_model
expect_column_pair_values_to_match_prophet_date_model
expect_batch_row_count_to_match_prophet_date_model and expect_column_max_to_match_prophet_date_model are Batch-level Expectations: each Batch corresponds to a single timestamp-value pair in a time series. In typical usage, you would validate a single Batch (and therefore a single timestamp-value pair) at a time. expect_column_pair_values_to_match_prophet_date_model is a row-level Expectation: a Batch will typically contain many timestamp-value pairs, which can be evaluated together.
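The difference between the two shapes can be sketched in plain Python. This is illustrative only: `toy_interval` is a hypothetical stand-in for a fitted model's prediction interval (it is not Prophet and not part of this package), and the function names are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import List, Tuple


def toy_interval(day: date) -> Tuple[float, float]:
    """Hypothetical stand-in for a fitted model: (lower, upper) bounds for a day."""
    expected = 100.0 + 10.0 * day.weekday()  # toy weekly pattern, Monday = 0
    return expected - 15.0, expected + 15.0


def batch_level_check(batch_day: date, observed: float) -> bool:
    """Batch-level: one Batch yields one timestamp-value pair, checked once."""
    lower, upper = toy_interval(batch_day)
    return lower <= observed <= upper


def row_level_check(rows: List[Tuple[date, float]]) -> List[bool]:
    """Row-level: one Batch holds many timestamp-value pairs, each checked."""
    return [batch_level_check(day, value) for day, value in rows]


start = date(2022, 1, 3)  # a Monday

# Batch-level: the observed value could be a Batch's row count or a column max.
single_result = batch_level_check(start, observed=104.0)

# Row-level: several pairs from the same Batch, the last one far out of range.
rows = [(start + timedelta(days=i), 100.0 + 10.0 * i) for i in range(3)]
rows.append((start + timedelta(days=3), 500.0))
row_results = row_level_check(rows)
```

Here `single_result` is one pass/fail verdict for the whole Batch, while `row_results` carries one verdict per row.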
Backend support:
expect_column_pair_values_to_match_prophet_date_model is not yet implemented in SQL.
expect_column_pair_values_to_match_prophet_date_model relies on a UDF that may be slow for large data sets.

from time_series_expectations.generator import DailyTimeSeriesGenerator
generator = DailyTimeSeriesGenerator()
df = generator.generate_df()
df = generator.generate_df(
    size=180,
    noise=5.0,
    start_date="2022-01-01",
    weekday_dummy_params=[0.0, 1.0, 4.0, 3.0, 2.0, -3.0, -4.5],
)
# etc.
See the script that creates examples (assets/generate_test_time_series_data.py) and the tests (link) for additional examples.
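To make the parameters above concrete, here is a minimal standard-library sketch of the kind of series such a generator could produce. It is not the package's implementation: `toy_daily_series` is a hypothetical function, and the additive model (a base level plus a per-weekday offset plus Gaussian noise), the `base` and `seed` parameters, and the Monday-first ordering of `weekday_dummy_params` are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta
from typing import List, Sequence, Tuple


def toy_daily_series(
    size: int,
    noise: float,
    start_date: str,
    weekday_dummy_params: Sequence[float],
    base: float = 100.0,  # assumed baseline level (hypothetical parameter)
    seed: int = 0,        # assumed seed for reproducibility (hypothetical parameter)
) -> List[Tuple[date, float]]:
    """Return `size` consecutive (day, value) pairs starting at `start_date`."""
    rng = random.Random(seed)
    first_day = date.fromisoformat(start_date)
    series = []
    for offset in range(size):
        day = first_day + timedelta(days=offset)
        # Assumption: index 0 is Monday, matching date.weekday().
        weekday_effect = weekday_dummy_params[day.weekday()]
        value = base + weekday_effect + rng.gauss(0.0, noise)
        series.append((day, value))
    return series


series = toy_daily_series(
    size=180,
    noise=5.0,
    start_date="2022-01-01",
    weekday_dummy_params=[0.0, 1.0, 4.0, 3.0, 2.0, -3.0, -4.5],
)
```

Setting `noise=0.0` in this sketch leaves only the weekly pattern, which is a convenient way to check that a model recovers the weekday effects.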
prophet (e.g. statsmodels.tsa.arima, pmdarima, NeuralProphet)
requirements and documentation for installation
expect_column_pair_values_to_match_prophet_date_model and other row-level metrics.
altair

As all of those use cases are realized, we imagine the full class hierarchy for time series Expectations to evolve into this:
*BatchExpectation* (ABC)
    *BatchAggregateStatisticExpectation* (ABC)
        BatchAggregateStatisticTimeSeriesExpectation (ABC)
            ExpectBatchAggregateStatisticToMatchProphetDateModel (ABC)
                expect_batch_row_count_to_match_prophet_date_model (:white_check_mark:)
                expect_batch_most_recent_update_to_match_prophet_date_model
            ExpectBatchAggregateStatisticToMatchProphetTimestampModel (ABC)
                expect_batch_row_count_to_match_prophet_timestamp_model
                expect_batch_most_recent_update_to_match_prophet_timestamp_model
            ExpectBatchAggregateStatisticToMatchArimaModel (ABC)
                expect_batch_row_count_to_match_arima_model
                expect_batch_most_recent_update_to_match_arima_model
            ... for other types of models
    *ColumnAggregateExpectation* (ABC)
        ColumnAggregateTimeSeriesExpectation (ABC, :white_check_mark:)
            expect_column_max_to_match_prophet_date_model (:white_check_mark:)
            expect_column_{property}_to_match_{model}_model
            ...
    *ColumnPairMapExpectation* (ABC, :white_check_mark:)
        ColumnPairTimeSeriesExpectation (ABC)
            expect_column_pair_values_to_match_prophet_date_model (:white_check_mark:)
            expect_column_pair_values_to_match_prophet_timestamp_model
            expect_column_pair_values_to_match_arima_model
Formatting conventions for the nested hierarchy above:
About Abstract Base Classes:
The most important ABCs are BatchAggregateStatisticTimeSeriesExpectation, ColumnAggregateTimeSeriesExpectation, and ColumnPairTimeSeriesExpectation. They allow time series models to be applied to data in a variety of shapes and formats. Like most ABCs, these classes won't be executable themselves, but will hold shared logic to make it easier to create and maintain Expectations that follow certain patterns.
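The pattern described here, an abstract class that holds shared logic while concrete subclasses supply the one missing piece, can be sketched with Python's `abc` module. The class and method names below (`ToyAggregateTimeSeriesCheck`, `aggregate`, `validate`) are hypothetical and do not reflect this package's actual class definitions; the forecast bounds are passed in directly rather than computed by a model.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ToyAggregateTimeSeriesCheck(ABC):
    """Hypothetical base class: shared validation logic, abstract aggregate."""

    @abstractmethod
    def aggregate(self, values: Sequence[float]) -> float:
        """Reduce a column of values to a single statistic."""

    def validate(self, values: Sequence[float], lower: float, upper: float) -> bool:
        """Shared logic: compare the statistic with the model's bounds."""
        return lower <= self.aggregate(values) <= upper


class ToyColumnMaxCheck(ToyAggregateTimeSeriesCheck):
    def aggregate(self, values: Sequence[float]) -> float:
        return max(values)


class ToyColumnSumCheck(ToyAggregateTimeSeriesCheck):
    def aggregate(self, values: Sequence[float]) -> float:
        return sum(values)


column = [3.0, 7.5, 5.0]
max_ok = ToyColumnMaxCheck().validate(column, lower=5.0, upper=10.0)
sum_ok = ToyColumnSumCheck().validate(column, lower=5.0, upper=10.0)

# The base class itself cannot be instantiated, mirroring "not executable".
try:
    ToyAggregateTimeSeriesCheck()
    base_is_instantiable = True
except TypeError:
    base_is_instantiable = False
```

Each new subclass in this sketch needs only an `aggregate` method, which is the sort of saving the shared ABCs are meant to provide.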
About ColumnAggregateTimeSeriesExpectations:
We expect these to form an n by k matrix: n metrics crossed with k models. Once we've fully established the right patterns for these Expectations (and their Renderers and Profilers), we should be able to code-gen most or all of these Expectations, to quickly expand coverage.
metrics:
max
min
stdev
sum
percent_missingness
percent_matching_regex
...
models:
prophet_date
prophet_timestamp
arima
...
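The n by k grid can be enumerated mechanically, which is what makes code generation plausible. The sketch below only builds candidate names from the `expect_column_{property}_to_match_{model}_model` template listed in the hierarchy; it generates no Expectation classes, and apart from `expect_column_max_to_match_prophet_date_model` the resulting names are hypothetical.

```python
from itertools import product

# The metrics and models listed above (the trailing "..." entries are omitted).
metrics = [
    "max",
    "min",
    "stdev",
    "sum",
    "percent_missingness",
    "percent_matching_regex",
]
models = ["prophet_date", "prophet_timestamp", "arima"]

# One candidate Expectation name per (metric, model) cell of the matrix.
candidate_names = [
    f"expect_column_{metric}_to_match_{model}_model"
    for metric, model in product(metrics, models)
]

for name in candidate_names:
    print(name)
```

With the 6 metrics and 3 models listed, this yields 18 candidate names.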