docs/en/06-advanced/06-tdgpt/08-imputation.md
TDgpt provides data imputation based on time-series foundation models. It automatically detects missing time-series data points based on timestamps.
The timestamp intervals of the time-series data used for imputation must be strictly equal. In real-world environments, however, this is often not the case. To resolve this issue, perform the following:
Large difference in timestamp intervals: Perform window aggregation and then apply imputation on the aggregated results.
SELECT IMPUTATION(col_val)
FROM (SELECT _wstart, MIN(val) col_val FROM foo INTERVAL(1s));
Small difference in timestamp intervals: Use the TIMETRUNCATE function to truncate timestamps before performing imputation.
SELECT IMPUTATION(col_val)
FROM (SELECT TIMETRUNCATE(val) FROM foo);
IMPUTATION(column_name, option_expr)
option_expr: {
"algo=expr1
[,wncheck=1|0]
[,expr2]"
}
column_name: Input column for time-series imputation. Enter a single numeric column only. String types such as VARCHAR are not supported.options: A string specifying the anomaly detection algorithm and related parameters, formatted as comma-separated key=value pairs. Do not use quotes, escape characters, or non-ASCII characters. For example, algo=moment indicates using the moment time-series model with no additional parameters.SELECT clause behave the same as in other window queries.| Parameter | Description | Default |
|---|---|---|
| algo | Algorithm used for automatic imputation | moment |
| wncheck | Whether to perform white noise detection (0 or 1) | 1 |
| freq | Standard frequency of the input time-series data (interval between records) | seconds (s) |
moment time-series foundation model is supported for automatic imputation.moment model is properly deployed. Refer to the time-series foundation model deployment documentation.freq values: d (days), h (hours), m (minutes), s (seconds), ms (milliseconds), us (microseconds). The default is s (seconds). Setting an incorrect freq may affect time-series integrity detection and produce incorrect results.freq must not exceed the actual interval of the input time-series data; otherwise, incorrect results may occur.The pseudocolumn _improwts displays the corresponding timestamp for each imputed data point.
Use the moment model for automatic imputation on column i32, with frequency in seconds, outputting timestamps and values:
SELECT _improwts, imputation(i32, 'algo=moment, freq=s')
FROM foo;