scientific-skills/timesfm-forecasting/references/data_preparation.md
TimesFM accepts a list of 1-D numpy arrays. Each array represents one univariate time series.
inputs = [
np.array([1.0, 2.0, 3.0, 4.0, 5.0]), # Series 1
np.array([10.0, 20.0, 15.0, 25.0]), # Series 2 (different length)
np.array([100.0, 110.0, 105.0, 115.0, 120.0, 130.0]), # Series 3
]
np.float32 or np.float64import pandas as pd
import numpy as np
df = pd.read_csv("data.csv", parse_dates=["date"])
values = df["value"].values.astype(np.float32)
inputs = [values]
df = pd.read_csv("data.csv", parse_dates=["date"], index_col="date")
inputs = [df[col].dropna().values.astype(np.float32) for col in df.columns]
df = pd.read_csv("data.csv", parse_dates=["date"])
inputs = []
for series_id, group in df.groupby("series_id"):
values = group.sort_values("date")["value"].values.astype(np.float32)
inputs.append(values)
# Single column
inputs = [df["temperature"].values.astype(np.float32)]
# Multiple columns
inputs = [df[col].dropna().values.astype(np.float32) for col in numeric_cols]
# 2-D array (rows = series, cols = time steps)
data = np.load("timeseries.npy") # shape (N, T)
inputs = [data[i] for i in range(data.shape[0])]
# Or from 1-D
inputs = [np.sin(np.linspace(0, 10, 200))]
df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
inputs = [df[col].dropna().values.astype(np.float32) for col in df.select_dtypes(include=[np.number]).columns]
df = pd.read_parquet("data.parquet")
inputs = [df[col].dropna().values.astype(np.float32) for col in df.select_dtypes(include=[np.number]).columns]
import json
with open("data.json") as f:
data = json.load(f)
# Assumes {"series_name": [values...], ...}
inputs = [np.array(values, dtype=np.float32) for values in data.values()]
TimesFM handles NaN values automatically:
Stripped before feeding to the model:
# Input: [NaN, NaN, 1.0, 2.0, 3.0]
# Actual: [1.0, 2.0, 3.0]
Linearly interpolated:
# Input: [1.0, NaN, 3.0, NaN, NaN, 6.0]
# Actual: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Not handled — drop them before passing to the model:
values = df["value"].values.astype(np.float32)
# Remove trailing NaNs
while len(values) > 0 and np.isnan(values[-1]):
values = values[:-1]
inputs = [values]
def clean_series(arr: np.ndarray) -> np.ndarray:
"""Clean a time series for TimesFM input."""
arr = np.asarray(arr, dtype=np.float32)
# Remove trailing NaNs
while len(arr) > 0 and np.isnan(arr[-1]):
arr = arr[:-1]
# Replace inf with NaN (will be interpolated)
arr[np.isinf(arr)] = np.nan
return arr
inputs = [clean_series(df[col].values) for col in cols]
| Context Length | Use Case | Notes |
|---|---|---|
| 64–256 | Quick prototyping | Minimal context, fast |
| 256–512 | Daily data, ~1 year | Good balance |
| 512–1024 | Daily data, ~2-3 years | Standard production |
| 1024–4096 | Hourly data, weekly patterns | More context = better |
| 4096–16384 | High-frequency, long patterns | TimesFM 2.5 maximum |
Rule of thumb: Provide at least 3–5 full cycles of the dominant pattern (e.g., for weekly seasonality with daily data, provide at least 21–35 days).
TimesFM 2.5 supports exogenous variables through the forecast_with_covariates() API.
| Type | Description | Example |
|---|---|---|
| Dynamic numerical | Time-varying numeric features | Temperature, price, promotion spend |
| Dynamic categorical | Time-varying categorical features | Day of week, holiday flag |
| Static categorical | Fixed per-series features | Store ID, region, product category |
Each covariate must have length context + horizon for each series:
import numpy as np
context_len = 100 # length of historical data
horizon = 24 # forecast horizon
total_len = context_len + horizon
# Dynamic numerical: temperature forecast for each series
temp = [
np.random.randn(total_len).astype(np.float32), # Series 1
np.random.randn(total_len).astype(np.float32), # Series 2
]
# Dynamic categorical: day of week (0-6) for each series
dow = [
np.tile(np.arange(7), total_len // 7 + 1)[:total_len], # Series 1
np.tile(np.arange(7), total_len // 7 + 1)[:total_len], # Series 2
]
# Static categorical: one label per series
regions = ["east", "west"]
# Forecast with covariates
point, quantiles = model.forecast_with_covariates(
inputs=[values1, values2],
dynamic_numerical_covariates={"temperature": temp},
dynamic_categorical_covariates={"day_of_week": dow},
static_categorical_covariates={"region": regions},
xreg_mode="xreg + timesfm",
)
| Mode | Description |
|---|---|
"xreg + timesfm" | Covariates processed first, then combined with TimesFM forecast |
"timesfm + xreg" | TimesFM forecast first, then adjusted by covariates |
TimesFM needs at least 1 data point, but more context = better forecasts.
MIN_LENGTH = 32 # Practical minimum for meaningful forecasts
inputs = [
arr for arr in raw_inputs
if len(arr[~np.isnan(arr)]) >= MIN_LENGTH
]
Constant series may produce NaN or zero-width prediction intervals:
for i, arr in enumerate(inputs):
if np.std(arr[~np.isnan(arr)]) < 1e-10:
print(f"⚠️ Series {i} is constant — forecast will be flat")
Large outliers can destabilize forecasts even with normalization:
def clip_outliers(arr: np.ndarray, n_sigma: float = 5.0) -> np.ndarray:
"""Clip values beyond n_sigma standard deviations."""
mu = np.nanmean(arr)
sigma = np.nanstd(arr)
if sigma > 0:
arr = np.clip(arr, mu - n_sigma * sigma, mu + n_sigma * sigma)
return arr
TimesFM handles each series independently, so you can mix frequencies:
inputs = [
daily_sales, # 365 points
weekly_revenue, # 52 points
monthly_users, # 24 points
]
# All forecasted in one batch — TimesFM handles different lengths
point, q = model.forecast(horizon=12, inputs=inputs)
However, the horizon is shared. If you need different horizons per series,
forecast in separate calls.