For years, historical feature retrieval in Feast required an entity dataframe: you had to supply the exact entity keys (e.g. driver_id, user_id) and timestamps you wanted to join features for. That works well when you have a fixed set of entities, for example a list of users you want to score or a training set already keyed by IDs. But in many AI and ML projects, you don’t have entity IDs upfront, or the problem doesn’t naturally have entities at all. In those cases, being forced to create and pass an entity dataframe was a real source of friction.
We’re excited to share that Feast now supports entity-less historical feature retrieval based on a datetime range. You can pull all historical feature data for a time window without specifying any entity dataframe—addressing the long-standing GitHub issue #1611 and simplifying training and tuning workflows where entity IDs are optional or irrelevant.
Classic use of a feature store looks like this:
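import pandas as pd
from datetime import datetime
from feast import FeatureStore

# Assumes the current directory is a Feast feature repository.
store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame({
    "driver_id": [1001, 1002, 1003],
    "event_timestamp": [datetime(2025, 1, 1), datetime(2025, 1, 2), datetime(2025, 1, 3)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()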
You already have a set of entities and timestamps; Feast joins features onto them. But in many real-world setups:
In all these cases, passing entity IDs is either not possible, not required, or unnecessarily complex. Making the entity dataframe optional and supporting retrieval by datetime range makes the feature store much easier to use in production and in research.
Entity-less retrieval by datetime range is available for several offline stores. You specify a time window (and optionally rely on TTL for defaults), and the offline store returns all feature data in that range, with no entity dataframe required.
You can omit entity_df and use start_date and/or end_date instead.
If you don’t pass start_date, the range can be derived from the feature view TTL; if you don’t pass end_date, it defaults to “now”.
Entity-less retrieval is supported across multiple offline stores: Postgres (where it was first introduced), Dask, Spark, and Ray. Spark and Ray are especially important for large-scale and distributed training workloads. More offline stores will be supported in the future based on user demand and priority.
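The defaulting rules described above can be sketched in a few lines of plain Python. This is only an illustration, not Feast's implementation: `resolve_range` is a hypothetical helper, and it assumes a single TTL and timezone-naive datetimes like the ones used in this post's examples.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_range(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    ttl: timedelta,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper mirroring the described defaults (not Feast code)."""
    # A missing end_date falls back to the current time (naive, by assumption).
    if end_date is None:
        end_date = now if now is not None else datetime.now()
    # A missing start_date is derived by stepping back one TTL from the end.
    if start_date is None:
        start_date = end_date - ttl
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    return start_date, end_date


# Only an end date: the window starts one TTL earlier.
window = resolve_range(None, datetime(2025, 7, 2, 3, 30, 0), timedelta(days=1))
print(window)
```

Passing `now` explicitly keeps the helper deterministic, which is convenient when testing the "no dates" and "only start date" cases.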
You can use any of these patterns depending on how much you want to specify.
1. Explicit date range (data between start and end):
training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    start_date=datetime(2025, 7, 1, 1, 0, 0),
    end_date=datetime(2025, 7, 2, 3, 30, 0),
).to_df()
2. Only end date (start date is end date minus TTL):
training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    end_date=datetime(2025, 7, 2, 3, 30, 0),
).to_df()
3. Only start date (data from start date to now):
training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    start_date=datetime(2025, 7, 1, 1, 0, 0),
).to_df()
4. No dates (data from now minus TTL up to now):
training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
5. Entity-based retrieval still works (e.g. for on-demand feature views (ODFVs) or when you need data for specific entities):
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1005],
    "event_timestamp": [datetime(2025, 6, 29, 23, 0, 0)],
})
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "transformed_conv_rate:conv_rate_plus_val1",
    ],
).to_df()
Feast does not support mixing entity-based and range-based retrieval in one call; either pass entity_df or pass start_date/end_date, not both.
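If you wrap retrieval in your own training code, a small guard can surface this constraint early. The sketch below is an assumption-laden illustration: `check_retrieval_mode` is a hypothetical helper, not part of Feast, and the error Feast itself raises may differ.

```python
from datetime import datetime
from typing import Any, Optional


def check_retrieval_mode(
    entity_df: Optional[Any] = None,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> str:
    """Hypothetical guard: report which retrieval mode the arguments select."""
    has_range = start_date is not None or end_date is not None
    # Mixing an entity dataframe with a date range is rejected up front.
    if entity_df is not None and has_range:
        raise ValueError("Pass either entity_df or start_date/end_date, not both.")
    # With no entity_df at all, retrieval is range-based (dates may default).
    return "entity" if entity_df is not None else "range"


print(check_retrieval_mode(start_date=datetime(2025, 7, 1, 1, 0, 0)))
```

Comparing against `None` rather than relying on truthiness matters here, because a pandas DataFrame does not have an unambiguous boolean value.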
To experiment with entity-less retrieval:
Call get_historical_features() with only features and, as needed, start_date and end_date (or rely on TTL and the default end time).
We’re excited to see how the community uses entity-less historical retrieval. If you have feedback or want to help bring this to more offline stores, join the discussion on GitHub issue #1611 or in the Feast Slack.