docs/reference/codebase-structure.md
Let's examine the Feast codebase. This analysis is accurate as of Feast 0.23.
$ tree -L 1 -d
.
├── docs
├── examples
├── go
├── infra
├── java
├── protos
├── sdk
└── ui
The Python SDK lives in sdk/python/feast.
The majority of Feast logic lives in these Python files:

- The core Feast objects are defined in their respective files, such as entity.py, feature_view.py, and data_source.py.
- The FeatureStore class is defined in feature_store.py, and the associated configuration object (the Python representation of the feature_store.yaml file) is defined in repo_config.py.
- The CLI and other core feature store logic are defined in cli.py and repo_operations.py.
- The type system that converts between Feast types and external type systems is managed in type_map.py.
- The Python feature server (the server started by the feast serve command) is defined in feature_server.py.

There are also several important submodules:

- infra/ contains all the infrastructure components, such as the provider, offline store, online store, batch materialization engine, and registry.
- dqm/ covers data quality monitoring, such as the dataset profiler.
- diff/ covers the logic for determining how to apply infrastructure changes upon feature repo changes (e.g. the output of feast plan and feast apply).
- embedded_go/ covers the Go feature server.
- ui/ contains the embedded Web UI, to be launched on the feast ui command.

Of these submodules, infra/ is the most important.
It contains the interfaces for the provider, offline store, online store, batch materialization engine, and registry, as well as all of their individual implementations.
$ tree --dirsfirst -L 1 infra
infra
├── contrib
├── feature_servers
├── materialization
├── offline_stores
├── online_stores
├── registry
├── transformation_servers
├── utils
├── __init__.py
├── aws.py
├── gcp.py
├── infra_object.py
├── key_encoding_utils.py
├── local.py
├── passthrough_provider.py
└── provider.py
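The split between interfaces and implementations in infra/ can be pictured as the familiar abstract-base-class pattern. The following is a minimal sketch, not Feast's code: ToyOnlineStore and InMemoryOnlineStore are hypothetical names, and the method signatures are simplified assumptions rather than the real OnlineStore interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ToyOnlineStore(ABC):
    """Hypothetical interface; Feast's real OnlineStore methods take more arguments."""

    @abstractmethod
    def online_write_batch(self, table: str, rows: Dict[str, dict]) -> None:
        """Write a batch of rows, keyed by entity key, into a table."""

    @abstractmethod
    def online_read(self, table: str, keys: List[str]) -> List[Optional[dict]]:
        """Read one row per key, returning None for keys that are missing."""


class InMemoryOnlineStore(ToyOnlineStore):
    """Hypothetical implementation backed by a plain dictionary."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, dict]] = {}

    def online_write_batch(self, table: str, rows: Dict[str, dict]) -> None:
        self._tables.setdefault(table, {}).update(rows)

    def online_read(self, table: str, keys: List[str]) -> List[Optional[dict]]:
        stored = self._tables.get(table, {})
        return [stored.get(key) for key in keys]


# Callers depend only on the interface, so implementations can be swapped.
store: ToyOnlineStore = InMemoryOnlineStore()
store.online_write_batch("driver_stats", {"driver_1": {"trips": 12}})
result = store.online_read("driver_stats", ["driver_1", "driver_2"])
```

Because callers hold a reference typed as the interface, a Redis-backed or SQLite-backed class could be substituted without changing the calling code, which is the property the infra/ layout is built around.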
The tests for the Python SDK are contained in sdk/python/tests.
For more details, see this overview of the test suite.
feast apply

Let's walk through how feast apply works by tracking its execution across the codebase.

1. All CLI commands are defined in cli.py. Most of these commands are backed by methods in repo_operations.py. The feast apply command triggers apply_total_command, which then calls apply_total in repo_operations.py.
2. With a FeatureStore object (from feature_store.py) that is initialized based on the feature_store.yaml in the current working directory, apply_total first parses the feature repo with parse_repo and then calls either FeatureStore.apply or FeatureStore._apply_diffs to apply those changes to the feature store.
3. Consider FeatureStore.apply. It splits the objects based on class (e.g. Entity, FeatureView, etc.) and then calls the appropriate registry method to apply or delete the object. For example, it might call self._registry.apply_entity to apply an entity. If the default file-based registry is used, this logic can be found in infra/registry/registry.py.
4. The feature store then updates its infrastructure (e.g. online store tables) to match the new feature repo by calling Provider.update_infra, which can be found in infra/provider.py.
5. Assuming the provider is one of the built-in providers, this call resolves to PassthroughProvider.update_infra in infra/passthrough_provider.py.
6. This delegates to the online store and the batch materialization engine. For example, if the feature store is configured to use the Redis online store, the update method from infra/online_stores/redis.py will be called. And if the local materialization engine is configured, the update method from infra/materialization/local_engine.py will be called.

At this point, the feast apply command is complete.
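The shape of this flow (split objects by class, call the matching registry method, then hand infrastructure updates to the provider) can be sketched with stand-in classes. Everything below is a simplified illustration under assumed names: Entity, FeatureView, ToyRegistry, ToyOnlineStore, and ToyPassthroughProvider are hypothetical stand-ins, not the Feast classes, and the real methods take more arguments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    """Stand-in for a Feast entity (hypothetical, name only)."""
    name: str


@dataclass
class FeatureView:
    """Stand-in for a Feast feature view (hypothetical, name only)."""
    name: str


class ToyRegistry:
    """Records what was applied; a real registry persists serialized objects."""

    def __init__(self) -> None:
        self.applied: List[str] = []

    def apply_entity(self, entity: Entity) -> None:
        self.applied.append(f"entity:{entity.name}")

    def apply_feature_view(self, view: FeatureView) -> None:
        self.applied.append(f"feature_view:{view.name}")


class ToyOnlineStore:
    def __init__(self) -> None:
        self.tables: List[str] = []

    def update(self, views: List[FeatureView]) -> None:
        # Pretend to keep one online table per feature view.
        self.tables = [view.name for view in views]


class ToyPassthroughProvider:
    def __init__(self, online_store: ToyOnlineStore) -> None:
        self.online_store = online_store

    def update_infra(self, views: List[FeatureView]) -> None:
        # The provider does no work itself; it passes the call through.
        self.online_store.update(views)


def apply(objects: list, registry: ToyRegistry, provider: ToyPassthroughProvider) -> None:
    # Split the incoming objects by class.
    by_class: Dict[type, list] = defaultdict(list)
    for obj in objects:
        by_class[type(obj)].append(obj)
    # Call the matching registry method for each class.
    for entity in by_class[Entity]:
        registry.apply_entity(entity)
    for view in by_class[FeatureView]:
        registry.apply_feature_view(view)
    # Let the provider bring infrastructure in line with the repo.
    provider.update_infra(by_class[FeatureView])


registry = ToyRegistry()
online_store = ToyOnlineStore()
apply(
    [FeatureView("driver_stats"), Entity("driver")],
    registry,
    ToyPassthroughProvider(online_store),
)
```

The sketch leaves out deletion, diffing, and the batch materialization engine; it is meant only to show where each responsibility sits.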
feast materialize

Let's walk through how feast materialize works by tracking its execution across the codebase.

1. The feast materialize command triggers materialize_command in cli.py, which then calls FeatureStore.materialize from feature_store.py.
2. This in turn calls Provider.materialize_single_feature_view, which can be found in infra/provider.py.
3. As with feast apply, the provider is most likely backed by the passthrough provider, in which case PassthroughProvider.materialize_single_feature_view will be called.
4. This delegates to the underlying batch materialization engine. Assuming the local engine is configured, LocalMaterializationEngine.materialize from infra/materialization/local_engine.py will be called.
5. Because materialization reads features from the offline store and writes them to the online store, the local engine calls OfflineStore.pull_latest_from_table_or_query and OnlineStore.online_write_batch. These two calls will be routed to the offline store and online store that have been configured.

get_historical_features

Let's walk through how get_historical_features works by tracking its execution across the codebase.
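Both the materialization path traced above and the retrieval path traced next route through the same provider layer, so one toy model can illustrate the two. This is a sketch under stated assumptions, not Feast's implementation: every class here is a hypothetical stand-in, rows are plain dictionaries with made-up key and ts fields, and the lookup in get_historical_features is a deliberately naive version of a point-in-time read.

```python
from typing import Dict, List, Optional


class ToyOfflineStore:
    """Stand-in offline store holding timestamped rows per feature view."""

    def __init__(self, rows: Dict[str, List[dict]]) -> None:
        self._rows = rows

    def pull_latest_from_table_or_query(self, view: str) -> Dict[str, dict]:
        # Later rows overwrite earlier ones, leaving the newest row per key.
        latest: Dict[str, dict] = {}
        for row in sorted(self._rows[view], key=lambda r: r["ts"]):
            latest[row["key"]] = row
        return latest

    def get_historical_features(self, view: str, requests: List[dict]) -> List[Optional[dict]]:
        # For each request, pick the newest row at or before its timestamp.
        results: List[Optional[dict]] = []
        for req in requests:
            candidates = [
                r for r in self._rows[view]
                if r["key"] == req["key"] and r["ts"] <= req["ts"]
            ]
            results.append(max(candidates, key=lambda r: r["ts"]) if candidates else None)
        return results


class ToyOnlineStore:
    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, dict]] = {}

    def online_write_batch(self, view: str, rows: Dict[str, dict]) -> None:
        self.data.setdefault(view, {}).update(rows)


class ToyLocalEngine:
    """Reads from the offline store and writes to the online store."""

    def __init__(self, offline: ToyOfflineStore, online: ToyOnlineStore) -> None:
        self.offline = offline
        self.online = online

    def materialize(self, view: str) -> None:
        rows = self.offline.pull_latest_from_table_or_query(view)
        self.online.online_write_batch(view, rows)


class ToyPassthroughProvider:
    """Forwards each call to the component that does the actual work."""

    def __init__(self, offline: ToyOfflineStore, online: ToyOnlineStore) -> None:
        self.offline = offline
        self.engine = ToyLocalEngine(offline, online)

    def materialize_single_feature_view(self, view: str) -> None:
        self.engine.materialize(view)

    def get_historical_features(self, view: str, requests: List[dict]) -> List[Optional[dict]]:
        return self.offline.get_historical_features(view, requests)


offline = ToyOfflineStore({
    "driver_stats": [
        {"key": "d1", "ts": 1, "trips": 5},
        {"key": "d1", "ts": 3, "trips": 9},
        {"key": "d2", "ts": 2, "trips": 4},
    ]
})
online = ToyOnlineStore()
provider = ToyPassthroughProvider(offline, online)

provider.materialize_single_feature_view("driver_stats")
history = provider.get_historical_features("driver_stats", [{"key": "d1", "ts": 2}])
```

In this model the online store ends up with only the newest row per key, while the historical lookup for d1 at timestamp 2 returns the row written at timestamp 1, which mirrors the difference between serving the latest values and reconstructing past ones.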
1. We start with FeatureStore.get_historical_features in feature_store.py. This method does some internal preparation, and then delegates the actual execution to the underlying provider by calling Provider.get_historical_features, which can be found in infra/provider.py.
2. As with feast apply, the provider is most likely backed by the passthrough provider, in which case PassthroughProvider.get_historical_features will be called.
3. That call delegates to OfflineStore.get_historical_features. So if the feature store is configured to use Snowflake as the offline store, SnowflakeOfflineStore.get_historical_features will be executed.

The java/ directory contains the Java serving component.
See here for more details on how the repo is structured.
The go/ directory contains the Go feature server.
Most of the files here have logic to help with reading features from the online store.
Within go/, the internal/feast/ directory contains most of the core logic:
- onlineserving/ covers the core serving logic.
- model/ contains the implementations of the Feast objects (entity, feature view, etc.).
  - entity.go is the Go equivalent of entity.py. It contains a very simple Go implementation of the entity object.
- registry/ covers the registry.
- onlinestore/ covers the online stores (currently only Redis and SQLite are supported).

Feast uses protobuf to store serialized versions of the core Feast objects.
The protobuf definitions are stored in protos/feast.
The registry consists of the serialized representations of the Feast objects.
Typically, changes made to the Feast objects require changes to their corresponding protobuf representations. The usual best practices for changing protobufs should be followed to ensure backwards and forwards compatibility.
The ui/ directory contains the Web UI.
See here for more details on the structure of the Web UI.