- Contribute change normally through feature branch created from current head of master branch with open PR to origin remote master branch and keep feature branch
- Ensure that similar fix is not already available in newer release of feast. If it is, finish this flow and switch to updating Ki's internal version of feast (potentially recerting fix from step 1 afterwards)
- Decide if given change is specific to Ki's combination of environment and non-standard approach or is it more of universal feast improvement
- Leave ample comments in PR to inform decisions of next person doing feast upgrade from upstream (if this change should be discarded then, is it purely internal one, is it temporary fix etc.)
- If this change is deemed something worth contributing back: rebase feature branch using master branch of original feast repo a.k.a. upstream
git checkout {feature-branch}
git rebase upstream/master
- If upstream remote is not set for this repository on your local machine use:
git remote add upstream https://github.com/feast-dev/feast
- Ensure upstream remote is set up properly
git remote -v
will result in
origin https://github.com/Ki-Insurance/feast.git (fetch)
origin https://github.com/Ki-Insurance/feast.git (push)
upstream https://github.com/feast-dev/feast (fetch)
upstream https://github.com/feast-dev/feast (push)
- After resolving any conflicts in rebase, push your branch to upstream
git push upstream {feature-branch}
- Continue with normal contribution to feast process as described in feast readme, but include link to such PR in closed PR to internal origin remote Ki's master branch from step 1.
- Note version of feast release from last PR rebasing origin master with upstream; it's also available in section below
- If branch with newer release is available in upstream, start update. Currently format of these branches is as follows:
v0.{version}-branch
. Sometimes there is no new branch but just a tag on master branch: that's how 0.39 was released - Create new feature branch from origin master and merge newest upstream release branch or branch local branch created from release tag
git checkout -b {name-of-branch} tags/{release-tag}
. DO NOT REBASE: it heavily obscures history of our changes and makes it harder to properly revert and redo these changes which is likely occurence for bigger feast updates (expect many breaking changes) - Resolve conflicts and run lint from makefile. In most cases resolving these conflicts will require contacting authors of our internal fixes for context, but as general rule of thumb take newest version of feast and reapply Ki changes when possible/relevant. Any requirements in setup.py should default to newer version (most probably from upstream)
- Create PR to origin master with said update branch
- Use commit hash to test potential new version basic functionality in feature-store app/feature-store project. NOTE that to test anything you first need to create new ki-features (same feature-store repo) lib version and merge it so it can be used for local feature store tests. Feature store can be tested locally with
make build-base-local
available command and then adjusting docker files of all containers used in deployment to point to that local image. Local tests do not ensure that such release will work as historically a lot of issues could be seen only in dev (connection bleed, breaking changes with no proper registry migration approach etc.) - Merge to master and include in feature-store for more extensive tests on dev
- Considering small amount of Ki specific changes and potential to introduce hard to track or resolve issues during conflict resolution in merge, there is alternative approach.
- Copy over newest release branch or create local one from release tag. Reapply manually all the Ki specific changes on top of it like for example in this commit: https://github.com/Ki-Insurance/feast/pull/32/commits/d4ab29e4249bfc66119b505bc65a461e20ccee42
- Merge origin master into aforementioned release branch with Ki changes. Ensure that freshly prepared release version will have priority in resolving conflicts
git merge --strategy=ours origin/master
- Advice: Create aforementioned newest version branch and apply Ki changes f.e.
0.39-update
then checkout new branch from that one f.e.0.39-update-merge-test
, then merge master into the merge test branch. If everything resolved properly0.39-update
and0.39-update-merge-test
should have no diff but0.39-update-merge-test
will now have a history allowing it to be merged without conflict intoorigin/master
- Merge such prepared version into master. It can be tested on dev deployment of feature-store before merging in this repo
- Some considerations for testing and what needs to be done:
- feature store deployment to dev won't show any issues with communication with models as they are inherently pointing to UAT deployment so to properly check everything works, deploy feature store with new version of feast to UAT then check ki-automation @algoRelease set of tests
- most probably if there are any breaking or bigger changes, there will be models for which all or some tests fail. General approach is to use new version of ki-features lib in these model deployments as hashes of feast version in feature store deployment and ki feature lib used in models need to be the same. There is overall push for model deployments to use newr version of libraries that no longer require ki-features and in turn are not vulnerable to updates of this feast repo.
Feast version from upstream: 0.39 (created from release tag as there was no branch)
Ki changes applied on top of feast version:
- expiriation for tables (bytewax materialization specific fix - should be kept as long as bytewax materialziation is used)
- added handling for date types
- provide types for on demand features - more in #13 and https://ki-insurance.atlassian.net/browse/DUG-121 - as source code was changed extensively around this logic, it is more of reintroduction of fix in new place
- introduction of mode for on-demand features was assuming seamless update but for our specific case there was a problem: definition in registry is kept in protobuf with non-nullable field for mode so it never falled into defaulting cases; this change allows seamless update in our environments without need to manually interfere in or completely recreate registry; should be discarded on next update
- small change in how async refresh is started; considering how registry refresh is written, it's creating new sql engine (and in turn connection pool) with every refresh; previously it was not a problem because old threads with said engines and connection pools were cleaned right away; with new approach using @asynccontextmanager said previous threads with engines (and connection pools) were not reclaimed automatically leading to connections bleed up to the registry limit
- our own implementation of async feature retrieval used in python sdk form by feature connector service
- comments for reasoning in change
- fix for when ODF input values typing can't be inferred for internal format transformations
Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference.
Feast allows ML platform teams to:
- Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
- Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensure that future feature values do not leak to models during training.
- Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.
Please see our documentation for more information about the project.
The above architecture is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? Click here.
pip install feast
feast init my_feature_repo
cd my_feature_repo/feature_repo
feast apply
feast ui
from feast import FeatureStore
import pandas as pd
from datetime import datetime
entity_df = pd.DataFrame.from_dict({
"driver_id": [1001, 1002, 1003, 1004],
"event_timestamp": [
datetime(2021, 4, 12, 10, 59, 42),
datetime(2021, 4, 12, 8, 12, 10),
datetime(2021, 4, 12, 16, 40, 26),
datetime(2021, 4, 12, 15, 1 , 12)
]
})
store = FeatureStore(repo_path=".")
training_df = store.get_historical_features(
entity_df=entity_df,
features = [
'driver_hourly_stats:conv_rate',
'driver_hourly_stats:acc_rate',
'driver_hourly_stats:avg_daily_trips'
],
).to_df()
print(training_df.head())
# Train model
# model = ml.fit(training_df)
event_timestamp driver_id conv_rate acc_rate avg_daily_trips
0 2021-04-12 08:12:10+00:00 1002 0.713465 0.597095 531
1 2021-04-12 10:59:42+00:00 1001 0.072752 0.044344 11
2 2021-04-12 15:01:12+00:00 1004 0.658182 0.079150 220
3 2021-04-12 16:40:26+00:00 1003 0.162092 0.309035 959
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
from pprint import pprint
from feast import FeatureStore
store = FeatureStore(repo_path=".")
feature_vector = store.get_online_features(
features=[
'driver_hourly_stats:conv_rate',
'driver_hourly_stats:acc_rate',
'driver_hourly_stats:avg_daily_trips'
],
entity_rows=[{"driver_id": 1001}]
).to_dict()
pprint(feature_vector)
# Make prediction
# model.predict(feature_vector)
{
"driver_id": [1001],
"driver_hourly_stats__conv_rate": [0.49274],
"driver_hourly_stats__acc_rate": [0.92743],
"driver_hourly_stats__avg_daily_trips": [72]
}
The list below contains the functionality that contributors are planning to develop for Feast.
-
We welcome contribution to all items in the roadmap!
-
Data Sources
-
Offline Stores
-
Online Stores
-
Feature Engineering
-
Streaming
-
Deployments
-
Feature Serving
-
Data Quality Management (See RFC)
- Data profiling and validation (Great Expectations)
-
Feature Discovery and Governance
- Python SDK for browsing feature registry
- CLI for browsing feature registry
- Model-centric feature tracking (feature services)
- Amundsen integration (see Feast extractor)
- DataHub integration (see DataHub Feast docs)
- Feast Web UI (Beta release. See docs)
Please refer to the official documentation at Documentation
Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:
- Contribution Process for Feast
- Development Guide for Feast
- Development Guide for the Main Feast Repository
Thanks goes to these incredible people: