Skip to content

Ki-Insurance/feast

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Internal Ki guidelines

Contributing flow

  1. Contribute change normally through feature branch created from current head of master branch with open PR to origin remote master branch and keep feature branch
  2. Ensure that similar fix is not already available in newer release of feast. If it is, finish this flow and switch to updating Ki's internal version of feast (potentially recerting fix from step 1 afterwards)
  3. Decide if given change is specific to Ki's combination of environment and non-standard approach or is it more of universal feast improvement
  4. Leave ample comments in PR to inform decisions of next person doing feast upgrade from upstream (if this change should be discarded then, is it purely internal one, is it temporary fix etc.)
  5. If this change is deemed something worth contributing back: rebase feature branch using master branch of original feast repo a.k.a. upstream
git checkout {feature-branch}
git rebase upstream/master
  1. If upstream remote is not set for this repository on your local machine use:
git remote add upstream https://github.com/feast-dev/feast
  1. Ensure upstream remote is set up properly git remote -v will result in
origin	https://github.com/Ki-Insurance/feast.git (fetch)
origin	https://github.com/Ki-Insurance/feast.git (push)
upstream	https://github.com/feast-dev/feast (fetch)
upstream	https://github.com/feast-dev/feast (push)
  1. After resolving any conflicts in rebase, push your branch to upstream
git push upstream {feature-branch}
  1. Continue with normal contribution to feast process as described in feast readme, but include link to such PR in closed PR to internal origin remote Ki's master branch from step 1.

Updating to newer version

  1. Note version of feast release from last PR rebasing origin master with upstream; it's also available in section below
  2. If branch with newer release is available in upstream, start update. Currently format of these branches is as follows: v0.{version}-branch. Sometimes there is no new branch but just a tag on master branch: that's how 0.39 was released
  3. Create new feature branch from origin master and merge newest upstream release branch or branch local branch created from release tag git checkout -b {name-of-branch} tags/{release-tag}. DO NOT REBASE: it heavily obscures history of our changes and makes it harder to properly revert and redo these changes which is likely occurence for bigger feast updates (expect many breaking changes)
  4. Resolve conflicts and run lint from makefile. In most cases resolving these conflicts will require contacting authors of our internal fixes for context, but as general rule of thumb take newest version of feast and reapply Ki changes when possible/relevant. Any requirements in setup.py should default to newer version (most probably from upstream)
  5. Create PR to origin master with said update branch
  6. Use commit hash to test potential new version basic functionality in feature-store app/feature-store project. NOTE that to test anything you first need to create new ki-features (same feature-store repo) lib version and merge it so it can be used for local feature store tests. Feature store can be tested locally with make build-base-local available command and then adjusting docker files of all containers used in deployment to point to that local image. Local tests do not ensure that such release will work as historically a lot of issues could be seen only in dev (connection bleed, breaking changes with no proper registry migration approach etc.)
  7. Merge to master and include in feature-store for more extensive tests on dev

Alternative approach to updating

  1. Considering small amount of Ki specific changes and potential to introduce hard to track or resolve issues during conflict resolution in merge, there is alternative approach.
  2. Copy over newest release branch or create local one from release tag. Reapply manually all the Ki specific changes on top of it like for example in this commit: https://github.com/Ki-Insurance/feast/pull/32/commits/d4ab29e4249bfc66119b505bc65a461e20ccee42
  3. Merge origin master into aforementioned release branch with Ki changes. Ensure that freshly prepared release version will have priority in resolving conflicts
git merge --strategy=ours origin/master
  1. Advice: Create aforementioned newest version branch and apply Ki changes f.e. 0.39-update then checkout new branch from that one f.e. 0.39-update-merge-test, then merge master into the merge test branch. If everything resolved properly 0.39-update and 0.39-update-merge-test should have no diff but 0.39-update-merge-test will now have a history allowing it to be merged without conflict into origin/master
  2. Merge such prepared version into master. It can be tested on dev deployment of feature-store before merging in this repo
  3. Some considerations for testing and what needs to be done:
  • feature store deployment to dev won't show any issues with communication with models as they are inherently pointing to UAT deployment so to properly check everything works, deploy feature store with new version of feast to UAT then check ki-automation @algoRelease set of tests
  • most probably if there are any breaking or bigger changes, there will be models for which all or some tests fail. General approach is to use new version of ki-features lib in these model deployments as hashes of feast version in feature store deployment and ki feature lib used in models need to be the same. There is overall push for model deployments to use newr version of libraries that no longer require ki-features and in turn are not vulnerable to updates of this feast repo.

Current state

Feast version from upstream: 0.39 (created from release tag as there was no branch)

Ki changes applied on top of feast version:

  1. https://github.com/Ki-Insurance/feast/pull/32/commits/d4ab29e4249bfc66119b505bc65a461e20ccee42
  • expiriation for tables (bytewax materialization specific fix - should be kept as long as bytewax materialziation is used)
  • added handling for date types
  • provide types for on demand features - more in #13 and https://ki-insurance.atlassian.net/browse/DUG-121 - as source code was changed extensively around this logic, it is more of reintroduction of fix in new place
  1. https://github.com/Ki-Insurance/feast/pull/32/commits/a4a90164f3b2c5a660f1e4d022286d507410029c
  • introduction of mode for on-demand features was assuming seamless update but for our specific case there was a problem: definition in registry is kept in protobuf with non-nullable field for mode so it never falled into defaulting cases; this change allows seamless update in our environments without need to manually interfere in or completely recreate registry; should be discarded on next update
  • small change in how async refresh is started; considering how registry refresh is written, it's creating new sql engine (and in turn connection pool) with every refresh; previously it was not a problem because old threads with said engines and connection pools were cleaned right away; with new approach using @asynccontextmanager said previous threads with engines (and connection pools) were not reclaimed automatically leading to connections bleed up to the registry limit
  1. #20
  • our own implementation of async feature retrieval used in python sdk form by feature connector service
  1. #34
  • comments for reasoning in change
  1. https://github.com/Ki-Insurance/feast/pull/36/files
  • fix for when ODF input values typing can't be inferred for internal format transformations





unit-tests integration-tests-and-build java-integration-tests linter Docs Latest Python API License GitHub Release

Overview

Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference.

Feast allows ML platform teams to:

  • Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
  • Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensure that future feature values do not leak to models during training.
  • Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.

Please see our documentation for more information about the project.

πŸ“ Architecture

The above architecture is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? Click here.

🐣 Getting Started

1. Install Feast

pip install feast

2. Create a feature repository

feast init my_feature_repo
cd my_feature_repo/feature_repo

3. Register your feature definitions and set up your feature store

feast apply

4. Explore your data in the web UI (experimental)

Web UI

feast ui

5. Build a training dataset

from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8,  12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1 , 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features = [
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959

6. Load feature values into your online store

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!

7. Read online features at low latency

from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}

πŸ“¦ Functionality and Roadmap

The list below contains the functionality that contributors are planning to develop for Feast.

πŸŽ“ Important Resources

Please refer to the official documentation at Documentation

πŸ‘‹ Contributing

Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:

✨ Contributors

Thanks goes to these incredible people:

About

Feature Store for Machine Learning

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 75.0%
  • TypeScript 8.1%
  • Java 7.3%
  • Go 6.3%
  • HCL 1.0%
  • Makefile 0.9%
  • Other 1.4%