Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding support for native Python transformations on a single dictionary #4724

Merged
merged 5 commits into from
Nov 6, 2024

Conversation

franciscojavierarceo
Copy link
Member

@franciscojavierarceo franciscojavierarceo commented Oct 31, 2024

What this PR does / why we need it:

This PR adds support for native Python transformations on a dictionary of a singleton (i.e., what most Data Scientists would call a single row).

As an example, Data Scientists can now transform features using an On Demand Transform like so:

@on_demand_feature_view(
    sources=[driver_stats_fv[["conv_rate", "acc_rate"]]],
    schema=[
        Field(name="conv_rate_plus_acc_python_singleton", dtype=Float64)
    ],
    mode="python",
    singleton=True,
)
def python_singleton_view(inputs: dict[str, Any]) -> dict[str, Any]:
    output: dict[str, Any] = dict(conv_rate_plus_acc_python=float("-inf"))
    output["conv_rate_plus_acc_python_singleton"] = (
        inputs["conv_rate"] + inputs["acc_rate"]
    )
    return output

Compare this to the previous logic to do the transformation which looks like:

@on_demand_feature_view(
    sources=[driver_stats_fv[["conv_rate", "acc_rate"]]],
    schema=[
        Field(name="conv_rate_plus_val", dtype=Float64),
    ],
    mode="python",
)
def python_demo_view(inputs: dict[str, Any]) -> dict[str, Any]:
    output: dict[str, Any] = {
        "conv_rate_plus_val": [
            conv_rate + acc_rate
            for conv_rate, acc_rate in zip(
                inputs["conv_rate"], inputs["acc_rate"]
            )
        ],
    }
    return output

Which issue(s) this PR fixes:

#4075

Misc

Added documentation in this PR: #4741

…..not an ideal solution

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Javier Arceo <[email protected]>
@franciscojavierarceo
Copy link
Member Author

FYI @robhowley

@robhowley
Copy link
Contributor

is there no way to do any kind of type inference in the codepath to determine it's a singleton or leave it up to the person implementing the transformation to know what data they're expecting and raise errors accordingly? just kinda lean more into duck typing. strikes me as a little unusual to have to pass a bool in to specify what the type of my data is. im assuming there's context on this im missing about how the subsequent transformations will interact w proto stuff

@franciscojavierarceo
Copy link
Member Author

That's the way I originally started do it, the problem is that requires explicit type annotations in the ODFV definition and I think that's too strict of a requirement.

@franciscojavierarceo
Copy link
Member Author

I definitely agree this Boolean is not the preferred option but it was the what made the most sense to me given that other sorts of type inference start to become brittle once you allow for different input types.

@robhowley
Copy link
Contributor

i was thinking more like relax the type annotation be dict[str, Any] instead of dict[str, List[Any]]. it's the transformation functions problem to handle it.

@franciscojavierarceo
Copy link
Member Author

franciscojavierarceo commented Nov 5, 2024

Yeah that could be done today because type annotations can still be ignored by users but I don't think that's a good enough experience for Data Scientists. I think this makes the transformation really easy and it has rational defaults.

@robhowley
Copy link
Contributor

and defining it as a Union of the two types wasn't sufficient?

output_dict: dict[str, Any] = self.feature_transformation.transform(
feature_dict
)
if self.singleton and self.mode == "python":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robhowley see here

inferred_features = self.feature_transformation.infer_features(
self._construct_random_input()
)
random_input = self._construct_random_input(singleton=self.singleton)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robhowley and here

@@ -663,20 +681,25 @@ def _construct_random_input(self) -> dict[str, list[Any]]:
ValueType.BOOL_LIST: [[True]],
ValueType.UNIX_TIMESTAMP_LIST: [[_utc_now()]],
}
if singleton:
rand_dict_value = {k: rand_dict_value[k][0] for k in rand_dict_value}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robhowley and here

@@ -570,6 +570,8 @@ def _augment_response_with_on_demand_transforms(
proto_values.append(
python_values_to_proto_values(
feature_vector
if isinstance(feature_vector, list)
else [feature_vector]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robhowley additional special logic

@franciscojavierarceo
Copy link
Member Author

@robhowley I tagged you in some places that outline some of why I ended up doing it this way.

@HaoXuAI
Copy link
Collaborator

HaoXuAI commented Nov 6, 2024

I'm also not following why it can't be just a type check. Adding this singleton feels complex the logic.
also since it's dict[str, Any], if you enforce the value to be List, it could be list of list of list..., so you can't get the logic abstracted.

@franciscojavierarceo
Copy link
Member Author

Depends on the type check. You may have input features that are mixed that you want to create a feature out of.

For example, if I did a type check to ensure all inputs were not lists, they will not accommodate the case I just mentioned.

Copy link
Collaborator

@HaoXuAI HaoXuAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't block it to go alive, but I think it would be great if there is some other way for this instead of this "singleton" parameter :)

# This flattens the list of elements to extract the first one
# in the case of a singleton element, it takes the value directly
# in the case of a list of lists, it takes the first list
input_dict = {k: v[0] for k, v in input_dict.items()}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v[0] might cause error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't but I'll add a test for it

@franciscojavierarceo
Copy link
Member Author

Do you have suggestion? I had dawned discussion with @robhowley indicating that this was not my preferred solution but simply the one I ended up having to settle on due to the tradeoffs.

@franciscojavierarceo franciscojavierarceo merged commit 9bbc1c6 into master Nov 6, 2024
31 checks passed
@HaoXuAI
Copy link
Collaborator

HaoXuAI commented Nov 7, 2024

Do you have suggestion? I had dawned discussion with @robhowley indicating that this was not my preferred solution but simply the one I ended up having to settle on due to the tradeoffs.

I don't have any direct suggestion on this yet. But I believe after refactoring rebuild the Transformation entirely it should work in some structured way :)

shuchu pushed a commit to shuchu/feast that referenced this pull request Nov 21, 2024
…ctionary (feast-dev#4724)

* feat: Adding support for native Python transformations on a dictionary

Signed-off-by: Francisco Javier Arceo <[email protected]>

* Updated type checking and added exception handling to try basic dict...not an ideal solution

Signed-off-by: Francisco Javier Arceo <[email protected]>

* updated tests

Signed-off-by: Francisco Javier Arceo <[email protected]>

* adding protos

Signed-off-by: Francisco Javier Arceo <[email protected]>

* fixed unit test

Signed-off-by: Francisco Javier Arceo <[email protected]>

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
franciscojavierarceo pushed a commit that referenced this pull request Dec 5, 2024
# [0.42.0](v0.41.0...v0.42.0) (2024-12-05)

### Bug Fixes

* Add adapters for sqlite datetime conversion ([#4797](#4797)) ([e198b17](e198b17))
* Added grpcio extras to default feature-server image ([#4737](#4737)) ([e9cd373](e9cd373))
* Changing node version in release ([7089918](7089918))
* Feast create empty online table when FeatureView attribute online=False ([#4666](#4666)) ([237c453](237c453))
* Fix db store types in Operator CRD ([#4798](#4798)) ([f09339e](f09339e))
* Fix the config issue for postgres ([#4776](#4776)) ([a36f7e5](a36f7e5))
* Fixed example materialize-incremental and improved explanation ([#4734](#4734)) ([ca8a7ab](ca8a7ab))
* Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings ([#4722](#4722)) ([32e6aa1](32e6aa1))
* Fixing PGVector integration tests ([#4778](#4778)) ([88a0320](88a0320))
* Incorrect type passed to assert_permissions in materialize endpoints ([#4727](#4727)) ([b72c2da](b72c2da))
* Issue of DataSource subclasses using parent abstract class docstrings ([#4730](#4730)) ([b24acd5](b24acd5))
* Operator envVar positioning & tls.SecretRef.Name ([#4806](#4806)) ([1115d96](1115d96))
* Populates project created_time correctly according to created ti… ([#4686](#4686)) ([a61b93c](a61b93c))
* Reduce feast-server container image size & fix dev image build ([#4781](#4781)) ([ccc9aea](ccc9aea))
* Removed version func from feature_store.py ([#4748](#4748)) ([f902bb9](f902bb9))
* Support registry instantiation for read-only users ([#4719](#4719)) ([ca3d3c8](ca3d3c8))
* Syntax Error in BigQuery While Retrieving Columns that Start wit… ([#4713](#4713)) ([60fbc62](60fbc62))
* Update release version in a pertinent Operator file ([#4708](#4708)) ([764a8a6](764a8a6))

### Features

* Add api contract to fastapi docs ([#4721](#4721)) ([1a165c7](1a165c7))
* Add Couchbase as an online store ([#4637](#4637)) ([824859b](824859b))
* Add Operator support for spec.feastProject & status.applied fields ([#4656](#4656)) ([430ac53](430ac53))
* Add services functionality to Operator ([#4723](#4723)) ([d1d80c0](d1d80c0))
* Add TLS support to the Operator ([#4796](#4796)) ([a617a6c](a617a6c))
* Added feast Go operator db stores support ([#4771](#4771)) ([3302363](3302363))
* Added support for setting env vars in feast services in feast controller  ([#4739](#4739)) ([84b24b5](84b24b5))
* Adding docs outlining native Python transformations on singletons ([#4741](#4741)) ([0150278](0150278))
* Adding first feast operator e2e test. ([#4791](#4791)) ([8339f8d](8339f8d))
* Adding github action to run the operator end-to-end tests. ([#4762](#4762)) ([d8ccb00](d8ccb00))
* Adding ssl support for registry server. ([#4718](#4718)) ([ccf7a55](ccf7a55))
* Adding SSL support for the React UI server and feast UI command. ([#4736](#4736)) ([4a89252](4a89252))
* Adding support for native Python transformations on a single dictionary  ([#4724](#4724)) ([9bbc1c6](9bbc1c6))
* Adding TLS support for offline server. ([#4744](#4744)) ([5d8d03f](5d8d03f))
* Building the feast image ([#4775](#4775)) ([6635dde](6635dde))
* File persistence definition and implementation ([#4742](#4742)) ([3bad4a1](3bad4a1))
* Object store persistence in operator ([#4758](#4758)) ([0ae86da](0ae86da))
* OIDC authorization in Feast Operator ([#4801](#4801)) ([eb111d6](eb111d6))
* Operator will create k8s serviceaccount for each feast service ([#4767](#4767)) ([cde5760](cde5760))
* Printing more verbose logs when we start the offline server  ([#4660](#4660)) ([9d8d3d8](9d8d3d8))
* PVC configuration and impl ([#4750](#4750)) ([785a190](785a190))
* Qdrant vectorstore support ([#4689](#4689)) ([86573d2](86573d2))
* RBAC Authorization in Feast Operator ([#4786](#4786)) ([0ef5acc](0ef5acc))
* Support for nested timestamp fields in Spark Offline store ([#4740](#4740)) ([d4d94f8](d4d94f8))
* Update the go feature server from Expedia code repo. ([#4665](#4665)) ([6406625](6406625))
* Updated feast Go operator db stores ([#4809](#4809)) ([2c5a6b5](2c5a6b5))
* Updated sample secret following review ([#4811](#4811)) ([dc9f825](dc9f825))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants