Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings #4722

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

dandawg
Copy link
Contributor

@dandawg dandawg commented Oct 30, 2024

What this PR does / why we need it:

This PR adds a docstring to the SparkSource class and its __init__ method.

The current behavior is that SparkSource inherits the docstrings from the DataSource class. This is wrong, as the DataSource class has different functionality and parameters than does SparkSource.

When calling help(SparkSource) or inspect.getdoc(SparkSource), the DataSource docstring is (wrongfully) returned. Also, when calling SparkSource.__doc__ or SparkSource.__init__.__doc__ directly, no docstring is returned.

In [1]: from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
In [2]: import inspect
In [3]: print(inspect.getdoc(SparkSource))
DataSource that can be used to source features.

Args:
    name: Name of data source, which should be unique within a project
    timestamp_field (optional): Event timestamp field used for point-in-time joins of
        feature values.
    created_timestamp_column (optional): Timestamp column indicating when the row
        was created, used for deduplicating rows.
    field_mapping (optional): A dictionary mapping of column names in this data
        source to feature names in a feature table or view. Only used for feature
        columns, not entity or timestamp columns.
    description (optional) A human-readable description.
    tags (optional): A dictionary of key-value pairs to store arbitrary metadata.
    owner (optional): The owner of the data source, typically the email of the primary
        maintainer.
    date_partition_column (optional): Timestamp column used for partitioning. Not supported by all offline stores.

^ There are args missing! They differ from the args in the source code and in the reference documentation.

With the update from this PR:

In [1]: from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource
In [2]: import inspect
In [3]: print(inspect.getdoc(SparkSource))
A SparkSource object defines a data source that a Spark offline store can use
In [4]: print(SparkSource.__init__.__doc__)
Creates a SparkSource object.

        Args:
            name: The name of the data source, which should be unique within a project.
            table: The name of a Spark table.
            query: The query to be executed in Spark.
            path: The path to file data.
            file_format: The format of the file data.
            created_timestamp_column: Timestamp column indicating when the row
                was created, used for deduplicating rows.
            field_mapping: A dictionary mapping of column names in this data
                source to feature names in a feature table or view.
            description: A human-readable description.
            tags: A dictionary of key-value pairs to store arbitrary metadata.
            owner: The owner of the DataSource, typically the email of the primary
                maintainer.
            timestamp_field: Event timestamp field used for point-in-time joins of
                feature values.

Which issue(s) this PR fixes:

Misc

@dandawg dandawg requested a review from a team as a code owner October 30, 2024 19:16
@dandawg dandawg changed the title bug: fixed SparkSource docstrings so it wouldn't used inhereted class docstrings bug: Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings Oct 30, 2024
@dandawg dandawg changed the title bug: Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings fix: Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings Oct 30, 2024
@dandawg
Copy link
Contributor Author

dandawg commented Oct 30, 2024

I've found other data sources with similar issues. I figured I would submit this one first to see if there are any patterns I need to follow. Once everything is good, I intend to fix the others.

@dmartinol dmartinol enabled auto-merge (squash) October 31, 2024 16:44
Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

@dmartinol dmartinol merged commit 32e6aa1 into feast-dev:master Nov 5, 2024
25 of 27 checks passed
shuchu pushed a commit to shuchu/feast that referenced this pull request Nov 21, 2024
… docstrings (feast-dev#4722)

Fixed docstrings so it wouldn't used inhereted class docstrings

Signed-off-by: dandawg <[email protected]>
franciscojavierarceo pushed a commit that referenced this pull request Dec 5, 2024
# [0.42.0](v0.41.0...v0.42.0) (2024-12-05)

### Bug Fixes

* Add adapters for sqlite datetime conversion ([#4797](#4797)) ([e198b17](e198b17))
* Added grpcio extras to default feature-server image ([#4737](#4737)) ([e9cd373](e9cd373))
* Changing node version in release ([7089918](7089918))
* Feast create empty online table when FeatureView attribute online=False ([#4666](#4666)) ([237c453](237c453))
* Fix db store types in Operator CRD ([#4798](#4798)) ([f09339e](f09339e))
* Fix the config issue for postgres ([#4776](#4776)) ([a36f7e5](a36f7e5))
* Fixed example materialize-incremental and improved explanation ([#4734](#4734)) ([ca8a7ab](ca8a7ab))
* Fixed SparkSource docstrings so it wouldn't used inhereted class docstrings ([#4722](#4722)) ([32e6aa1](32e6aa1))
* Fixing PGVector integration tests ([#4778](#4778)) ([88a0320](88a0320))
* Incorrect type passed to assert_permissions in materialize endpoints ([#4727](#4727)) ([b72c2da](b72c2da))
* Issue of DataSource subclasses using parent abstract class docstrings ([#4730](#4730)) ([b24acd5](b24acd5))
* Operator envVar positioning & tls.SecretRef.Name ([#4806](#4806)) ([1115d96](1115d96))
* Populates project created_time correctly according to created ti… ([#4686](#4686)) ([a61b93c](a61b93c))
* Reduce feast-server container image size & fix dev image build ([#4781](#4781)) ([ccc9aea](ccc9aea))
* Removed version func from feature_store.py ([#4748](#4748)) ([f902bb9](f902bb9))
* Support registry instantiation for read-only users ([#4719](#4719)) ([ca3d3c8](ca3d3c8))
* Syntax Error in BigQuery While Retrieving Columns that Start wit… ([#4713](#4713)) ([60fbc62](60fbc62))
* Update release version in a pertinent Operator file ([#4708](#4708)) ([764a8a6](764a8a6))

### Features

* Add api contract to fastapi docs ([#4721](#4721)) ([1a165c7](1a165c7))
* Add Couchbase as an online store ([#4637](#4637)) ([824859b](824859b))
* Add Operator support for spec.feastProject & status.applied fields ([#4656](#4656)) ([430ac53](430ac53))
* Add services functionality to Operator ([#4723](#4723)) ([d1d80c0](d1d80c0))
* Add TLS support to the Operator ([#4796](#4796)) ([a617a6c](a617a6c))
* Added feast Go operator db stores support ([#4771](#4771)) ([3302363](3302363))
* Added support for setting env vars in feast services in feast controller  ([#4739](#4739)) ([84b24b5](84b24b5))
* Adding docs outlining native Python transformations on singletons ([#4741](#4741)) ([0150278](0150278))
* Adding first feast operator e2e test. ([#4791](#4791)) ([8339f8d](8339f8d))
* Adding github action to run the operator end-to-end tests. ([#4762](#4762)) ([d8ccb00](d8ccb00))
* Adding ssl support for registry server. ([#4718](#4718)) ([ccf7a55](ccf7a55))
* Adding SSL support for the React UI server and feast UI command. ([#4736](#4736)) ([4a89252](4a89252))
* Adding support for native Python transformations on a single dictionary  ([#4724](#4724)) ([9bbc1c6](9bbc1c6))
* Adding TLS support for offline server. ([#4744](#4744)) ([5d8d03f](5d8d03f))
* Building the feast image ([#4775](#4775)) ([6635dde](6635dde))
* File persistence definition and implementation ([#4742](#4742)) ([3bad4a1](3bad4a1))
* Object store persistence in operator ([#4758](#4758)) ([0ae86da](0ae86da))
* OIDC authorization in Feast Operator ([#4801](#4801)) ([eb111d6](eb111d6))
* Operator will create k8s serviceaccount for each feast service ([#4767](#4767)) ([cde5760](cde5760))
* Printing more verbose logs when we start the offline server  ([#4660](#4660)) ([9d8d3d8](9d8d3d8))
* PVC configuration and impl ([#4750](#4750)) ([785a190](785a190))
* Qdrant vectorstore support ([#4689](#4689)) ([86573d2](86573d2))
* RBAC Authorization in Feast Operator ([#4786](#4786)) ([0ef5acc](0ef5acc))
* Support for nested timestamp fields in Spark Offline store ([#4740](#4740)) ([d4d94f8](d4d94f8))
* Update the go feature server from Expedia code repo. ([#4665](#4665)) ([6406625](6406625))
* Updated feast Go operator db stores ([#4809](#4809)) ([2c5a6b5](2c5a6b5))
* Updated sample secret following review ([#4811](#4811)) ([dc9f825](dc9f825))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants