
Saved_dataset for spark offline store can be accessed only within the scope of the spark session, where it was created. #3644

Closed
nadejdaSuraeva opened this issue Jun 5, 2023 · 0 comments · Fixed by #3645
Labels
kind/feature New feature or request

Comments

@nadejdaSuraeva (Contributor) commented:

Is your feature request related to a problem? Please describe.
I would like to be able to use the data of a registered saved_dataset in a different Spark session. Currently, if I create a new Spark session, only the dataset's name remains in the Feast registry, without the data.

Part of the `persist` function:

```python
"""
Run the retrieval and persist the results in the same offline store used for read.
Please note the persisting is done only within the scope of the spark session.
"""
assert isinstance(storage, SavedDatasetSparkStorage)
table_name = storage.spark_options.table
if not table_name:
    raise ValueError("Cannot persist, table_name is not defined")
self.to_spark_df().createOrReplaceTempView(table_name)
```

Describe the solution you'd like
Add the possibility to save the dataset as a persistent table, for example when the Spark session config includes information about remote storage (Hive metastore, S3 path, etc.).

Describe alternatives you've considered
Add an optional parameter to SparkOptions that allows saving the dataset as a table under any Spark session configuration.

Additional context
