SNOW-1625377 Raise NotImplementedError for timedelta #2102
Merged
sfc-gh-azhan added the NO-CHANGELOG-UPDATES (This pull request does not need to update CHANGELOG.md) and NO-PANDAS-CHANGEDOC-UPDATES (This PR does not update Snowpark pandas docs) labels on Aug 15, 2024.
sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch 6 times, most recently from 82cad35 to 19cb4b8 on August 16, 2024 17:05.
sfc-gh-azhan removed the NO-CHANGELOG-UPDATES (This pull request does not need to update CHANGELOG.md) label on Aug 16, 2024.
sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from 19cb4b8 to a3fd533 on August 16, 2024 17:56.
sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from a3fd533 to 24dad97 on August 16, 2024 19:33.
Resolved (outdated) review threads: three on src/snowflake/snowpark/modin/plugin/compiler/snowflake_query_compiler.py and one on src/snowflake/snowpark/modin/plugin/_internal/indexing_utils.py.
sfc-gh-mvashishtha approved these changes on Aug 16, 2024.
sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from 2c2de22 to a6dba0b on August 16, 2024 23:15.
sfc-gh-nkrishna approved these changes on Aug 16, 2024.
LGTM!
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1625377: Raise NotImplementedError for timedelta. This PR also adds support for SNOW-1620417 (`cache_result`).
Please describe how your code solves the related issue.
My Summary
The main goal of this PR is to make sure we raise `NotImplementedError` from a SnowflakeQueryCompiler public method when it is called on a DataFrame or Series that contains Timedelta columns and the method does not fully support the Timedelta type yet. In the coming weeks, we can remove those errors once we have implemented and tested those methods. The steps I took in this PR are:

1. Made `data_column_types` and `index_column_types` required instead of `Optional` in the `InternalFrame.create` method, so every caller must specify these two arguments explicitly. This will help us avoid bugs. I updated all `InternalFrame.create` calls and made sure the arguments are correctly specified. For the cases I'm not sure about, or that are tracked in other Jira issues (e.g., binary ops), I set them to placeholder values; in the coming weeks, we should replace those with the right values.
2. For the under-construction cases above, I added `_raise_not_implemented_error_for_timedelta` to the SnowflakeQueryCompiler methods that reach them. In the coming weeks, we can try to remove every `_raise_not_implemented_error_for_timedelta` call by supporting Timedelta there with good test coverage.
3. For the SnowflakeQueryCompiler methods that don't change pandas types, I made sure the types are persisted correctly. To achieve that, I added two assertions: 1) the `@snowpark_pandas_type_immutable_check` decorator on those methods, which validates that the types are not changed; 2) an improved assertion message in frame.py, so that we get a sufficiently descriptive error if `data_column_types` and `index_column_types` are not valid.
4. Lastly, I added some test cases for Timedelta types as examples, showing that some of the changed methods work with them, e.g., `copy` and `cache_result`.

Once this PR is in, our path toward Timedelta GA becomes quite concrete:
- Remove all `_raise_not_implemented_error_for_timedelta` calls.

Once those are done, we should be confident saying that Timedelta is supported in Snowpark pandas.
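The two query-compiler-level safeguards described above, the `_raise_not_implemented_error_for_timedelta` guard and the `@snowpark_pandas_type_immutable_check` decorator, can be sketched in plain Python. This is a minimal, self-contained illustration under stated assumptions, not the Snowpark pandas implementation: `TimedeltaType`, `ToyFrame`, `ToyQueryCompiler`, `under_construction_op`, and `buggy_copy` are hypothetical stand-ins, the guard's `method_name` parameter is invented for the sketch, and a column type of `None` is assumed to mean "no Snowpark pandas extension type".

```python
from __future__ import annotations

import functools
from copy import deepcopy
from dataclasses import dataclass
from typing import Any, Callable, Optional


class TimedeltaType:
    """Hypothetical stand-in for the Snowpark pandas Timedelta column type."""

    def __eq__(self, other: object) -> bool:
        return isinstance(other, TimedeltaType)

    def __hash__(self) -> int:
        return hash("TimedeltaType")

    def __repr__(self) -> str:
        return "TimedeltaType()"


@dataclass
class ToyFrame:
    """Hypothetical internal frame: one type per column, None = no extension type."""

    data_column_types: list[Optional[Any]]
    index_column_types: list[Optional[Any]]


def snowpark_pandas_type_immutable_check(method: Callable) -> Callable:
    """Fail loudly if the query compiler returned by `method` changed any column type."""

    @functools.wraps(method)
    def wrapper(self: ToyQueryCompiler, *args: Any, **kwargs: Any) -> ToyQueryCompiler:
        result = method(self, *args, **kwargs)
        before, after = self._modin_frame, result._modin_frame
        assert before.data_column_types == after.data_column_types, (
            f"{method.__name__} changed data_column_types from "
            f"{before.data_column_types} to {after.data_column_types}"
        )
        assert before.index_column_types == after.index_column_types, (
            f"{method.__name__} changed index_column_types from "
            f"{before.index_column_types} to {after.index_column_types}"
        )
        return result

    return wrapper


class ToyQueryCompiler:
    """Hypothetical, heavily simplified stand-in for SnowflakeQueryCompiler."""

    def __init__(self, frame: ToyFrame) -> None:
        self._modin_frame = frame

    def _raise_not_implemented_error_for_timedelta(self, method_name: str) -> None:
        # Guard for methods whose Timedelta support is still under construction.
        frame = self._modin_frame
        if any(
            isinstance(t, TimedeltaType)
            for t in frame.data_column_types + frame.index_column_types
        ):
            raise NotImplementedError(f"{method_name} does not support Timedelta columns yet")

    @snowpark_pandas_type_immutable_check
    def copy(self) -> ToyQueryCompiler:
        # Type-preserving operation: the decorator re-checks every column type.
        return ToyQueryCompiler(deepcopy(self._modin_frame))

    def under_construction_op(self) -> ToyQueryCompiler:
        # Hypothetical method that has not been made Timedelta-aware yet.
        self._raise_not_implemented_error_for_timedelta("under_construction_op")
        return ToyQueryCompiler(deepcopy(self._modin_frame))

    @snowpark_pandas_type_immutable_check
    def buggy_copy(self) -> ToyQueryCompiler:
        # Simulated bug: the new frame silently drops the Timedelta type.
        frame = self._modin_frame
        return ToyQueryCompiler(
            ToyFrame([None] * len(frame.data_column_types), list(frame.index_column_types))
        )


qc = ToyQueryCompiler(ToyFrame([TimedeltaType(), None], [None]))
assert qc.copy()._modin_frame == qc._modin_frame  # types survive copy()
```

In this sketch, `qc.under_construction_op()` raises `NotImplementedError` because the first data column is Timedelta-typed, while `qc.buggy_copy()` trips the decorator's assertion with a message that names the method and both type lists. Keeping the check in a decorator means type-preserving methods get it with a one-line annotation instead of repeated inline assertions.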
Copilot Summary
This pull request includes changes to improve type handling for data and index columns across various utility functions and internal frame operations in the `snowflake/snowpark/modin/plugin` package. The most important changes add type information to various internal frame functions and ensure consistent type handling.

Enhancements to type handling:
- Added `data_column_types` and `index_column_types` to the return values and parameters of several functions in `aggregation_utils.py`, `apply_utils.py`, `concat_utils.py`, and `cumulative_utils.py` so that type information is preserved. (src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py [1] [2]; src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py [3]; src/snowflake/snowpark/modin/plugin/_internal/concat_utils.py [4] [5] [6] [7] [8] [9]; src/snowflake/snowpark/modin/plugin/_internal/cumulative_utils.py [10])

Consistency improvements:
- Updated `frame.py` to include type information in various methods, so that type consistency is maintained when manipulating internal frames. This includes the `create` method, `get_snowflake_identifiers_and_pandas_labels_from_levels`, and other internal frame methods. (src/snowflake/snowpark/modin/plugin/_internal/frame.py [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14])

Assertion improvements:
- Updated `frame.py` to provide clearer error messages when the lengths of type lists do not match the lengths of the corresponding identifier lists. (src/snowflake/snowpark/modin/plugin/_internal/frame.py, L86-R98)

Generator utilities update:
- Updated `generator_utils.py` to include type information when creating query compilers from Snowpark dataframes. (src/snowflake/snowpark/modin/plugin/_internal/generator_utils.py, R104-R105)
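The `InternalFrame.create` change and the clearer length-mismatch messages can be illustrated with a reduced sketch. It assumes a hypothetical `ToyInternalFrame` that holds only quoted identifiers and per-column types (the real `InternalFrame.create` takes many more arguments), and it assumes, for illustration only, that passing `None` for a type list means "not determined yet" and skips the length check. The point shown is that the two type arguments are keyword-only with no default, so a caller cannot silently omit them.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyInternalFrame:
    """Hypothetical, reduced InternalFrame: quoted identifiers plus per-column types."""

    data_column_snowflake_quoted_identifiers: list[str]
    index_column_snowflake_quoted_identifiers: list[str]
    data_column_types: Optional[list[Optional[Any]]]
    index_column_types: Optional[list[Optional[Any]]]

    @classmethod
    def create(
        cls,
        *,
        data_column_snowflake_quoted_identifiers: list[str],
        index_column_snowflake_quoted_identifiers: list[str],
        # No default values: every caller must pass both type arguments explicitly.
        data_column_types: Optional[list[Optional[Any]]],
        index_column_types: Optional[list[Optional[Any]]],
    ) -> ToyInternalFrame:
        # Length checks whose messages show both lists, so a mismatch is easy to trace.
        if data_column_types is not None:
            assert len(data_column_types) == len(data_column_snowflake_quoted_identifiers), (
                f"The length of data_column_types {data_column_types} differs from the length "
                f"of data_column_snowflake_quoted_identifiers "
                f"{data_column_snowflake_quoted_identifiers}"
            )
        if index_column_types is not None:
            assert len(index_column_types) == len(index_column_snowflake_quoted_identifiers), (
                f"The length of index_column_types {index_column_types} differs from the length "
                f"of index_column_snowflake_quoted_identifiers "
                f"{index_column_snowflake_quoted_identifiers}"
            )
        return cls(
            data_column_snowflake_quoted_identifiers=data_column_snowflake_quoted_identifiers,
            index_column_snowflake_quoted_identifiers=index_column_snowflake_quoted_identifiers,
            data_column_types=data_column_types,
            index_column_types=index_column_types,
        )


frame = ToyInternalFrame.create(
    data_column_snowflake_quoted_identifiers=['"A"', '"B"'],
    index_column_snowflake_quoted_identifiers=['"__index__"'],
    data_column_types=[None, None],
    index_column_types=None,
)
```

With this signature, calling `ToyInternalFrame.create` without `data_column_types` or `index_column_types` fails with a `TypeError` at the call site, and passing a type list of the wrong length fails with an assertion message that prints both the types and the identifiers.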