Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1625377 Raise NotImplementedError for timedelta #2102

Merged
merged 2 commits into from
Aug 17, 2024

Conversation

sfc-gh-azhan
Copy link
Collaborator

@sfc-gh-azhan sfc-gh-azhan commented Aug 15, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1625377 Raise NotImplementedError for timedelta. This PR also supported SNOW-1620417 cache_result.

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

    Please write a short description of how your code change solves the related issue.

My Summary

The main goal for this PR is to make sure we raise NotImplementedError from a SnowflakeQueryCompiler's public method when a dataframe or series contain Timedelta columns call it and it hasn't fully supported Timedelta type. Then in the coming weeks, we can remove those errors once we have implemented and tested those methods. The steps I did in this PR are:

  1. Change data_column_types and index_column_types from Optional to required in InternalFrame.create method, so all usage of this method should explicitly specify these two arguments. It will help us to avoid bugs. I updated all InternalFrame.create methods and make sure the arguments are correctly specified. For those I'm not sure or tracked in other jira (e.g., binary ops), I set them to
data_column_types=None,
index_column_types=None,

In the coming weeks, we should replace those to be the right values.

  1. For the above under construction cases, I make sure add _raise_not_implemented_error_for_timedelta into the SnowflakeQueryCompiler's methods that call those cases. In the coming weeks, we can try to remove all _raise_not_implemented_error_for_timedelta by supporting timedelta there with good test coverage.

  2. For the SnowflakeQueryCompiler's methods that won't change pandas types, I make sure the type has been persisted correctly. To achieve that, I add two assertions: 1) I added a check @snowpark_pandas_type_immutable_check as a decorator to them to validate the types are not changed; 2) improved the assertion message in frame.py to make sure we got sufficient error message if data_column_types and index_column_types is not valid.

  3. Lastly, I added some test cases for Timedelta types as examples and show it works on some changes, e.g., copy, cache_result.

Once this PR is in, our goal towards Timedelta GA become quite concrete:

  • we should remove all _raise_not_implemented_error_for_timedelta
  • we should update all:
 data_column_types=None,
 index_column_types=None,

Once those are done, we should be confident say Timedelta is supported in Snowpark pandas.

Copilot Summary

This pull request includes changes to improve type handling for data and index columns across various utility functions and internal frame operations in the snowflake/snowpark/modin/plugin package. The most important changes include adding type information to various internal frame functions and ensuring consistency in type handling.

Enhancements to type handling:

  • Added data_column_types and index_column_types to the return values and parameters of several functions in aggregation_utils.py, apply_utils.py, concat_utils.py, and cumulative_utils.py to ensure type information is preserved. (src/snowflake/snowpark/modin/plugin/_internal/aggregation_utils.py [1] [2]; src/snowflake/snowpark/modin/plugin/_internal/apply_utils.py [3]; src/snowflake/snowpark/modin/plugin/_internal/concat_utils.py [4] [5] [6] [7] [8] [9]; src/snowflake/snowpark/modin/plugin/_internal/cumulative_utils.py [10]

Consistency improvements:

  • Updated frame.py to include type information in various methods, ensuring that type consistency is maintained when manipulating internal frames. This includes the create method, get_snowflake_identifiers_and_pandas_labels_from_levels, and other internal frame methods. (src/snowflake/snowpark/modin/plugin/_internal/frame.py [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14]

Assertion improvements:

Generator utilities update:

@sfc-gh-azhan sfc-gh-azhan added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Aug 15, 2024
@sfc-gh-azhan sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch 6 times, most recently from 82cad35 to 19cb4b8 Compare August 16, 2024 17:05
@sfc-gh-azhan sfc-gh-azhan removed the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Aug 16, 2024
@sfc-gh-azhan sfc-gh-azhan marked this pull request as ready for review August 16, 2024 17:06
@sfc-gh-azhan sfc-gh-azhan requested a review from a team as a code owner August 16, 2024 17:06
@sfc-gh-azhan sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from 19cb4b8 to a3fd533 Compare August 16, 2024 17:56
@sfc-gh-azhan sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from a3fd533 to 24dad97 Compare August 16, 2024 19:33
@sfc-gh-azhan sfc-gh-azhan force-pushed the azhan-type-immutable-check-1625377 branch from 2c2de22 to a6dba0b Compare August 16, 2024 23:15
Copy link
Contributor

@sfc-gh-nkrishna sfc-gh-nkrishna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sfc-gh-azhan sfc-gh-azhan merged commit 8862b80 into main Aug 17, 2024
35 checks passed
@sfc-gh-azhan sfc-gh-azhan deleted the azhan-type-immutable-check-1625377 branch August 17, 2024 00:19
@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants