Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add week_of_year Spark function #5941

Closed
wants to merge 1 commit into from

Conversation

majian4work
Copy link
Contributor

@majian4work majian4work commented Aug 1, 2023

Presto's week_of_year function differs from Spark's in return type: BIGINT vs. INTEGER.

Copy-paste Presto's implementation and change the return type.

Spark's implementation: https://github.com/apache/spark/blob/v3.2.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L868

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 1, 2023
@netlify
Copy link

netlify bot commented Aug 1, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d8318ab
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64d1a8dccb787c00084de4e9

@majian4work
Copy link
Contributor Author

@rui-mo

Copy link
Collaborator

@rui-mo rui-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put Spark's implementation to the PR description?

velox/docs/functions/spark/datetime.rst Outdated Show resolved Hide resolved
velox/functions/sparksql/Register.cpp Outdated Show resolved Hide resolved
velox/functions/sparksql/tests/DateTimeFunctionsTest.cpp Outdated Show resolved Hide resolved
@majian4work
Copy link
Contributor Author

I0807 10:11:07.190543 1754778 ExpressionFuzzer.cpp:1313] ==============================> Started iteration 74530 (seed: 1635755765)
I0807 10:11:07.344303 1754778 ExpressionVerifier.cpp:91] Executing expression 0 : week_of_year("c0")
I0807 10:11:07.344316 1754778 ExpressionVerifier.cpp:31] 1 vectors as input:
I0807 10:11:07.344318 1754778 ExpressionVerifier.cpp:33] [FLAT TIMESTAMP: 100 elements, 12 nulls]
I0807 10:11:07.344692 1754778 ExpressionVerifier.cpp:79] All results match.
I0807 10:11:07.344715 1754778 ExpressionFuzzer.cpp:1358] ==============================> Done with iteration 74530
I0807 10:11:07.344756 1754778 ExpressionFuzzer.cpp:1093] ==============================> Top 1 by number of rows processed
I0807 10:11:07.344761 1754778 ExpressionFuzzer.cpp:1095] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
I0807 10:11:07.344763 1754778 ExpressionFuzzer.cpp:1099] week_of_year 74531 100.00% 1142417
I0807 10:11:07.344866 1754778 ExpressionFuzzer.cpp:1105] ==============================> Bottom 1 by number of rows processed
I0807 10:11:07.344869 1754778 ExpressionFuzzer.cpp:1107] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
I0807 10:11:07.344873 1754778 ExpressionFuzzer.cpp:1112] week_of_year 74531 100.00% 1142417
I0807 10:11:07.344879 1754778 ExpressionFuzzer.cpp:1125] ==============================> All stats sorted by number of times the function was chosen
I0807 10:11:07.344884 1754778 ExpressionFuzzer.cpp:1127] Format: functionName numTimesSelected proportionOfTimesSelected numProcessedRows
I0807 10:11:07.344889 1754778 ExpressionFuzzer.cpp:1131] week_of_year 74531 100.00% 1142417
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[ PASSED ] 0 tests.

@majian4work
Copy link
Contributor Author

@mbasmanova Could you help to review this pr?

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ma-Jian1 Would you review contributing guide and update accordingly. Specifically, make sure the PR title and description are in compliance.

@majian4work majian4work changed the title week_of_year for spark Add week_of_year Spark function Aug 8, 2023
@majian4work
Copy link
Contributor Author

@Ma-Jian1 Would you review contributing guide and update accordingly. Specifically, make sure the PR title and description are in compliance.

Sorry for that.
Update PR title and document according to the guide.

@@ -52,6 +52,13 @@ These functions support TIMESTAMP and DATE input types.
Returns null if ``string`` does not match ``format`` or if ``format``
is invalid.

.. function:: week(x) -> integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title says "week_of_year", but this doc says "week". Which is correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed to week_of_year

.. function:: week(x) -> integer

Returns the `ISO-Week`_ of the year from x. The value ranges from ``1`` to ``53``.
A week is considered to start on a Monday and week 1 is the first week with >3 days.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you double check that this matches Spark's implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some extra ut that copy from Spark UT, and validate the results.
From https://github.com/apache/spark/blob/v3.2.2/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3240-L3241, It also follows ISO 8601

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. There are some tooling issues blocking me from merging this PR. These should be resolved sometime tomorrow.

@majian4work
Copy link
Contributor Author

@mbasmanova Can this PR be merged?

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor

Let me try. Would you rebase?

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 15e1782.

@conbench-facebook
Copy link

Conbench analyzed the 1 benchmark run on commit 15e17823.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

unigof pushed a commit to unigof/velox that referenced this pull request Aug 18, 2023
Summary:
Presto's `week_of_year` function differs from Spark's in return type: BIGINT vs. INTEGER.

Copy-paste Presto's implementation and change the return type.

Spark's implementation: https://github.com/apache/spark/blob/v3.2.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L868

Pull Request resolved: facebookincubator#5941

Reviewed By: amitkdutta

Differential Revision: D48224232

Pulled By: mbasmanova

fbshipit-source-id: 19313b0294a8aa714b017deb4623ba1422084a35
ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Presto's `week_of_year` function differs from Spark's in return type: BIGINT vs. INTEGER.

Copy-paste Presto's implementation and change the return type.

Spark's implementation: https://github.com/apache/spark/blob/v3.2.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L868

Pull Request resolved: facebookincubator#5941

Reviewed By: amitkdutta

Differential Revision: D48224232

Pulled By: mbasmanova

fbshipit-source-id: 19313b0294a8aa714b017deb4623ba1422084a35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants