Support aggregate functions in Eval expressions #757

Closed
wants to merge 1 commit into from

Conversation

LantaoJin
Member

Description

Aggregate functions can be used in a select clause without a group by, as long as every item in the select list is an aggregate function.
Here are some examples:

select max(a) from t
select max(a), min(a), count(a) from t

But the following queries should throw exceptions:

select a, max(a) from t
select a, max(a), min(a), count(a) from t
select *, max(a), min(a), count(a) from t

They can work with a group by:

select a, max(a) from t group by b
select a, max(a), min(a), count(a) from t group by b
select *, max(a), min(a), count(a) from t group by b
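The rule for the no-group-by case can be sketched as a small check over the select list. This is only an illustration under stated assumptions: `SelectListCheck`, `SelectItem`, `Column`, `Star`, `Aggregate`, and `validWithoutGroupBy` are hypothetical names and are not part of this PR or of Spark.

```scala
// Hypothetical model of a select list; none of these names come from the PR.
object SelectListCheck {
  sealed trait SelectItem
  final case class Column(name: String) extends SelectItem                // e.g. a
  case object Star extends SelectItem                                     // *
  final case class Aggregate(fn: String, arg: String) extends SelectItem  // e.g. max(a)

  // Without a group by, the select list is accepted only if
  // every item in it is an aggregate function.
  def validWithoutGroupBy(items: Seq[SelectItem]): Boolean =
    items.nonEmpty && items.forall {
      case _: Aggregate => true
      case _            => false
    }
}
```

Under this model, a list such as `max(a), min(a), count(a)` passes the check, while adding `Column("a")` or `Star` to the same list makes it fail, which matches the first two groups of queries above.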

Similarly, aggregate functions in PPL can work in the eval command, because an eval expression is equivalent to adding a projection to the existing project list.
Here are some examples:

source=t | eval m = max(a) | fields m
source=t | eval m = max(a), n = count(a) | fields m, n
source=t | eval m = max(a) | eval n = count(a) | fields m, n

But the following PPL queries should throw exceptions:

source=t | eval m = max(a) -- equivalent to all fields plus m
source=t | eval m = max(a), n = count(a) | fields a, m, n
source=t | eval m = max(a) | eval n = count(a) | fields a, m, n
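To make the "eval adds a projection to the existing project list" reading concrete, here is a minimal sketch that folds a pipeline of eval and fields commands into a final project list and then applies the all-aggregates rule. Everything in it (`EvalProjectionCheck`, `Expr`, `Field`, `Aggregate`, `Eval`, `Fields`, `Projection`, `project`, `valid`) is a hypothetical model for illustration and does not reflect the actual translator code in this PR.

```scala
// Hypothetical model of a PPL pipeline; not the PR's implementation.
object EvalProjectionCheck {
  sealed trait Expr
  final case class Field(name: String) extends Expr
  final case class Aggregate(fn: String, arg: String) extends Expr

  sealed trait Command
  final case class Eval(assignments: Seq[(String, Expr)]) extends Command
  final case class Fields(names: Seq[String]) extends Command

  // allSourceFields = true means the project list still contains every source field.
  final case class Projection(allSourceFields: Boolean, items: Seq[(String, Expr)])

  // eval appends aliases to the current project list; fields narrows it to the named items.
  def project(commands: Seq[Command]): Projection =
    commands.foldLeft(Projection(allSourceFields = true, Nil)) {
      case (p, Eval(assignments)) => p.copy(items = p.items ++ assignments)
      case (p, Fields(names)) =>
        val known = p.items.toMap
        Projection(allSourceFields = false, names.map(n => n -> known.getOrElse(n, Field(n))))
    }

  // If any projected item is an aggregate, all of them must be,
  // and the source fields must have been dropped by a fields command.
  def valid(commands: Seq[Command]): Boolean = {
    val p = project(commands)
    val isAgg: ((String, Expr)) => Boolean = _._2.isInstanceOf[Aggregate]
    !p.items.exists(isAgg) || (!p.allSourceFields && p.items.forall(isAgg))
  }
}
```

In this model, `eval m = max(a) | fields m` is accepted, while `eval m = max(a)` on its own is rejected because the project list still carries all source fields alongside `m`.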

Issues Resolved

#755

Check List

- [ ] Updated documentation (ppl-spark-integration/README.md)

  • Implemented unit tests
  • Implemented tests for combination with other commands
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -177,27 +267,30 @@ class PPLLogicalPlanEvalTranslatorTestSuite
comparePlans(expectedPlan, logPlan, checkAnalysis = false)
}

// Todo fields-excluded command not supported
- ignore("test eval expressions with fields-excluded command") {
+ test("test eval expressions with fields-excluded command") {
Member Author

This change is unrelated to this PR. It just enables the test, since the fields exclude list is now supported.

comparePlans(expectedPlan, logPlan, checkAnalysis = false)
}

// Todo fields-included command not supported
- ignore("test eval expressions with fields-included command") {
+ test("test eval expressions with fields-included command") {
Member Author

ditto

Member

YANG-DB commented Oct 9, 2024

@LantaoJin this is great!
Can you please add the following paragraph to the stats command document?

Description
Aggregate functions can be used in a select clause without a group by, as long as every item in the select list is an aggregate function.
Here are some examples:

select max(a) from t
select max(a), min(a), count(a) from t

But the following queries should throw exceptions:

select a, max(a) from t
select a, max(a), min(a), count(a) from t
select *, max(a), min(a), count(a) from t

They can work with a group by:

select a, max(a) from t group by b
select a, max(a), min(a), count(a) from t group by b
select *, max(a), min(a), count(a) from t group by b

Similarly, aggregate functions in PPL can work in the eval command, because an eval expression is equivalent to adding a projection to the existing project list.
Here are some examples:

source=t | eval m = max(a) | fields m
source=t | eval m = max(a), n = count(a) | fields m, n
source=t | eval m = max(a) | eval n = count(a) | fields m, n

But the following PPL queries should throw exceptions:

source=t | eval m = max(a) -- equivalent to all fields plus m
source=t | eval m = max(a), n = count(a) | fields a, m, n
source=t | eval m = max(a) | eval n = count(a) | fields a, m, n

YANG-DB added the 0.5, Lang:PPL (Pipe Processing Language support), and 0.6 labels and removed the 0.5 label on Oct 9, 2024
@LantaoJin
Member Author

I am going to close this due to #755 (comment)

LantaoJin closed this on Oct 11, 2024
Labels
0.6 Lang:PPL Pipe Processing Language support
2 participants