
[SUPPORT] DISTRIBUTE BY is not supported(line 59:undefined, pos 0) when using hudi-0.11.1 & spark-3.2.1 #6156

Closed
jiezi2026 opened this issue Jul 21, 2022 · 2 comments
Labels: feature-enquiry (feature enquiries/requests or great improvement ideas), priority:minor (everything else; usability gaps; questions; feature requests), spark-sql

Comments

@jiezi2026

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced


To Reproduce

Steps to reproduce the behavior:

Without the conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension', start a spark-sql session with "/opt/apache/SPARK/SPARK-CURRENT/bin/spark-sql --num-executors 5 --queue=root.bi --conf spark.executor.cores=3 --conf spark.driver.memory=2G --conf spark.executor.memory=5G --conf spark.executor.memoryOverhead=2G"
-------------------[sparksql]---------------------------
select 1 distribute by rand()
-------------------[sparksql]---------------------------
The SQL executes successfully:
[screenshot omitted: query result]

But when the conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' is added and another session is started with "/opt/apache/SPARK/SPARK-CURRENT/bin/spark-sql --num-executors 5 --queue=root.bi --conf spark.executor.cores=3 --conf spark.driver.memory=2G --conf spark.executor.memory=5G --conf spark.executor.memoryOverhead=2G --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension", the same query fails:
-------------------[sparksql]---------------------------
select 1 distribute by rand()
-------------------[sparksql]---------------------------
Error operating EXECUTE_STATEMENT: org.apache.spark.sql.catalyst.parser.ParseException: DISTRIBUTE BY is not supported(line 1:undefined, pos 9)

This makes it impossible for me to use DISTRIBUTE BY even on non-Hudi tables.
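The failure mode above is what you get when a session-extension parser rejects every statement outside its own grammar instead of delegating to Spark's built-in parser. As a language-agnostic illustration (this is not Hudi's or Spark's actual code; all class and method names here are made up), the fallback-parser pattern the fix needs looks roughly like:

```python
# Hypothetical sketch of a delegating parser: the extension tries its own
# narrow dialect first and, on failure, falls back to the host parser so
# that plain Spark SQL such as DISTRIBUTE BY keeps working.

class ParseError(Exception):
    pass

class HostParser:
    """Stands in for Spark's built-in parser, which accepts DISTRIBUTE BY."""
    def parse(self, sql):
        return ("host-plan", sql)

class ExtensionParser:
    """Stands in for an extension parser with a narrower grammar."""
    def __init__(self, delegate):
        self.delegate = delegate

    def _parse_extension_dialect(self, sql):
        # Toy grammar: only understands a made-up MERGE-like statement.
        if sql.strip().upper().startswith("MERGE"):
            return ("extension-plan", sql)
        raise ParseError(f"not supported: {sql!r}")

    def parse(self, sql):
        try:
            return self._parse_extension_dialect(sql)
        except ParseError:
            # Delegate instead of failing, so the host dialect still works.
            return self.delegate.parse(sql)

parser = ExtensionParser(HostParser())
print(parser.parse("select 1 distribute by rand()"))  # falls back to the host parser
print(parser.parse("MERGE INTO t USING s ON ..."))    # handled by the extension
```

A parser that raises instead of delegating in the `except` branch reproduces exactly the "DISTRIBUTE BY is not supported" behavior reported here.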

Expected behavior

DISTRIBUTE BY should keep working on non-Hudi queries even when the Hudi session extension is enabled.

Environment Description

  • Hudi version: 0.11.1

  • Spark version: 3.2.1

  • Hive version: 2.1.1-cdh6.3.2

  • Hadoop version: 3.0.0-cdh6.3.2

  • Storage (HDFS/S3/GCS..): HDFS

  • Running on Docker? (yes/no): no

Additional context


Stacktrace


@xushiyan added labels priority:minor, feature-enquiry, spark-sql on Jul 21, 2022
@KnightChess (Contributor)
#6033 will fix it

@nsivabalan (Contributor)
Closing this out since the PR has landed. Thanks @KnightChess
