[SUPPORT] parquet-dereference-pushdown not working on hudi 0.10.1 and presto 0.275 #7976

Open
guanlisheng opened this issue Feb 16, 2023 · 1 comment
Labels
priority:major (degraded perf; unable to move forward; potential bugs), query-engine (trino, presto, athena, impala, etc)

guanlisheng commented Feb 16, 2023

Describe the problem you faced

With the hive.enable-parquet-dereference-pushdown property and the parquet_batch_reader_verification_enabled session property set, a query on a sub-field of a Hudi table does not work and always returns an error.
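
For anyone trying to reproduce this, the setup described above might look roughly like the sketch below. Only the two property names come from this report; the catalog name (`hive`), the catalog file path, and the schema, table, and column names are hypothetical placeholders, and the helper `repro_statements` is made up for illustration.

```python
# Catalog-level setting named in the report. The file location
# (for example etc/catalog/hive.properties) is an assumption.
CATALOG_PROPERTY = "hive.enable-parquet-dereference-pushdown=true"


def repro_statements(schema, table, struct_column, sub_field, catalog="hive"):
    """Build statements one might run in the Presto CLI to exercise the failing path.

    All arguments are placeholders; the report does not give the real
    schema, table, or column names.
    """
    return [
        # Session property named in the report, scoped to the assumed catalog.
        f"SET SESSION {catalog}.parquet_batch_reader_verification_enabled = true",
        # Selecting only a nested field is the case dereference pushdown targets.
        f"SELECT {struct_column}.{sub_field} FROM {catalog}.{schema}.{table} LIMIT 10",
    ]


if __name__ == "__main__":
    print(CATALOG_PROPERTY)
    for stmt in repro_statements("my_schema", "my_hudi_table", "payload", "user_id"):
        print(stmt + ";")
```

Running the printed statements against a Hudi table with a struct column should show whether the error depends on selecting a nested field.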

The CLI console reports:

Query 20230216_023851_00265_zje6y failed: Error opening Hive split s3a://xxx/yyy/2023/02/10/20/3ba89a4b-d5fa-4fca-b5b2-59b4924c34b0-0_1-18551-4732440_20230210200557831.parquet (offset=0, length=202461131): null

More details from the stack trace in the Presto web UI:

	at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:391)
	at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:196)
	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:452)
	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:187)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
	at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:248)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:426)
	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:309)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:730)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:302)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:166)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
	at com.facebook.presto.$gen.Presto_0_275_f3f1035____20230215_041712_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: undefined

Environment Description

  • Hudi version : 0.10.1

  • Spark version : 2.4.8

  • Hive version : 2.3.9

  • Hadoop version : 2.10.1

  • Storage (HDFS/S3/GCS..) : S3

  • Running on Docker? (yes/no) : no

Additional context

  • Presto version : 0.275

yihua commented Feb 16, 2023

Hey @guanlisheng, thanks for adding the details. We need to repro this. @todd5167 @codope, do you have any idea why this can fail?

yihua added the priority:major and query-engine labels on Feb 16, 2023
github-project-automation bot moved this to ⏳ Awaiting Triage in Hudi Issue Support on Feb 16, 2023
codope moved this from ⏳ Awaiting Triage to 🚧 Needs Repro in Hudi Issue Support on May 3, 2023