[Core] Qualification Tool for Gluten #7544

Open
srinivasst opened this issue Oct 15, 2024 · 4 comments · May be fixed by #8484
Labels
enhancement New feature or request

Comments

@srinivasst
Contributor

Description

The purpose of this enhancement is to create a qualification tool that analyzes customer event files to determine which workloads are suitable for execution with Gluten. This is crucial when onboarding new customers, because not all workloads benefit from Gluten's native acceleration, particularly workloads that rely on RDD operations, unsupported SQL operators, or UDFs.

Proposed Solution:

Develop a Java program to analyze the event files, given a Hadoop file path as input. The program will generate two reports:

  • Application Report:
    • Percentage of RDD usage
    • Percentage of Unsupported SQL operations
    • Percentage of supported SQL operations
    • Cumulative task time for each application
    • Recommendation to use Gluten Acceleration (Recommended if Percentage of supported SQL operations >= 70%)
  • Unsupported Operator Report:
    • Unsupported SQL operators
    • Impact on cumulative CPU time
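
A minimal sketch of how the two reports could be derived once task time has been attributed to each category. The class and method names are hypothetical, and weighting the percentages by task time is an assumption (the proposal does not say whether the percentages are by operator count or by time). Only the 70% threshold comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not part of Gluten.
final class QualificationSketch {
    // Threshold from the proposal: recommend when supported SQL share >= 70%.
    static final double SUPPORTED_THRESHOLD_PERCENT = 70.0;

    /** Share of part in total as a percentage; 0 when no time was recorded. */
    static double percent(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }

    /** Application report: assumes percentages are weighted by task time (ms). */
    static boolean recommendGluten(long supportedSqlMs, long unsupportedSqlMs, long rddMs) {
        long total = supportedSqlMs + unsupportedSqlMs + rddMs;
        return percent(supportedSqlMs, total) >= SUPPORTED_THRESHOLD_PERCENT;
    }

    /** Unsupported operator report: operators ordered by descending task time. */
    static List<Map.Entry<String, Long>> rankUnsupported(Map<String, Long> taskTimeMsByOperator) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>(taskTimeMsByOperator.entrySet());
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only.
        long supported = 7_500, unsupported = 1_500, rdd = 1_000;
        long total = supported + unsupported + rdd;
        System.out.printf("supported SQL: %.1f%%%n", percent(supported, total));
        System.out.printf("unsupported SQL: %.1f%%%n", percent(unsupported, total));
        System.out.printf("RDD: %.1f%%%n", percent(rdd, total));
        System.out.println("recommend Gluten: " + recommendGluten(supported, unsupported, rdd));
    }
}
```

Keeping the threshold as a named constant makes it easy to tune if 70% turns out to be too strict or too lenient for real workloads.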

Requirements:

  • Compatibility with Hadoop file paths that point to:
    • Single event files
    • Event directories with rolling event files
    • Deeply nested directories containing event files
    • Compressed event files
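
The path requirements above can be sketched as a recursive walk plus a decompressing reader. This is only an illustration under several assumptions: it uses the local filesystem via `java.nio` as a stand-in for a Hadoop filesystem, it handles only gzip (chosen because the JDK supports it out of the box, not because event files necessarily use it), and it assumes that rolling event directories contain `appstatus_` marker files that should be skipped. All class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch: local-filesystem stand-in for a Hadoop path walker.
final class EventFileDiscoverySketch {

    /**
     * Returns every regular file at or below root, sorted by path.
     * Works for a single file, a rolling event directory, or a nested tree,
     * because Files.walk includes the start path itself.
     */
    static List<Path> findEventFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    // Assumption: "appstatus_" files are markers, not event data.
                    .filter(p -> !p.getFileName().toString().startsWith("appstatus_"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Opens a file for line-by-line reading, decompressing ".gz" files. */
    static BufferedReader open(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        if (file.getFileName().toString().endsWith(".gz")) {
            in = new GZIPInputStream(in);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Counts lines across all discovered files, assuming one event per line. */
    static long countEvents(Path root) throws IOException {
        long count = 0;
        for (Path file : findEventFiles(root)) {
            try (BufferedReader reader = open(file)) {
                while (reader.readLine() != null) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

A real implementation would swap the `java.nio` calls for the Hadoop filesystem API and choose the decompressor that matches the codec the event files were written with.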
@srinivasst added the enhancement (New feature or request) label Oct 15, 2024
@srinivasst
Contributor Author

WIP - will upload approach and PR in a few days.

@xumingming
Contributor

Interesting idea, looking forward to it.

@FelixYBW changed the title from "Qualification Tool for Gluten" to "[Core] Qualification Tool for Gluten" Oct 16, 2024
@LoseYSelf
Contributor

spark-rapids-tools, the NVIDIA RAPIDS tool, has similar functions.

@surnaik
Contributor

surnaik commented Nov 19, 2024

@github-actions bot linked a pull request Jan 9, 2025 that will close this issue
4 participants