Improve performance of TPC-H q19 #572

Closed
andygrove opened this issue Jun 14, 2024 · 2 comments
Labels: enhancement, performance

Comments

@andygrove (Member)

What is the problem the feature request solves?

Comet is currently slower than Spark for query 19.

Some initial observations:

  • The sort merge join cannot run natively due to #398 (Support sort merge join with a join condition).
  • The Parquet scan of lineitem seems to take ~10% longer than in Spark, and 60%+ of that time is spent in native decoding, so perhaps we should add criterion benchmarks covering decoding of all the types in lineitem and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.
  • Comet avoids a very expensive C2R (columnar-to-row conversion) on 600 million rows from lineitem because it applies a filter before any C2R, so it is surprising that we are still slower.
  • With Comet, there is a really slow C2R on the part table: it takes 18 seconds for 48k rows. Spark performs the C2R on 20 million rows and then filters down to 48k, and that whole process takes only 3.2 seconds.
  • Spark coalesces down to 9 partitions, and its HashAggregate takes 5.7 seconds and produces 9 rows; we disable coalesce partitions with Comet, so our HashAggregate takes 11.2 seconds and produces 200 rows. Fixing #387 (Enable Comet shuffle with AQE coalesce partitions) would help with this.

Describe the potential solution

No response

Additional context

No response

@parthchandra (Contributor)

  • The Parquet scan of lineitem seems to take ~10% longer than in Spark, and 60%+ of that time is spent in native decoding, so perhaps we should add criterion benchmarks covering decoding of all the types in lineitem and look for optimization opportunities there. I tested both before and after the recent changes to this code and saw no difference.

There is a chance that this is not in native decoding but in CometVector.getDecimal, depending on whether useDecimal128 is enabled or not. This part has a lot of data copying going on:

      byte[] bytes = getBinaryDecimal(i);                           // copies the raw bytes out of the vector
      BigInteger bigInteger = new BigInteger(bytes);                // copies the bytes again into a BigInteger
      BigDecimal javaDecimal = new BigDecimal(bigInteger, scale);   // allocates a BigDecimal wrapper
      try {
        return Decimal.apply(javaDecimal, precision, scale);        // allocates yet another object (Spark Decimal)
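
For the common case where the unscaled value fits in 64 bits, most of that allocation could in principle be skipped. Below is a minimal sketch (not Comet's actual code) of what such a fast path might look like. It assumes getBinaryDecimal returns big-endian two's-complement bytes (the format BigInteger expects in the snippet above); the method name getDecimalFast is made up for illustration, and it would rely on java.math.BigDecimal, java.math.BigInteger, and org.apache.spark.sql.types.Decimal being imported.

    // Hypothetical fast path: build the unscaled long directly from the bytes
    // and skip the BigInteger/BigDecimal allocations entirely.
    Decimal getDecimalFast(int i, int precision, int scale) {
      byte[] bytes = getBinaryDecimal(i);
      if (bytes.length <= 8 && precision <= Decimal.MAX_LONG_DIGITS()) {
        long unscaled = (bytes.length > 0 && bytes[0] < 0) ? -1L : 0L; // sign-extend
        for (byte b : bytes) {
          unscaled = (unscaled << 8) | (b & 0xFFL);
        }
        return Decimal.apply(unscaled, precision, scale);
      }
      // Fall back to the existing BigDecimal path for wider values.
      return Decimal.apply(new BigDecimal(new BigInteger(bytes), scale), precision, scale);
    }

Whether something like this would actually move the needle on q19 would need to be confirmed with a profiler first, since the comment above only suggests getDecimal as a possible culprit.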

@andygrove removed this from the 0.2.0 milestone on Aug 16, 2024
@andygrove (Member, Author)

Comet is now faster than Spark for this query, and there is no longer a C2R in the Comet plan, so I'm closing this.
