-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] build cudf jars with NVTX enabled #1721
Comments
In the past I saw what I thought was a very significant performance difference when running some queries. I tried to reproduce this by running TPCH q1 at scale factor 100 against 4 different cudf-0.19 jars each compiled slightly differently. I compiled one like I normally do for development (NVTX enabled and RMM_LOGGING set to the default) I ran the test using the benchmark tool included in the integration tests jar. The only thing that I changed between the runs was the cudf jar.
To me the only difference in the times that I can see appears to be within the noise between runs. I will try with a few more queries to see if I can find a difference at all. @rongou when you first asked to disabled NVTX in CUDF what was the query that you were running that showed a big enough difference that you wanted to disable it? |
I ran all of the TPCH queries with the jar from the nightly build and for my build. All of the hot runs were within what appeared to be noise. The largest difference was for Q21 in which the jar with RMM logging and NVTX enabled happened to be 3.7 seconds (1.7%) faster. My guess is that it was a hiccup. Unless I can get a reproducible use case I am going to assume that NVTX enabled vs disabled has no impact on real world run times. But I will leave this alone for a while to see if someone else can come up with a query where it does have a measurable impact. |
I'm not sure it had a big effect on query performance to begin with, I think I was just trying to eliminate variables that might have affected PTDS. The option cudf had for turning off NVTX didn't work, so my fix was mainly about correctness. I believe the NVTX api itself has also gone through some major changes, so it's quite possible currently there is no perceivable performance difference with it on or off. |
@rongou thanks, That is what we heard from the developer of the API. They expected the overhead to be a function call followed by a single global variable lookup. That is what we are seeing in practice, so to make life simpler for customers I think we want to turn on NVTX by default in our builds. |
+1 on this |
Investigation was done under NVIDIA/spark-rapids#1721 and it showed no significant performance difference with NVTX on. It would make it a lot easier if this was on by default because it allows customers and developers to get trace with the same jar without having to go off and build a new CUDF version. So this PR turns it on by default and adds in reading from environment variable if we need to change in the future from build scripts. Authors: - Thomas Graves (@tgravescs) Approvers: - Jason Lowe (@jlowe) - Robert (Bobby) Evans (@revans2) URL: #7761
@tgravescs can this be closed? |
yes its turned on in rapids build |
Is your feature request related to a problem? Please describe.
Right now if you want to get an nvtx trace you have to go build cudf yourself. This is very time consuming if you are just using one of the base versions. It would be really nice if we can either have another set of jars with it enabled or if we can just enable it in our regular builds.
I know it was shut off because there was performance hit, do we need to revisit that to make sure actually a problem.
The text was updated successfully, but these errors were encountered: