Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] build cudf jars with NVTX enabled #1721

Closed
tgravescs opened this issue Feb 12, 2021 · 7 comments
Closed

[FEA] build cudf jars with NVTX enabled #1721

tgravescs opened this issue Feb 12, 2021 · 7 comments
Assignees
Labels
feature request New feature or request

Comments

@tgravescs
Copy link
Collaborator

Is your feature request related to a problem? Please describe.
Right now if you want to get an nvtx trace you have to go build cudf yourself. This is very time consuming if you are just using one of the base versions. It would be really nice if we can either have another set of jars with it enabled or if we can just enable it in our regular builds.

I know it was shut off because there was performance hit, do we need to revisit that to make sure actually a problem.

@tgravescs tgravescs added feature request New feature or request ? - Needs Triage Need team to review and classify labels Feb 12, 2021
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Feb 16, 2021
@revans2
Copy link
Collaborator

revans2 commented Feb 18, 2021

In the past I saw what I thought was a very significant performance difference when running some queries. I tried to reproduce this by running TPCH q1 at scale factor 100 against 4 different cudf-0.19 jars each compiled slightly differently.

I compiled one like I normally do for development (NVTX enabled and RMM_LOGGING set to the default)
I complied a separate one with NVTX disabled
Another with NVTX enabled and RMM_LOGGING turned off
and finally the last one I downloaded from the nightly build repo that has both RMM logging disabled and NVTX disabled.

I ran the test using the benchmark tool included in the integration tests jar. The only thing that I changed between the runs was the cudf jar.

$ egrep 'queryTimes|class' *.json
tpch-q1-parquet-1613596781989.json:  "queryTimes" : [ 146709, 144134 ],
tpch-q1-parquet-1613597155688.json:  "queryTimes" : [ 145989, 144112 ],
tpch-q1-parquet-1613602719033.json:  "queryTimes" : [ 146123, 143993 ],
tpch-q1-parquet-1613603478326.json:  "queryTimes" : [ 146179, 144077 ],

To me the only difference in the times that I can see appears to be within the noise between runs. I will try with a few more queries to see if I can find a difference at all. @rongou when you first asked to disabled NVTX in CUDF what was the query that you were running that showed a big enough difference that you wanted to disable it?

@revans2
Copy link
Collaborator

revans2 commented Feb 18, 2021

I ran all of the TPCH queries with the jar from the nightly build and for my build. All of the hot runs were within what appeared to be noise. The largest difference was for Q21 in which the jar with RMM logging and NVTX enabled happened to be 3.7 seconds (1.7%) faster. My guess is that it was a hiccup.

Unless I can get a reproducible use case I am going to assume that NVTX enabled vs disabled has no impact on real world run times. But I will leave this alone for a while to see if someone else can come up with a query where it does have a measurable impact.

@rongou
Copy link
Collaborator

rongou commented Feb 18, 2021

I'm not sure it had a big effect on query performance to begin with, I think I was just trying to eliminate variables that might have affected PTDS. The option cudf had for turning off NVTX didn't work, so my fix was mainly about correctness. I believe the NVTX api itself has also gone through some major changes, so it's quite possible currently there is no perceivable performance difference with it on or off.

@revans2
Copy link
Collaborator

revans2 commented Feb 18, 2021

@rongou thanks, That is what we heard from the developer of the API. They expected the overhead to be a function call followed by a single global variable lookup. That is what we are seeing in practice, so to make life simpler for customers I think we want to turn on NVTX by default in our builds.

@abellina
Copy link
Collaborator

+1 on this

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Mar 31, 2021
Investigation was done under NVIDIA/spark-rapids#1721 and it showed no significant performance difference with NVTX on. It would make it a lot easier if this was on by default because it allows customers and developers to get trace with the same jar without having to go off and build a new CUDF version.

So this PR turns it on by default and adds in reading from environment variable if we need to change in the future from build scripts.

Authors:
  - Thomas Graves (@tgravescs)

Approvers:
  - Jason Lowe (@jlowe)
  - Robert (Bobby) Evans (@revans2)

URL: #7761
@sameerz
Copy link
Collaborator

sameerz commented Apr 10, 2021

@tgravescs can this be closed?

@tgravescs
Copy link
Collaborator Author

yes its turned on in rapids build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants