Fix ingest_raw_data performance issue in Nested JSON reader due to RVO #12070

Merged

Conversation

karthikeyann
Contributor

Description

The issue is that json::experimental::ingest_raw_data took twice as long as json::ingest_raw_data on the same data.

After replacing the ternary operator with an if/else, the runtime for a 500 MB file matches that of json::ingest_raw_data.
I suspect that RVO (copy elision) is skipped when the ternary operator is used.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann added the labels 3 - Ready for Review, libcudf, cuIO, 4 - Needs cuIO Reviewer, Performance, improvement, and non-breaking on Nov 4, 2022
@karthikeyann added this to the Nested JSON reader milestone on Nov 4, 2022
@karthikeyann self-assigned this on Nov 4, 2022
@karthikeyann requested a review from a team as a code owner on November 4, 2022 14:33
@karthikeyann changed the title from "Fix ingest_raw_data performance issue due to RVO" to "Fix ingest_raw_data performance issue in Nested JSON reader due to RVO" on Nov 4, 2022
Comment on lines +60 to +64
  if (compression == compression_type::NONE) {
    return buffer;
  } else {
    return decompress(compression, buffer);
  }
Contributor

Interesting, I also didn't know that a ternary expression could prevent copy elision.
Ref: https://stackoverflow.com/questions/22078029/why-does-the-ternary-operator-prevent-return-value-optimization

If possible, could you please make a simple example on godbolt (a class that prints to stdout when constructed/copied) to demonstrate that?

Contributor Author

@karthikeyann karthikeyann Nov 6, 2022

https://godbolt.org/z/1vzPqqzs4
Here is some sample code.
The ternary operator calls vector(vector const&) for the buffer return,
but if (...) return buffer; calls vector(vector&&).

A possible reason: with the ternary operator, one operand is an lvalue (buffer) and the other is an rvalue (the return value of another function).
So, for the lvalue, it calls the copy constructor.
For an rvalue, it calls the move constructor.
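
For readers who cannot open the godbolt link, the behaviour described above can be sketched roughly as follows. This is not the code from the link or from libcudf: Tracked, make_other, with_ternary, with_if_else and demo are hypothetical names made up for illustration, with Tracked standing in for the vector and make_other standing in for decompress. It assumes a C++17 compiler (for inline static members and guaranteed elision of prvalues).

```cpp
#include <cstdio>

// Hypothetical stand-in for the vector; counts copy and move constructions.
struct Tracked {
  static inline int copies = 0;  // inline static members require C++17
  static inline int moves  = 0;
  Tracked() = default;
  Tracked(Tracked const&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};

// Hypothetical stand-in for decompress(): returns a fresh object by value.
Tracked make_other(Tracked const&) { return Tracked{}; }

Tracked with_ternary(bool passthrough)
{
  Tracked buffer;
  // `buffer` is an lvalue operand and the other operand is a prvalue, so the
  // conditional expression yields a prvalue that is copy-constructed from `buffer`.
  return passthrough ? buffer : make_other(buffer);
}

Tracked with_if_else(bool passthrough)
{
  Tracked buffer;
  if (passthrough) {
    return buffer;  // plain local name: eligible for elision or an implicit move
  } else {
    return make_other(buffer);
  }
}

// Resets the counters, runs both variants, and prints what was observed.
void demo()
{
  Tracked::copies = Tracked::moves = 0;
  Tracked a = with_ternary(true);
  std::printf("ternary: copies=%d moves=%d\n", Tracked::copies, Tracked::moves);

  Tracked::copies = Tracked::moves = 0;
  Tracked b = with_if_else(true);
  std::printf("if/else: copies=%d moves=%d\n", Tracked::copies, Tracked::moves);
  (void)a;
  (void)b;
}
```

Calling demo() from main should report one copy for the ternary version and no copies for the if/else version. Whether the if/else version reports a move or elides it entirely depends on the compiler, so the sketch makes no claim about the move count.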

@codecov

codecov bot commented Nov 4, 2022

Codecov Report

Base: 87.47% // Head: 88.12% // Increases project coverage by +0.64% 🎉

Coverage data is based on head (ea207bf) compared to base (f817d96).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #12070      +/-   ##
================================================
+ Coverage         87.47%   88.12%   +0.64%     
================================================
  Files               133      135       +2     
  Lines             21826    22011     +185     
================================================
+ Hits              19093    19397     +304     
+ Misses             2733     2614     -119     
Impacted Files Coverage Δ
python/cudf/cudf/io/text.py 91.66% <0.00%> (-8.34%) ⬇️
python/cudf/cudf/core/_base_index.py 81.28% <0.00%> (-4.27%) ⬇️
python/cudf/cudf/io/json.py 92.06% <0.00%> (-2.68%) ⬇️
python/cudf/cudf/utils/utils.py 89.91% <0.00%> (-0.69%) ⬇️
python/dask_cudf/dask_cudf/core.py 73.72% <0.00%> (-0.41%) ⬇️
python/cudf/cudf/io/parquet.py 90.45% <0.00%> (-0.39%) ⬇️
python/dask_cudf/dask_cudf/backends.py 84.90% <0.00%> (-0.37%) ⬇️
python/cudf/cudf/core/column/datetime.py 89.62% <0.00%> (-0.09%) ⬇️
python/cudf/cudf/io/orc.py 92.94% <0.00%> (-0.09%) ⬇️
python/cudf/cudf/core/dataframe.py 93.67% <0.00%> (-0.06%) ⬇️
... and 36 more


@karthikeyann
Contributor Author

rerun tests

Contributor

@mythrocks mythrocks left a comment

That's a brilliant catch.

@karthikeyann
Contributor Author

@gpucibot merge

@rapids-bot (bot) merged commit 262631b into rapidsai:branch-22.12 on Nov 7, 2022
@vyasr added the label 4 - Needs Review and removed the label 4 - Needs cuIO Reviewer on Feb 23, 2024

5 participants