[Perf] Support thread local storage for reduction in struct-fors #1941

Merged (16 commits) on Oct 12, 2020

yuanming-hu (Member) commented Oct 12, 2020

Related issue = #1407 closes #576

Benchmarks

3D 256^3 MGPCG reduction

CUDA

  • without TLS: 27.985
  • with TLS: 3.467 (8.1x faster)

CPU

  • without TLS: 992.493
  • with TLS: 18.771 (53x faster, the whole MGPCG is 4x faster because of this)

1D 1024 * 1024 * 128 linear reduction

CUDA

  • without TLS: 3.339
  • with TLS: 2.125 (1.6x faster)

CPU

  • without TLS: 2271.581
  • with TLS: 103.231 (22x faster)
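
For readers unfamiliar with the pattern, here is a minimal plain-Python sketch of a thread-local-storage (TLS) reduction. It illustrates the general idea only and is not Taichi's generated code: `reduce_shared`, `reduce_tls`, and `_run_workers` are hypothetical names, and a `threading.Lock` stands in for the atomic add a compiled kernel would presumably use.

```python
import threading


def _run_workers(worker, data, num_threads):
    """Hypothetical helper: give each thread a strided slice of `data`."""
    threads = [
        threading.Thread(target=worker, args=(data[t::num_threads],))
        for t in range(num_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def reduce_shared(data, num_threads=4):
    """Without TLS: every element is added straight into the shared total."""
    total = [0]
    lock = threading.Lock()  # assumption: stands in for an atomic add

    def worker(chunk):
        for v in chunk:
            with lock:  # one synchronized update per element
                total[0] += v

    _run_workers(worker, data, num_threads)
    return total[0]


def reduce_tls(data, num_threads=4):
    """With TLS: accumulate privately, then merge once per thread."""
    total = [0]
    lock = threading.Lock()

    def worker(chunk):
        local = 0  # thread-local partial sum, no synchronization needed
        for v in chunk:
            local += v
        with lock:  # a single synchronized update per thread
            total[0] += local

    _run_workers(worker, data, num_threads)
    return total[0]


if __name__ == "__main__":
    values = list(range(1000))
    print(reduce_shared(values), reduce_tls(values))
```

Both functions return the same sum. The difference is the number of synchronized updates to the shared total: one per element in `reduce_shared`, one per thread in `reduce_tls`.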


archibate (Collaborator) commented

So we used to only support TLS for reduction in range-fors?

@yuanming-hu yuanming-hu requested a review from k-ye October 12, 2020 04:33
codecov bot commented Oct 12, 2020

Codecov Report

Merging #1941 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1941   +/-   ##
=======================================
  Coverage   43.72%   43.72%           
=======================================
  Files          45       45           
  Lines        6207     6207           
  Branches     1103     1103           
=======================================
  Hits         2714     2714           
  Misses       3322     3322           
  Partials      171      171           
Impacted Files Coverage Δ
python/taichi/testing.py 75.00% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db9ceeb...e6ee8d9.

k-ye (Member) left a comment

LGTM! Thanks for documenting the control flow, much cleaner!

yuanming-hu (Member, Author) commented

(quoting @archibate) So we used to only support TLS for reduction in range-fors?

@archibate Yes!

@yuanming-hu yuanming-hu merged commit 2ccf0f5 into taichi-dev:master Oct 12, 2020
@yuanming-hu yuanming-hu deleted the struct-for-tls branch October 12, 2020 14:45
Successfully merging this pull request may close these issues.

[TLS] Thread local storage for optimized reduction