Use proper alignment when copying data to a variable. #20975

Merged: 1 commit, Mar 14, 2017

maleadt (Member) commented Mar 10, 2017

align=1 is very expensive on hardware without unaligned ld/st (e.g. GPUs).
ref #20593 (comment)

I'm not really familiar with these code paths, so please review. A quick check showed that lhs.typ == rhs.typ, so simply using the RHS's alignment seemed OK. And per the LLVM docs, llvm.memset treats an alignment of 0 and 1 the same way (unaligned), which covers the case where layout->alignment would ever be 0.
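
To illustrate why the declared alignment matters, here is a minimal sketch of a simplified cost model. It is not the Julia codegen change itself. It assumes hardware where every access must be naturally aligned, and the names `effective_alignment`, `memory_ops_for_copy`, and the `max_access_width` parameter are hypothetical, introduced only for this example. An alignment of 0 is treated like 1, mirroring the behavior described above.

```python
def effective_alignment(declared: int) -> int:
    """Map a declared alignment to a usable one.

    Assumption for this sketch: 0 and 1 both mean "no alignment
    guarantee", as described in the comment above.
    """
    if declared < 0:
        raise ValueError("alignment must be non-negative")
    return max(1, declared)


def memory_ops_for_copy(nbytes: int, declared_alignment: int,
                        max_access_width: int = 16) -> int:
    """Count loads (or stores) needed to move `nbytes` in a toy model.

    Hypothetical model: the hardware cannot do unaligned accesses, so
    the widest usable access is the largest power of two that divides
    the guaranteed alignment, capped at `max_access_width` bytes.
    """
    align = effective_alignment(declared_alignment)

    # Find the widest power-of-two access the alignment guarantees.
    width = 1
    while width * 2 <= max_access_width and align % (width * 2) == 0:
        width *= 2

    # Full-width accesses, plus a byte-wise tail (a simplification).
    full, rest = divmod(nbytes, width)
    return full + rest


if __name__ == "__main__":
    for align in (0, 1, 4, 8, 16):
        ops = memory_ops_for_copy(16, align)
        print(f"align={align}: {ops} accesses for a 16-byte copy")
```

In this model a 16-byte copy declared with align=1 needs one access per byte, while the same copy declared with the type's real alignment collapses into a few wide accesses. Actual costs depend on the target and on how the backend lowers the copy.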

Commit: Using align=1 is very expensive on hardware without unaligned ld/st (e.g. GPUs).
maleadt added the bugfix and compiler:codegen labels Mar 10, 2017
maleadt requested a review from vtjnash March 10, 2017 14:09
maleadt (Member, Author) commented Mar 14, 2017

Any comments? If not, I'd like to merge this some time soon, as it drastically improves performance on some GPU workloads.

maleadt added this to the 0.6.0 milestone Mar 14, 2017
vtjnash requested a review from yuyichao March 14, 2017 20:39
vtjnash requested review from yuyichao and removed request for yuyichao March 14, 2017 20:41
StefanKarpinski merged commit 1820efa into master Mar 14, 2017
StefanKarpinski deleted the tb/align_asgn branch March 14, 2017 21:53
Labels: bugfix (This change fixes an existing bug), compiler:codegen (Generation of LLVM IR and native code)
4 participants