merge release code into development #11

Closed
wants to merge 536 commits into from
Commits
f4c2504
[AMDGPU] Split unaligned 4 DWORD DS operations
rampitec Apr 11, 2022
742aa54
Revert "[AMDGPU] Omit unnecessary waitcnt before barriers"
kerbowa Apr 19, 2022
c37d1e5
[AMDGPU] Graceful abort for waterfalls in SIOptimizeVGPRLiveRange
perlfu Apr 12, 2022
ac38934
AMDGPU: Clear kill flags when optimizing vcmp save exec sequence
kzhuravl Jun 14, 2022
e4a22bd
[openmp] - Package reorg cherry-pick from staging.
estewart08 Jun 10, 2022
1136c78
merge amd-stg-open into amd-mainline-open
Jun 23, 2022
2df35ea
Cherry-pick changes required for flang which were not in bulk promo.
estewart08 Jun 28, 2022
b367f04
merge amd-stg-open into promotion/amd-mainline-open/2022.07.02
Jul 12, 2022
d7dd76e
[HIP] Add HIP runtime library arguments for linker
yxsamliu Apr 27, 2022
bcbabbd
[AMDGPU] Fix bitcast v4i64/v16i16
piotrAMD Jul 11, 2022
412df76
[openmp] - Remove redundant code that came in with the merge for July 2.
estewart08 Jul 14, 2022
5bde048
Revert "[UnifyLoopExits] Reduce number of guard blocks"
bcahoon Jul 14, 2022
bf656f2
Revert "[StructurizeCFG] Improve basic block ordering"
bcahoon Jul 14, 2022
15ea427
[HeterogeneousDWARF] Avoid pathological slowdown in PruningFunctionCl…
slinder1 Jul 12, 2022
ef15358
[AMDGPU] Add WMMA clang builtins
piotrAMD Jul 1, 2022
361da17
[AMDGPU] Make v16i16/v16f16 legal
piotrAMD Jun 30, 2022
d7554f1
[AMDGPU] Update WMMA intrinsics with explicit f16 types
piotrAMD Jul 1, 2022
d63452c
[AMDGPU] Fix bitcast v4i64/v16i16
piotrAMD Jul 11, 2022
2044b9e
SWDEV-328206 Copyrights, Palamida scans
Lynd98 Jul 15, 2022
6c814bf
[AMDGPU] Additional liveness tests for si-optimize-exec-masking-pre-ra
perlfu Jul 6, 2022
eb3883c
[AMDGPU] Improve liveness copying in si-optimize-exec-masking-pre-ra
perlfu Jul 17, 2022
2c9fc53
[AMDGPU] Set amdgpu-memory-bound if a basic block has dense global me…
abinavpp Jul 13, 2022
781e41b
[clang][openmp] - Remove ABRT_Inactive from OpenMPDeviceActions.
estewart08 Jul 8, 2022
e492a6a
merge amd-stg-open into amd-mainline-open
Jul 28, 2022
3511298
[AMDGPU] Disable FillMFMAShadowMutation by default
kerbowa Jul 6, 2022
12dc4c6
[AMDGPU] Update the mechanism used to check for cycles and add eges i…
jrbyrnes Jul 13, 2022
dcf9440
Revert "Revert "[Clang][Attribute] Introduce maybe_undef attribute fo…
skc7 Jul 19, 2022
6f510c5
[SANITIZER_AMDGPU] Add a workaround to solve the 2MB alignment requir…
bing-ma Jul 29, 2022
f40c695
[AMDGPU] gfx11 Generate VOPD Instructions
Sisyph Jun 23, 2022
69e945e
[AMDGPU] gfx11 CodeGen for new DPP instructions
Sisyph Jun 27, 2022
b00a6ad
[AMDGPU] Add patterns for GFX11 v_minmax and v_maxmin instructions
jayfoad Jun 23, 2022
4144fd5
[AMDGPU] GFX11 trivial NFC tweaks
jayfoad Jul 5, 2022
327865b
[AMDGPU] NFC. Add a test of the error message for assembling global_a…
Sisyph Jul 5, 2022
303afb1
AMDGPU: Take care of "tied" operand when removeOperand
changpeng Jul 29, 2022
4b1a0c2
Fix lit test
searlmc1 Aug 3, 2022
df7b6c4
AMDGPU: Refine user-sgpr-init16-bug
arsenm Jul 21, 2022
b5945d5
[AMDGPU] user-sgpr-init16-bug does not apply to gfx1103
jayfoad Jul 22, 2022
28b7f95
Enable up to 64 arguments for outlined regions in OpenMP device code …
carlobertolli Jul 11, 2022
f939122
[AMDGPU] Fix DGEMM hazard for GFX90a
vangthao95 Jul 27, 2022
875e6e9
[openmp] - Ensure libclang-cpp.so is linked when needed.
estewart08 Jul 19, 2022
13bdc72
Revert "[AMDGPU] Only count global-to-global as indirect accesses"
Jul 26, 2022
40908f1
SWDEV-345870 - Correct include path for new directory layout
raramakr Jul 15, 2022
3c651cf
[openmp] - Removed race conditions in new and old runtimes.
estewart08 Aug 5, 2022
2345773
[OffloadArch] - Update generated header.
estewart08 Aug 4, 2022
45a11fa
Transform illegal intrinsics to V_ILLEGAL
Aug 6, 2022
fdd4d18
[openmp][amdgpu] - Change location of device environment symbol.
estewart08 Aug 12, 2022
6aa7ac6
[AMDGPU] Add amdgcn_sched_group_barrier builtin
kerbowa Jun 13, 2022
de6b501
[AMDGPU] Add isMeta flag to SCHED_GROUP_BARRIER
kerbowa Jul 28, 2022
04b2771
[AMDGPU] Remove unused function
kerbowa Jul 30, 2022
a7b6d0b
[AMDGPU] Pre-commit tests for D130797
kerbowa Aug 5, 2022
5fee210
merge amd-stg-open into amd-maineline-open
Aug 16, 2022
a64c147
[AMDGPU] Start refactoring GCNSchedStrategy
kerbowa Jul 14, 2022
440d1c3
[Reassociate] Enable FP reassociation via 'reassoc' and 'nsz'
ranapratap55 Aug 18, 2022
6534678
[SROA] Try harder to find a vector promotion viable type when rewriting
vangthao95 Jun 10, 2022
8300726
[AMDGPU] Implement pipeline solver for non-trivial pipelines
jrbyrnes Jul 20, 2022
5a714a5
[AMDGPU] Add iglp_opt builtin and MFMA GEMM Opt strategy
kerbowa Aug 17, 2022
2c8ca82
[AMDGPU] Add builtin s_sendmsg_rtn
yxsamliu Aug 18, 2022
cb4f1d1
[AMDGPU] Unify unreachable intrinsics
yxsamliu Aug 4, 2022
8b691f9
[AMDGPU][Clang] Skip adding noundef attribute to AMDGPU HIP device fu…
skc7 Aug 11, 2022
003d1da
[AMDGPU] Aggressively schedule to reduce RP in occupancy limited regions
kerbowa Jul 20, 2022
ca29a1d
Implement requires unified_address.
carlobertolli Jul 29, 2022
12a2b31
Enable -fopenmp-assume-no-thread-state at -Ofast. This behavior
dhruvachak Aug 2, 2022
77abf53
Use the name Xteam in optimized reduction codegen (NFC).
dhruvachak Aug 1, 2022
ff1a2c6
[OpenMP] Fix the static library build for DeviceRTL in libomptarget
animeshk-amd Aug 3, 2022
a4715a8
Private scope must end after corresponding emit of OpenMP statement.
dhruvachak Aug 4, 2022
56fc6d4
fixed double free in target unregister
ThorBl Aug 5, 2022
a9a2016
Handle continue in no-loop code generation.
dhruvachak Aug 9, 2022
419e317
Initial implementation of pragma requires dynamic_allocators.
carlobertolli Aug 5, 2022
54640d9
[OPENMP] add xteam helper functions for min and max. There are still…
gregrodgers Aug 10, 2022
cbecb69
[SWDEV-348894][CRAYA-219] ROCm OpenMP compiler produces binary asm ou…
nicebert Aug 9, 2022
f66bb40
[OPENMP] add helper function xteam_sum_i. Simplified max calculation …
gregrodgers Aug 11, 2022
87b5728
Added support for non-unit stride in NoLoop and Xteam CodeGen
dhruvachak Aug 11, 2022
c59a5fc
[OPENMP] Eliminate DeviceRTL printing in release mode. Eliminate asse…
gregrodgers Aug 16, 2022
29923a2
Support for collapsed loops in NoLoop codegen.
dhruvachak Aug 18, 2022
ae6bd88
Port old device RTL implementation of omp_get_thread_limit on device …
carlobertolli Aug 18, 2022
e46c943
fix for host-side memory leak #397
ThorBl Aug 5, 2022
c436232
[OPENMP] Merge this after some cleanup. This update adds unsigned, l…
gregrodgers Aug 19, 2022
84fcda9
Implement memory-order clauses for flush directive on amdgpu target.
carlobertolli Aug 17, 2022
f342835
Prevent plugin to use pinned pool when user has already pinned their …
carlobertolli Aug 19, 2022
ab887d7
[OPENMP] add new DeviceRTL api for __kmpc_parallel_spmd as alternativ…
gregrodgers Aug 16, 2022
5d027ee
merging initial rpc count implementation with latest. concurrent kern…
akadutta Jul 26, 2022
ef92696
[OpenMP] Add option to assert no nested OpenMP parallelism on the GPU
jhuber6 Aug 17, 2022
f23645b
Emit call to parallel_spmd in SPMD mode and no-nested-parallelism.
dhruvachak Aug 26, 2022
13efd46
[SWDEV-354679] -fomp-target-fast flag implies -fopenmp-target-ignore-…
nicebert Sep 1, 2022
d5cf4a1
Expose malloc-free API for amdgcn
carlobertolli Aug 26, 2022
72af3ea
[llvm][Attributor] - Comment out call to getAssumedSimplified.
estewart08 Aug 26, 2022
328408f
[OPENMP] change the interface to xteam helper functions to add po…
gregrodgers Aug 28, 2022
9a811ee
[OpenMP] Add ompx_ version of fast/unsafe/safe FP atomics hint values.
carlobertolli Aug 29, 2022
f977f51
[openmp][amdgpu] Move global DeviceInfo behind call syntax prior to u…
Jul 28, 2022
456e684
[openmp][amdgpu] Tear down amdgpu plugin accurately
JonChesterfield Jul 28, 2022
3fa4523
Move small pool manager to RTLDeviceInfo class. This fixes the double
dhruvachak Aug 24, 2022
c94d8f7
[openmp] Introduce optional plugin init/deinit functions
JonChesterfield Jul 28, 2022
cada775
Merge branch 'promo-54' into promotion/amd-mainline-open/2022.07.28
ronlieb Sep 6, 2022
4ffa346
merge amd-stg-open into amd-mainline-open
Sep 13, 2022
d0349e2
Add clang command line reference documentation for -fopenmp-target-fast
nicebert Sep 8, 2022
0021170
-fopenmp-target-fast implies -O3 if no -O* option is specified
nicebert Sep 9, 2022
4fa421a
[OpenMP][[libomptarget] Omit debug symbols from DeviceRTL in release …
carlobertolli Sep 13, 2022
68fd5f4
Do not generate call to parallel_spmd if using old runtime.
dhruvachak Sep 13, 2022
10f4fb1
This is a revert of the following commit. The changes were made
dhruvachak Sep 13, 2022
a518205
[Libomptarget][amdgpu plugin] Add DP tracing to early exits in getLau…
ronlieb Sep 17, 2022
c77f20f
Move allocas coming from __kmpc_alloc_shared to entry block.
doru1004 Sep 23, 2022
eee98eb
[DAGCombine] Do not fold SRA/SRL of MUL into MULH when MUL's LSB are
jmmartinez Sep 16, 2022
2440370
[openmp] - Correct omptarget.devicertl.a install location.
estewart08 Sep 27, 2022
0d48c8f
[OpenMP][AMDGPU] Enable OpenMP device runtime for gfx110[0123]
dpalermo Sep 17, 2022
b7ee298
[OpenMP][AMDGPU] Add OpenMP devices for gfx110[0123]
dpalermo Sep 17, 2022
dd5205e
[AMDGPU] W/a hazard if 64 bit shift amount is a highest allocated VGPR
rampitec Aug 31, 2022
cc9a5a2
[AMDGPU] Fix liveness verifier error in hazard recognizer
rampitec Sep 7, 2022
f48c2c5
[OpenMP][AMDGPU] Add Device IDs for gfx1100
dpalermo Oct 7, 2022
b59da25
RegAllocGreedy: Fix nondeterminism in tryLastChanceRecoloring
arsenm Jul 27, 2022
1d6c91c
RegAllocGreedy: Try local instruction splitting with subranges
arsenm Jul 24, 2022
4c21610
[openmp] - Cherry pick changes for CUDA_ARCH and libclang-cpp.so.
estewart08 Oct 10, 2022
5fd4850
[AMDGPU] Don't shrink VOP3 instructions pre-RA on GFX10+
jayfoad Sep 13, 2022
30a91aa
[RegisterCoalescer] Fix crash on early clobbered subreg operands.
dfukalov Sep 6, 2022
abaea88
[AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16
dpreobra Aug 3, 2022
bd82e2a
[AMDGPU][MC][GFX11][NFC] Consolidate VOP tests by encoding
dpreobra Aug 11, 2022
a415536
[AMDGPU][MC][GFX11] Correct e64_dpp variants of v_movreld and v_movrelsd
dpreobra Oct 5, 2022
599e54b
[AMDGPU][GFX11] Use VGPR_32_Lo128 for VOP1,2,C
Sisyph Jul 13, 2022
16779b4
[AMDGPU][MC][GFX11][NFC] Update asm tests for VOPC instructions promo…
dpreobra Aug 30, 2022
11d03a0
[AMDGPU] Fix True16 patterns for cmp on GFX11
Sisyph Oct 10, 2022
1f7608c
[AMDGPU] Report minimum scratch size in code object v5 and later by d…
abinavpp Sep 15, 2022
73e7575
[AMDGPU] Make the uses_dynamic_stack field in the kernel descriptor a…
abinavpp Sep 27, 2022
76215ff
[docs] Fix warning in AMDGPUUsage.rst after 3d9f011a9c624b3128bc6b5e7…
abinavpp Oct 11, 2022
09733d5
[HIP] Fix unbundling archive
yxsamliu Sep 10, 2022
43ea9be
[OpenMP][AMDGPU] Update Device IDs for gfx1100
dpalermo Oct 26, 2022
68c0e98
[AMDGPU][Backend] Fix user-after-free in AMDGPUReleaseVGPRs::isLastVG…
jmmartinez Oct 19, 2022
0e5348d
[NFC] Remove unused set construction from DILocation::getMergedLocation
jmmartinez Sep 21, 2022
a659241
[DebugInfo] getMergedLocation: Maintain the line number if they match
jmmartinez Oct 25, 2022
72be781
AMDGPU: Fix hazard with v_accvgpr_write_b32 and inline asm VGPR defs
arsenm Oct 11, 2022
1ea6039
merge amd-stg-open into amd-mainline-open
Nov 24, 2022
3e1c22a
Fix dup defs in bulk merge - openmp
ronlieb Nov 25, 2022
4827525
[AMDGPU][InsertWaits] No wait for WAW for global/scratch_load
ruiling Nov 21, 2022
1deabec
Revert default behavior of 'loop' directive from worksharing back to
ddpagan Nov 4, 2022
0039007
[OpenMP][AMDGPU] Update Device IDs for gfx1100
dpalermo Nov 12, 2022
095a2a1
[openmp] change NUM_QUEUES_PER_DEVICE default to 1
ronlieb Nov 13, 2022
99d5d14
[OpenMP][OMPT] Implements OMPT functions
jplehr Nov 9, 2022
a198ba5
[OpenMP][Classic Flang]Add Fortran modulo function
DominikAdamski Nov 17, 2022
e3337db
[OpenMP][AMDGPU] Add PCI IDs for gfx103[56] APUs
dpalermo Nov 17, 2022
d0dfd7d
Revert "[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDN…
ronlieb Nov 25, 2022
769bdd1
Revert "[AMDGPU] Omit unnecessary waitcnt before barriers"
Dec 1, 2022
2f3148a
[openmp] - Update clang-build-select-link to skip certain functions.
estewart08 Nov 30, 2022
97b8243
Add additional hint path for finding LLVMOffloadArch.
estewart08 Nov 29, 2022
0e9a8d7
[SANITIZER_AMDGPU] Add a temporary workaround to address the 'invalid…
bing-ma Dec 3, 2022
8236457
Carlo's patch for 5.5 hsa issue openmp
ronlieb Dec 9, 2022
ac5a2ce
[HeterogeneousDWARF] Drop expressions with complex register referrers
slinder1 Dec 8, 2022
0c4dd46
merge promotion/amd-mainline-open/2022.11.11 into into amd-mainline-open
Dec 12, 2022
44b0a45
[Sema] Fix crash when evaluating nested call with value-dependent arg
kzhuravl Dec 13, 2022
559b860
Use LLVM APIs for dlsym in OMPT target support.
dhruvachak Dec 13, 2022
6e984a9
[openmp] - Fix missing shared library when using prep tool.
estewart08 Dec 14, 2022
ca32e6d
[OpenMP][OMPD] Moves all dlsym calls to LLVM support lib
jplehr Dec 13, 2022
f5ca2c7
[OpenMP][OMPD] Removes remaining calls to dlerror
jplehr Dec 14, 2022
2ee32a7
[HIP] Fix lld failure when devie object is empty
yxsamliu Nov 17, 2022
1c8e3d5
[InferAddressSpaces] [AMDGPU] Add inference for flat_atomic intrinsics
jrbyrnes Jul 28, 2022
879bfce
Revert "[InferAddressSpaces] [AMDGPU] Add inference for flat_atomic i…
searlmc1 Dec 17, 2022
4df4322
Fix use of abs function in C device regions.
doru1004 Oct 20, 2022
b22d681
Support for multiple blocksizes and threads clauses in Xteam reduction.
dhruvachak Nov 18, 2022
28dd632
[OpenMP] Free data transfer HSA signal in releaseResources
jplehr Nov 25, 2022
bcd350c
[OPENMP] support more teams than threads in cross team reductions
gregrodgers Dec 1, 2022
c9cf2ed
[OPENMP] new intra-team reductions and new configs 1x64, 2x64, 2x32, …
gregrodgers Dec 1, 2022
4a974fa
Added 64 and 128 as supported blocksizes for Xteam reduction.
dhruvachak Dec 15, 2022
07e3f3d
Implemented BigJumpLoop SPMD kernel with a new exec mode of 5.
dhruvachak Dec 15, 2022
d989681
SWDEV-328212 don't busy wait
amd-isparry Dec 10, 2022
7701320
[AMDGPU][SIFrameLowering] Mark VGPR used for AGPR spills as reserved
jrbyrnes Dec 8, 2022
211bcf3
[AMDGPU] Update MFMASmallGemmOpt with better performing strategy
kerbowa Dec 2, 2022
4af3ed2
Allow num_teams clause in Xteam reduction.
dhruvachak Dec 2, 2022
2c7712b
[openmp] build-slect patch to fix ASO for clang-325070 and clang-ifaces
ronlieb Dec 19, 2022
c081558
Require the absence of nested parallelism for CodeGen to generate
dhruvachak Dec 20, 2022
1d652cb
Added lit tests for NoLoop and Xteam reduction.
dhruvachak Dec 17, 2022
e757909
[classic-flang] Move reduction opt enable to fopenmp-target-fast
ronlieb Dec 27, 2022
34b8d58
[libompd] SWDEV-375131 undefined get_dlsym_for_name... plugin
ronlieb Dec 28, 2022
ded404e
[libompd] CMake include issue for python module DynamicLibrary.h
ronlieb Dec 29, 2022
3e0bf3d
Re-land Juan's change:
kzhuravl Nov 1, 2022
0632209
Enable roundeven.
Dec 20, 2022
3cb9ae1
[clang][cuda/hip] Allow `__noinline__` lambdas
Pierre-vh Nov 4, 2022
884579c
[HeterogeneousDWARF] Teach TypeFinder to traverse DIExpr
slinder1 Oct 28, 2022
26824d4
AMDGPU: Set scratch_en if there is dynamic stack but no fixed stack
arsenm Jan 4, 2023
0258fa5
[CodeGen][AMDGPU] EXTRACT_VECTOR_ELT: input vector element type can d…
jmmartinez Jan 6, 2023
7d4123e
Implement remaining combined 'loop' directives.
ddpagan Jan 4, 2023
c6f1813
[AMDGPU] More selectively attach implicit operands to agpr spills
jrbyrnes Jan 6, 2023
ceedbdf
[PACKAGE_VERSION] Remove improper expansion
ronlieb Jan 17, 2023
5260143
[HeterogeneousDWARF] Workaround to support -fgpu-rdc
slinder1 Jan 10, 2023
ce81f2a
[InstCombine] Revert D125845
huangjd Nov 29, 2022
dad80d2
[CodeGen] Prevent overlapping subregs in getCoveringSubRegIndexes
Pierre-vh Jan 12, 2023
ca8878c
Fix host call to nohost function with host variant.
doru1004 Dec 15, 2022
cb424d0
Fix declare target implementation to support enter.
doru1004 Nov 16, 2022
38e9a3f
[Clang][OpenMP] Add support for default to/from map types on target e…
doru1004 Nov 17, 2022
2a31764
Add Parse/Sema for iterator for map clause.
doru1004 Jan 6, 2023
9eab468
Fix tests for commit 658ed9547cdd6657895339a6c390c31aa77a5698.
doru1004 Dec 19, 2022
f28fecb
[openmp] Workaround for HSA in issue 60119
JonChesterfield Jan 21, 2023
d3b6207
Remove constructor attribute from ompt_init, instead call it directly
dhruvachak Jan 25, 2023
9a1a2f1
[a+a] remove -rv-max-reg-size from a+a flags
ronlieb Jan 29, 2023
fa3869a
[InstCombine] Add tests for alloca removal with phi nodes (NFC)
gandhi56 Jan 11, 2023
fd3a14f
[ADT] Allow structured bindings on PointerIntPair
d0k Dec 4, 2022
a52a937
[InstCombine] Add Visited set to isOnlyCopiedFromConstantMemory()
nikic Jan 11, 2023
1336852
[InstCombine] Limit use walk in copied from constant fold
nikic Jan 11, 2023
554ae69
[InstCombine] Handle PHI nodes in isOnlyCopiedFromConstantMemory()
gandhi56 Jan 11, 2023
664fbc0
[InstCombine] Handle PHI nodes in PtrReplacer
gandhi56 Jan 17, 2023
50caee3
[MachineBasicBlock] Explicit FT branching param
gandhi56 Jan 17, 2023
4f29fc8
[LTO] Don't generate invalid modules if "LTOPostLink" MD already exists
Pierre-vh Jan 30, 2023
a88e0af
[ELF] Emit Verbose Asm when using --lto-emit-asm
Pierre-vh Jan 9, 2023
1129120
[openmp][libompd] - Ensure the python module links in libclang-cpp.so
estewart08 Jan 24, 2023
de845bc
[AMDGPU][COV5] Enable default code object version 5
ronlieb Sep 28, 2022
985afd9
[clang][driver] - Fix bug where the suffix logic is incorrect.
estewart08 Feb 9, 2023
b53a128
Use hsa_memory_copy for initialization to prevent user locking of pro…
carlobertolli Feb 7, 2023
417c29f
[AMDGPU] Add missing physical register check in SIFoldOperands::tryFo…
Jan 24, 2023
3572336
[InstCombine] Increase limit for max copied from constant fold
bcahoon Feb 14, 2023
da8abba
[HeterogeneousDWARF] Force the CFA to be in private_wave format when …
jmmartinez Feb 2, 2023
642cb3f
HeterogeneousDwarf/AMDGPU: Do not scale frame location by wf size
kzhuravl Feb 16, 2023
58c90dd
[HeterogeneousDWARF] Refer to the underlying pointer instead of the c…
jmmartinez Nov 15, 2022
9eab7c9
switched order of the omp_is_initial_device methods in omp.h.var. Now…
ThorBl Feb 20, 2023
22e2e7e
[openmp] - Update complex header to be in line with trunk.
estewart08 Feb 20, 2023
f646126
[AMDGPU] Remove function with incompatible features
Pierre-vh Feb 10, 2023
e23a68d
[AMDGPU] Remove dot1 and dot6 features from clang for gfx11
rampitec Jan 24, 2023
9601673
AMDGPU: Use module flag to get code object version at IR level
changpeng Feb 3, 2023
75a34d5
AMDGPU: Use module flag to get code object version at IR level folow-up
changpeng Feb 10, 2023
116aa7b
[AAPointerInfo] check for Unknown offsets in callee
ssahasra Oct 31, 2022
d7265f7
[AAPointerInfo] refactor how offsets and Access objects are tracked
ssahasra Nov 15, 2022
abf3a86
[NFC][AAPointerInfo] rename OffsetAndSize to RangeTy
ssahasra Nov 1, 2022
db22267
[Attributor] Introduce assumption accesses in AAPointerInfo
jdoerfert Jul 11, 2022
b4b9d64
[AAPointerInfo] rearrange code in preparation for further changes
ssahasra Dec 9, 2022
dc8587e
[Attributor] Keep complex select and PHI instructions in AAPotentialV…
jdoerfert Oct 7, 2022
5f7d9a9
[AAPointerInfo] track multiple constant offsets for each use
ssahasra Dec 12, 2022
c9ea6f2
[AMDGPU] Remove the assertion for MUBUF instruction with voffset
cdevadas Nov 14, 2022
fbd5829
[AMDGPU] Callee must always spill writelane VGPRs
cdevadas Aug 22, 2022
fab7de4
[AMDGPU] Add WWM reserved VGPRs to WWMSpills
cdevadas Aug 22, 2022
93e7145
[AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog
cdevadas Aug 22, 2022
1b8288b
[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
cdevadas Aug 22, 2022
a5b28ad
[AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores
cdevadas Aug 23, 2022
dd78290
[AMDGPU] Preserve only the inactive lanes of scratch vgprs
cdevadas Sep 25, 2022
6fe7093
[AMDGPU][SIFrameLowering] Use the right frame register in CSR spills
cdevadas Sep 25, 2022
4943be0
[CodeGen] Use delegate to notify targets when virtual registers are c…
cdevadas Sep 28, 2022
ca4dc3b
[CodeGen] Use cloneVirtualRegister in LiveIntervals and LiveRangeEdit
cdevadas Sep 29, 2022
17518f5
[CodeGen] Additional Register argument to storeRegToStackSlot/loadReg…
cdevadas Nov 24, 2022
9efa61a
AMDGPU: Remove BufferPseudoSourceValue
nhaehnle Nov 25, 2022
ea76739
AMDGPU: Remove ImagePSV and move images to addrspace 7
nhaehnle Nov 29, 2022
f405596
AMDGPU: Directly pass Function to mayUseAGPRs
arsenm Nov 2, 2022
22f4eea
WebAssembly: Remove MachineFunction reference from MFI
arsenm Nov 2, 2022
3cd8b78
AArch64: Stop storing MachineFunction in MachineFunctionInfo
arsenm Dec 16, 2022
c970882
CodeGen: Don't lazily construct MachineFunctionInfo
arsenm Jun 18, 2020
d2aae09
[CodeGen] MRI call back in TargetMachine
cdevadas Dec 23, 2022
9b73747
[AMDGPU] Add pre-commit test for optimized KILL insertion.
cdevadas Feb 1, 2023
1b845ea
[VirtRegMap] Further optimize emitting KILL for copy
cdevadas Dec 28, 2022
c355f8c
[MachineInstr] Use isCopy helper function (NFC).
cdevadas Jan 5, 2023
aea1cdf
[MachineInstr] Introduce TII buildCopy helper functions (NFC).
cdevadas Jan 8, 2023
74aedd9
[MachineInstr] Introduce generic predicated copy opcode
cdevadas Aug 27, 2022
5c33854
[AMDGPU] Use buildCopy and isCopy helper functions (NFC).
cdevadas Jan 10, 2023
6e17b50
[AMDGPU] Enable predicated copy right from instruction selection
cdevadas Oct 11, 2022
92af42c
[AMDGPU] Implement whole wave register spill
cdevadas Dec 28, 2022
51d3182
[AMDGPU] Enable whole wave register copy
cdevadas Dec 28, 2022
249046b
[AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs
cdevadas Dec 23, 2022
5fe166b
[AMDGPU][SILowerSGPRSpills] Insert individual kill instructions
cdevadas Mar 20, 2023
11 changes: 11 additions & 0 deletions .project
@@ -0,0 +1,11 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+<name>llvm-project</name>
+<comment></comment>
+<projects>
+</projects>
+<buildSpec>
+</buildSpec>
+<natures>
+</natures>
+</projectDescription>
3 changes: 3 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -595,6 +595,9 @@ C++2b Feature Support
 CUDA/HIP Language Changes in Clang
 ----------------------------------

+- Allow the use of ``__noinline__`` as a keyword (instead of ``__attribute__((noinline))``)
+  in lambda declarations.
+
 Objective-C Language Changes in Clang
 -------------------------------------

17 changes: 14 additions & 3 deletions clang/include/clang/AST/OpenMPClause.h
@@ -5723,7 +5723,7 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 size_t numTrailingObjects(OverloadToken<Expr *>) const {
 // There are varlist_size() of expressions, and varlist_size() of
 // user-defined mappers.
-return 2 * varlist_size();
+return 2 * varlist_size() + 1;
 }
 size_t numTrailingObjects(OverloadToken<ValueDecl *>) const {
 return getUniqueDeclarationsNum();
@@ -5737,7 +5737,7 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 OpenMPMapModifierKind MapTypeModifiers[NumberOfOMPMapClauseModifiers] = {
 OMPC_MAP_MODIFIER_unknown, OMPC_MAP_MODIFIER_unknown,
 OMPC_MAP_MODIFIER_unknown, OMPC_MAP_MODIFIER_unknown,
-OMPC_MAP_MODIFIER_unknown};
+OMPC_MAP_MODIFIER_unknown, OMPC_MAP_MODIFIER_unknown};

 /// Location of map-type-modifiers for the 'map' clause.
 SourceLocation MapTypeModifiersLoc[NumberOfOMPMapClauseModifiers];
@@ -5838,6 +5838,11 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 /// Set colon location.
 void setColonLoc(SourceLocation Loc) { ColonLoc = Loc; }

+/// Set iterator modifier.
+void setIteratorModifier(Expr *IteratorModifier) {
+getTrailingObjects<Expr *>()[2 * varlist_size()] = IteratorModifier;
+}
+
 public:
 /// Creates clause with a list of variables \a VL.
 ///
@@ -5850,6 +5855,7 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 /// \param ComponentLists Component lists used in the clause.
 /// \param UDMapperRefs References to user-defined mappers associated with
 /// expressions used in the clause.
+/// \param IteratorModifier Iterator modifier.
 /// \param MapModifiers Map-type-modifiers.
 /// \param MapModifiersLoc Location of map-type-modifiers.
 /// \param UDMQualifierLoc C++ nested name specifier for the associated
@@ -5862,7 +5868,7 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 Create(const ASTContext &C, const OMPVarListLocTy &Locs,
 ArrayRef<Expr *> Vars, ArrayRef<ValueDecl *> Declarations,
 MappableExprComponentListsRef ComponentLists,
-ArrayRef<Expr *> UDMapperRefs,
+ArrayRef<Expr *> UDMapperRefs, Expr *IteratorModifier,
 ArrayRef<OpenMPMapModifierKind> MapModifiers,
 ArrayRef<SourceLocation> MapModifiersLoc,
 NestedNameSpecifierLoc UDMQualifierLoc, DeclarationNameInfo MapperId,
@@ -5881,6 +5887,11 @@ class OMPMapClause final : public OMPMappableExprListClause<OMPMapClause>,
 static OMPMapClause *CreateEmpty(const ASTContext &C,
 const OMPMappableExprListSizeTy &Sizes);

+/// Fetches Expr * of iterator modifier.
+Expr *getIteratorModifier() {
+return getTrailingObjects<Expr *>()[2 * varlist_size()];
+}
+
 /// Fetches mapping kind for the clause.
 OpenMPMapClauseKind getMapType() const LLVM_READONLY { return MapType; }

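The `OMPMapClause` change above reserves one extra trailing `Expr *` slot, at index `2 * varlist_size()`, for the iterator modifier. The following is a minimal standalone sketch of that layout, not the Clang implementation: `Expr` is a stand-in struct, `MapClauseModel` and `numTrailingExprs` are hypothetical names, and a `std::vector` replaces LLVM's trailing-objects storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for clang::Expr; only its address matters here.
struct Expr {
  int Id;
};

// Toy model of the trailing Expr* storage after the change:
//   [0, n)    mapped variable expressions
//   [n, 2n)   user-defined mapper references
//   [2n]      iterator modifier (nullptr when absent)
class MapClauseModel {
  std::size_t NumVars;
  std::vector<Expr *> Trailing;

public:
  explicit MapClauseModel(std::size_t N)
      : NumVars(N), Trailing(2 * N + 1, nullptr) {}

  std::size_t varlist_size() const { return NumVars; }
  std::size_t numTrailingExprs() const { return Trailing.size(); }

  void setVar(std::size_t I, Expr *E) {
    assert(I < NumVars && "variable index out of range");
    Trailing[I] = E;
  }

  // Same index arithmetic as the accessors added in the diff.
  void setIteratorModifier(Expr *E) { Trailing[2 * varlist_size()] = E; }
  Expr *getIteratorModifier() const { return Trailing[2 * varlist_size()]; }
};
```

With three mapped variables the model holds seven slots, and writing a variable slot leaves the iterator slot untouched, which is the reason `numTrailingObjects` must grow by one.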
4 changes: 2 additions & 2 deletions clang/include/clang/Basic/Attr.td
@@ -3749,8 +3749,8 @@ def OMPDeclareTargetDecl : InheritableAttr {
 let Documentation = [OMPDeclareTargetDocs];
 let Args = [
 EnumArgument<"MapType", "MapTypeTy",
-[ "to", "link" ],
-[ "MT_To", "MT_Link" ]>,
+[ "to", "enter", "link" ],
+[ "MT_To", "MT_Enter", "MT_Link" ]>,
 EnumArgument<"DevType", "DevTypeTy",
 [ "host", "nohost", "any" ],
 [ "DT_Host", "DT_NoHost", "DT_Any" ]>,
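The `EnumArgument` above pairs each clause spelling with an enumerator, and this change adds the `enter`/`MT_Enter` pair. As a rough illustration of that pairing (hand-written here, whereas Clang generates the real conversion code from `Attr.td`), the helpers `parseMapType` and `mapTypeSpelling` below are hypothetical names:

```cpp
#include <cassert>
#include <string>

// Mirrors the three enumerators listed in the updated EnumArgument.
enum MapTypeTy { MT_To, MT_Enter, MT_Link };

// Hypothetical helper: map a clause spelling to its enumerator.
// Returns false for any other spelling and leaves Out untouched.
bool parseMapType(const std::string &Spelling, MapTypeTy &Out) {
  if (Spelling == "to") {
    Out = MT_To;
    return true;
  }
  if (Spelling == "enter") {
    Out = MT_Enter;
    return true;
  }
  if (Spelling == "link") {
    Out = MT_Link;
    return true;
  }
  return false;
}

// Hypothetical inverse: enumerator back to its clause spelling.
const char *mapTypeSpelling(MapTypeTy MT) {
  switch (MT) {
  case MT_To:
    return "to";
  case MT_Enter:
    return "enter";
  case MT_Link:
    return "link";
  }
  assert(false && "unhandled MapTypeTy");
  return "";
}
```

Keeping `MT_To` and `MT_Enter` as distinct enumerators lets later stages tell which spelling the user wrote, which the new diagnostics in `DiagnosticParseKinds.td` appear to rely on.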
14 changes: 12 additions & 2 deletions clang/include/clang/Basic/DiagnosticParseKinds.td
@@ -1359,7 +1359,7 @@ def err_omp_unknown_map_type : Error<
 "incorrect map type, expected one of 'to', 'from', 'tofrom', 'alloc', 'release', or 'delete'">;
 def err_omp_unknown_map_type_modifier : Error<
 "incorrect map type modifier, expected one of: 'always', 'close', 'mapper'"
-"%select{|, 'present'}0%select{|, 'ompx_hold'}1">;
+"%select{|, 'present'|, 'present', 'iterator'}0%select{|, 'ompx_hold'}1">;
 def err_omp_map_type_missing : Error<
 "missing map type">;
 def err_omp_map_type_modifier_missing : Error<
@@ -1383,12 +1383,22 @@ def note_omp_assumption_clause_continue_here
 : Note<"the ignored tokens spans until here">;
 def err_omp_declare_target_unexpected_clause: Error<
 "unexpected '%0' clause, only %select{'device_type'|'to' or 'link'|'to', 'link' or 'device_type'|'device_type', 'indirect'|'to', 'link', 'device_type' or 'indirect'}1 clauses expected">;
+def err_omp_declare_target_unexpected_clause_52: Error<
+"unexpected '%0' clause, only %select{'device_type'|'enter' or 'link'|'enter', 'link' or 'device_type'|'device_type', 'indirect'|'enter', 'link', 'device_type' or 'indirect'}1 clauses expected">;
 def err_omp_begin_declare_target_unexpected_implicit_to_clause: Error<
 "unexpected '(', only 'to', 'link' or 'device_type' clauses expected for 'begin declare target' directive">;
-def err_omp_declare_target_unexpected_clause_after_implicit_to: Error<
+def err_omp_declare_target_wrong_clause_after_implicit_to: Error<
 "unexpected clause after an implicit 'to' clause">;
+def err_omp_declare_target_wrong_clause_after_implicit_enter: Error<
+"unexpected clause after an implicit 'enter' clause">;
 def err_omp_declare_target_missing_to_or_link_clause: Error<
 "expected at least one %select{'to' or 'link'|'to', 'link' or 'indirect'}0 clause">;
+def err_omp_declare_target_missing_enter_or_link_clause: Error<
+"expected at least one %select{'enter' or 'link'|'enter', 'link' or 'indirect'}0 clause">;
+def err_omp_declare_target_unexpected_to_clause: Error<
+"unexpected 'to' clause, use 'enter' instead">;
+def err_omp_declare_target_unexpected_enter_clause: Error<
+"unexpected 'enter' clause, use 'to' instead">;
 def err_omp_declare_target_multiple : Error<
 "%0 appears multiple times in clauses on the same declare target directive">;
 def err_omp_declare_target_indirect_device_type: Error<
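Several of the diagnostics above use Clang's `%select{a|b|c}N` format, where diagnostic argument `N` picks one alternative by index. The sketch below is a deliberately simplified expander, not Clang's formatter: `expandSelect` is a hypothetical name, and it assumes a single-digit argument index, no nesting, and no other format specifiers. It shows how the updated `err_omp_unknown_map_type_modifier` text grows a third alternative that mentions `'iterator'`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Expand every "%select{a|b|c}N" in Format. N is a single digit naming an
// entry of Args, and Args[N] is the zero-based index of the alternative to
// keep. Simplified model: alternatives must not contain '|' or '}'.
std::string expandSelect(const std::string &Format,
                         const std::vector<unsigned> &Args) {
  const std::string Marker = "%select{";
  std::string Out;
  std::size_t Pos = 0;
  while (true) {
    std::size_t Start = Format.find(Marker, Pos);
    if (Start == std::string::npos) {
      Out += Format.substr(Pos); // copy the tail after the last %select
      return Out;
    }
    Out += Format.substr(Pos, Start - Pos);
    std::size_t Close = Format.find('}', Start);
    assert(Close != std::string::npos && Close + 1 < Format.size());
    unsigned ArgNo = static_cast<unsigned>(Format[Close + 1] - '0');
    assert(ArgNo < Args.size() && "argument index out of range");

    // Split the text between the braces on '|'.
    std::string Body = Format.substr(Start + Marker.size(),
                                     Close - Start - Marker.size());
    std::vector<std::string> Alts;
    std::size_t Begin = 0;
    while (true) {
      std::size_t Bar = Body.find('|', Begin);
      if (Bar == std::string::npos) {
        Alts.push_back(Body.substr(Begin));
        break;
      }
      Alts.push_back(Body.substr(Begin, Bar - Begin));
      Begin = Bar + 1;
    }
    assert(Args[ArgNo] < Alts.size() && "selector out of range");
    Out += Alts[Args[ArgNo]];
    Pos = Close + 2; // skip the closing brace and the index digit
  }
}
```

Passing selector values `{2, 1}` to the new format string yields the full list including `'present'`, `'iterator'` and `'ompx_hold'`, while `{0, 0}` leaves only the three base modifiers. Which condition drives each selector is decided by the caller in the parser and is not shown in this part of the diff.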
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -10814,6 +10814,8 @@ def err_omp_depend_sink_source_with_modifier : Error<
 "depend modifier cannot be used with 'sink' or 'source' depend type">;
 def err_omp_depend_modifier_not_iterator : Error<
 "expected iterator specification as depend modifier">;
+def err_omp_map_modifier_not_iterator : Error<
+"expected iterator specification as map modifier">;
 def err_omp_linear_ordered : Error<
 "'linear' clause cannot be specified along with 'ordered' clause with a parameter">;
 def err_omp_unexpected_schedule_modifier : Error<
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/LangOptions.def
@@ -253,8 +253,10 @@ LANGOPT(OpenMPCUDANumSMs , 32, 0, "Number of SMs for CUDA devices.")
 LANGOPT(OpenMPCUDABlocksPerSM , 32, 0, "Number of blocks per SM for CUDA devices.")
 LANGOPT(OpenMPCUDAReductionBufNum , 32, 1024, "Number of the reduction records in the intermediate reduction buffer used for the teams reductions.")
 LANGOPT(OpenMPGPUThreadsPerTeam, 32, 256, "Number of threads per team for GPUs.")
+LANGOPT(OpenMPTargetXteamReductionBlockSize, 32, 1024, "Number of threads in a block used by cross-team reduction.")
 LANGOPT(OpenMPTargetDebug , 32, 0, "Enable debugging in the OpenMP offloading device RTL")
 LANGOPT(OpenMPTargetIgnoreEnvVars , 1, 0, "Generate code assuming that device related environment variables can be ignored.")
+LANGOPT(OpenMPTargetBigJumpLoop , 1, 0, "Use big jump loop code generation technique.")
 LANGOPT(OpenMPOptimisticCollapse , 1, 0, "Use at most 32 bits to represent the collapsed loop nest counter.")
 LANGOPT(OpenMPThreadSubscription , 1, 0, "Assume work-shared loops do not have more iterations than participating threads.")
 LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have more iterations than participating teams.")
1 change: 1 addition & 0 deletions clang/include/clang/Basic/OpenMPKinds.def
@@ -131,6 +131,7 @@ OPENMP_MAP_KIND(release)
 OPENMP_MAP_MODIFIER_KIND(always)
 OPENMP_MAP_MODIFIER_KIND(close)
 OPENMP_MAP_MODIFIER_KIND(mapper)
+OPENMP_MAP_MODIFIER_KIND(iterator)
 OPENMP_MAP_MODIFIER_KIND(present)
 // This is an OpenMP extension for the sake of OpenACC support.
 OPENMP_MAP_MODIFIER_KIND(ompx_hold)
2 changes: 1 addition & 1 deletion clang/include/clang/Basic/OpenMPKinds.h
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ enum OpenMPMapModifierKind {
OMPC_MAP_MODIFIER_last
};

-/// Number of allowed map-type-modifiers.
+/// Number of allowed map-type-modifiers.
static constexpr unsigned NumberOfOMPMapClauseModifiers =
OMPC_MAP_MODIFIER_last - OMPC_MAP_MODIFIER_unknown - 1;
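
Reviewer note: the count above relies on a sentinel idiom. `OMPC_MAP_MODIFIER_last` sits one past the final modifier, so adding `iterator` to OpenMPKinds.def grows the count without touching this expression. A minimal sketch of the idiom follows; the enum name, the enumerator names, and the starting value 5 are assumptions made for illustration and are not taken from clang.

```cpp
#include <cassert>

// Hypothetical stand-in for OpenMPMapModifierKind. The starting value 5 is an
// assumption; it models "unknown" being placed after the map-type kinds.
enum MapModifierKind : unsigned {
  MODIFIER_unknown = 5,
  MODIFIER_always,
  MODIFIER_close,
  MODIFIER_mapper,
  MODIFIER_iterator, // the newly added entry
  MODIFIER_present,
  MODIFIER_ompx_hold,
  MODIFIER_last // sentinel: one past the final real modifier
};

// Same shape as NumberOfOMPMapClauseModifiers: everything strictly between
// the "unknown" marker and the "last" sentinel is a real modifier.
static constexpr unsigned NumberOfModifiers =
    MODIFIER_last - MODIFIER_unknown - 1;

static_assert(NumberOfModifiers == 6,
              "always, close, mapper, iterator, present, ompx_hold");
```

Any array sized by this constant picks up the extra slot for the new modifier automatically.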

14 changes: 12 additions & 2 deletions clang/include/clang/Driver/Options.td
@@ -2616,6 +2616,8 @@ def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams
Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
def fopenmp_gpu_threads_per_team_EQ : Joined<["-"], "fopenmp-gpu-threads-per-team=">, Group<f_Group>,
Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
+def fopenmp_target_xteam_reduction_blocksize_EQ : Joined<["-"], "fopenmp-target-xteam-reduction-blocksize=">, Group<f_Group>,
+Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
def fopenmp_target_debug : Flag<["-"], "fopenmp-target-debug">, Group<f_Group>, Flags<[CC1Option, NoArgumentUnused]>,
HelpText<"Enable debugging in the OpenMP offloading device RTL">;
def fno_openmp_target_debug : Flag<["-"], "fno-openmp-target-debug">, Group<f_Group>, Flags<[NoArgumentUnused]>;
@@ -2630,6 +2632,14 @@ def fno_openmp_target_ignore_env_vars : Flag<["-"], "fno-openmp-target-ignore-en
Flags<[CC1Option, NoArgumentUnused, HelpHidden]>,
HelpText<"Assert that device related environment variables cannot be ignored while generating code">,
MarshallingInfoFlag<LangOpts<"OpenMPTargetIgnoreEnvVars">>;
+def fopenmp_target_big_jump_loop : Flag<["-"], "fopenmp-target-big-jump-loop">, Group<f_Group>,
+Flags<[CC1Option, NoArgumentUnused, HelpHidden]>,
+HelpText<"Use the big-jump-loop code generation technique if possible">,
+MarshallingInfoFlag<LangOpts<"OpenMPTargetBigJumpLoop">>;
+def fno_openmp_target_big_jump_loop : Flag<["-"], "fno-openmp-target-big-jump-loop">, Group<f_Group>,
+Flags<[CC1Option, NoArgumentUnused, HelpHidden]>,
+HelpText<"Do not use the big-jump-loop code generation technique">,
+MarshallingInfoFlag<LangOpts<"OpenMPTargetBigJumpLoop">>;
def fopenmp_assume_teams_oversubscription : Flag<["-"], "fopenmp-assume-teams-oversubscription">,
Group<f_Group>, Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
def fopenmp_assume_threads_oversubscription : Flag<["-"], "fopenmp-assume-threads-oversubscription">,
@@ -3724,12 +3734,12 @@ defm amdgpu_ieee : BoolOption<"m", "amdgpu-ieee",
NegFlag<SetFalse, [CC1Option]>>, Group<m_Group>;

def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group<m_Group>,
-HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">,
+HelpText<"Specify code object ABI version. Defaults to 5. (AMDGPU only)">,
Flags<[CC1Option]>,
Values<"none,2,3,4,5">,
NormalizedValuesScope<"TargetOptions">,
NormalizedValues<["COV_None", "COV_2", "COV_3", "COV_4", "COV_5"]>,
-MarshallingInfoEnum<TargetOpts<"CodeObjectVersion">, "COV_4">;
+MarshallingInfoEnum<TargetOpts<"CodeObjectVersion">, "COV_5">;

defm code_object_v3_legacy : SimpleMFlag<"code-object-v3",
"Legacy option to specify code object ABI V3",
4 changes: 3 additions & 1 deletion clang/include/clang/Sema/Sema.h
@@ -11079,6 +11079,7 @@ class Sema final {
QualType MapperType,
SourceLocation StartLoc,
DeclarationName VN);
+void ActOnOpenMPIteratorVarDecl(VarDecl *VD);
bool isOpenMPDeclareMapperVarDeclAllowed(const VarDecl *VD) const;
const ValueDecl *getOpenMPDeclareMapperVarName() const;

@@ -11790,6 +11791,7 @@ class Sema final {
/// Data used for processing a list of variables in OpenMP clauses.
struct OpenMPVarListDataTy final {
Expr *DepModOrTailExpr = nullptr;
+Expr *IteratorExpr = nullptr;
SourceLocation ColonLoc;
SourceLocation RLoc;
CXXScopeSpec ReductionOrMapperIdScopeSpec;
@@ -11916,7 +11918,7 @@ class Sema final {
SourceLocation EndLoc);
/// Called on well-formed 'map' clause.
OMPClause *ActOnOpenMPMapClause(
-ArrayRef<OpenMPMapModifierKind> MapTypeModifiers,
+Expr *IteratorModifier, ArrayRef<OpenMPMapModifierKind> MapTypeModifiers,
ArrayRef<SourceLocation> MapTypeModifiersLoc,
CXXScopeSpec &MapperIdScopeSpec, DeclarationNameInfo &MapperId,
OpenMPMapClauseKind MapType, bool IsMapTypeImplicit,
2 changes: 1 addition & 1 deletion clang/lib/AST/AttrImpl.cpp
@@ -137,7 +137,7 @@ void OMPDeclareTargetDeclAttr::printPrettyPragma(
// Use fake syntax because it is for testing and debugging purpose only.
if (getDevType() != DT_Any)
OS << " device_type(" << ConvertDevTypeTyToStr(getDevType()) << ")";
-if (getMapType() != MT_To)
+if (getMapType() != MT_To && getMapType() != MT_Enter)
OS << ' ' << ConvertMapTypeTyToStr(getMapType());
if (Expr *E = getIndirectExpr()) {
OS << " indirect(";
10 changes: 7 additions & 3 deletions clang/lib/AST/ExprConstant.cpp
@@ -16053,9 +16053,13 @@ bool Expr::EvaluateWithSubstitution(APValue &Value, ASTContext &Ctx,
if ((*I)->isValueDependent() ||
!EvaluateCallArg(PVD, *I, Call, Info) ||
Info.EvalStatus.HasSideEffects) {
-// If evaluation fails, throw away the argument entirely.
-if (APValue *Slot = Info.getParamSlot(Call, PVD))
-*Slot = APValue();
+// If evaluation fails, throw away the argument entirely unless I is
+// value-dependent. In those cases, the condition above will short-circuit
+// before calling `EvaluateCallArg` and no param slot is created.
+if (!(*I)->isValueDependent()) {
+if (APValue *Slot = Info.getParamSlot(Call, PVD))
+*Slot = APValue();
+}
}

// Ignore any side-effects from a failed evaluation. This is safe because
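
Reviewer note: the guard added in this hunk follows from `||` short-circuiting. When the argument is value-dependent, the evaluation call never runs, so there is nothing to reset. The sketch below models only that control flow; `Trace`, `evaluateCallArg`, and `handleArg` are hypothetical names, and the counters stand in for the real slot bookkeeping, which is not reproduced here.

```cpp
#include <cassert>

// Hypothetical bookkeeping: counts how often evaluation ran and how often
// a parameter slot was reset.
struct Trace {
  int EvalCalls = 0;
  int SlotResets = 0;
};

// Stand-in for EvaluateCallArg: records the call and reports success/failure.
bool evaluateCallArg(Trace &T, bool Succeeds) {
  ++T.EvalCalls;
  return Succeeds;
}

// Mirrors the shape of the patched condition: a value-dependent argument
// short-circuits before evaluation, so the reset is skipped for it.
void handleArg(Trace &T, bool IsValueDependent, bool Succeeds) {
  if (IsValueDependent || !evaluateCallArg(T, Succeeds)) {
    if (!IsValueDependent)
      ++T.SlotResets; // only reached when evaluation actually ran and failed
  }
}
```

Calling `handleArg` with `IsValueDependent == true` leaves both counters unchanged, which is the case the patch is guarding.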
30 changes: 22 additions & 8 deletions clang/lib/AST/OpenMPClause.cpp
@@ -1127,7 +1127,7 @@ OMPMapClause *OMPMapClause::Create(
const ASTContext &C, const OMPVarListLocTy &Locs, ArrayRef<Expr *> Vars,
ArrayRef<ValueDecl *> Declarations,
MappableExprComponentListsRef ComponentLists, ArrayRef<Expr *> UDMapperRefs,
-ArrayRef<OpenMPMapModifierKind> MapModifiers,
+Expr *IteratorModifier, ArrayRef<OpenMPMapModifierKind> MapModifiers,
ArrayRef<SourceLocation> MapModifiersLoc,
NestedNameSpecifierLoc UDMQualifierLoc, DeclarationNameInfo MapperId,
OpenMPMapClauseKind Type, bool TypeIsImplicit, SourceLocation TypeLoc) {
@@ -1150,7 +1150,7 @@ OMPMapClause *OMPMapClause::Create(
void *Mem = C.Allocate(
totalSizeToAlloc<Expr *, ValueDecl *, unsigned,
OMPClauseMappableExprCommon::MappableComponent>(
-2 * Sizes.NumVars, Sizes.NumUniqueDeclarations,
+2 * Sizes.NumVars + 1, Sizes.NumUniqueDeclarations,
Sizes.NumUniqueDeclarations + Sizes.NumComponentLists,
Sizes.NumComponents));
OMPMapClause *Clause = new (Mem)
@@ -1159,6 +1159,7 @@ OMPMapClause *OMPMapClause::Create(

Clause->setVarRefs(Vars);
Clause->setUDMapperRefs(UDMapperRefs);
+Clause->setIteratorModifier(IteratorModifier);
Clause->setClauseInfo(Declarations, ComponentLists);
Clause->setMapType(Type);
Clause->setMapLoc(TypeLoc);
@@ -1171,10 +1172,12 @@ OMPMapClause::CreateEmpty(const ASTContext &C,
void *Mem = C.Allocate(
totalSizeToAlloc<Expr *, ValueDecl *, unsigned,
OMPClauseMappableExprCommon::MappableComponent>(
-2 * Sizes.NumVars, Sizes.NumUniqueDeclarations,
+2 * Sizes.NumVars + 1, Sizes.NumUniqueDeclarations,
Sizes.NumUniqueDeclarations + Sizes.NumComponentLists,
Sizes.NumComponents));
-return new (Mem) OMPMapClause(Sizes);
+OMPMapClause *Clause = new (Mem) OMPMapClause(Sizes);
+Clause->setIteratorModifier(nullptr);
+return Clause;
}

OMPToClause *OMPToClause::Create(
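
Reviewer note: the `2 * Sizes.NumVars + 1` change in both `Create` and `CreateEmpty` reserves one extra `Expr *` slot, and `CreateEmpty` now initializes it to `nullptr`. The sketch below shows one way such a layout could work. It assumes the iterator modifier lives at index `2 * NumVars`, after the variable and mapper references; that index, the `std::vector` storage, and the class name are assumptions, since the accessor bodies are not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Expr {}; // hypothetical stand-in for clang::Expr

// Hypothetical model of the trailing Expr* storage:
//   [ NumVars variable refs | NumVars mapper refs | 1 iterator modifier ]
class MapClauseStorage {
  std::size_t NumVars;
  std::vector<Expr *> Slots;

public:
  explicit MapClauseStorage(std::size_t NumVars)
      : NumVars(NumVars), Slots(2 * NumVars + 1, nullptr) {}

  // Assumed placement: the extra slot follows the two NumVars-sized ranges.
  void setIteratorModifier(Expr *E) { Slots[2 * NumVars] = E; }
  Expr *getIteratorModifier() const { return Slots[2 * NumVars]; }

  std::size_t size() const { return Slots.size(); }
};
```

With three variables the model holds seven slots, and the iterator slot reads back as `nullptr` until it is set, matching what `CreateEmpty` establishes explicitly.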
@@ -2216,16 +2219,27 @@ static void PrintMapper(raw_ostream &OS, T *Node,
OS << Node->getMapperIdInfo() << ')';
}

+template <typename T>
+static void PrintIterator(raw_ostream &OS, T *Node,
+const PrintingPolicy &Policy) {
+if (Expr *IteratorModifier = Node->getIteratorModifier())
+IteratorModifier->printPretty(OS, nullptr, Policy);
+}
+
void OMPClausePrinter::VisitOMPMapClause(OMPMapClause *Node) {
if (!Node->varlist_empty()) {
OS << "map(";
if (Node->getMapType() != OMPC_MAP_unknown) {
for (unsigned I = 0; I < NumberOfOMPMapClauseModifiers; ++I) {
if (Node->getMapTypeModifier(I) != OMPC_MAP_MODIFIER_unknown) {
-OS << getOpenMPSimpleClauseTypeName(OMPC_map,
-Node->getMapTypeModifier(I));
-if (Node->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_mapper)
-PrintMapper(OS, Node, Policy);
+if (Node->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_iterator) {
+PrintIterator(OS, Node, Policy);
+} else {
+OS << getOpenMPSimpleClauseTypeName(OMPC_map,
+Node->getMapTypeModifier(I));
+if (Node->getMapTypeModifier(I) == OMPC_MAP_MODIFIER_mapper)
+PrintMapper(OS, Node, Policy);
+}
OS << ',';
}
}
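
Reviewer note: in the printer above, the iterator modifier is the one case where the keyword name is not emitted. Only the expression is printed, presumably because its pretty-printed form already carries the full spelling. The sketch below reproduces that branching on plain strings; the enum, `spelling`, and `printModifiers` are hypothetical, and the iterator text is treated as an opaque, caller-supplied string rather than real `printPretty` output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for OpenMPMapModifierKind.
enum class Modifier { Unknown, Always, Close, Mapper, Iterator, Present };

// Hypothetical spelling table; clang uses getOpenMPSimpleClauseTypeName.
static const char *spelling(Modifier M) {
  switch (M) {
  case Modifier::Always:  return "always";
  case Modifier::Close:   return "close";
  case Modifier::Mapper:  return "mapper";
  case Modifier::Present: return "present";
  default:                return "";
  }
}

// Same branching as the patched loop: iterator prints its expression text
// only; every other modifier prints its keyword, and mapper appends its id.
std::string printModifiers(const std::vector<Modifier> &Mods,
                           const std::string &IteratorText,
                           const std::string &MapperId) {
  std::string Out;
  for (Modifier M : Mods) {
    if (M == Modifier::Unknown)
      continue; // unset slots are skipped
    if (M == Modifier::Iterator) {
      Out += IteratorText;
    } else {
      Out += spelling(M);
      if (M == Modifier::Mapper)
        Out += "(" + MapperId + ")";
    }
    Out += ','; // every printed modifier is followed by a comma
  }
  return Out;
}
```

For example, `printModifiers({Modifier::Always, Modifier::Iterator}, "iterator(i = 0:n)", "")` yields `always,iterator(i = 0:n),`.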
2 changes: 0 additions & 2 deletions clang/lib/Basic/Targets/AMDGPU.cpp
@@ -189,9 +189,7 @@ bool AMDGPUTargetInfo::initFeatureMap(
case GK_GFX1101:
case GK_GFX1100:
Features["ci-insts"] = true;
-Features["dot1-insts"] = true;
Features["dot5-insts"] = true;
-Features["dot6-insts"] = true;
Features["dot7-insts"] = true;
Features["dot8-insts"] = true;
Features["dl-insts"] = true;
1 change: 1 addition & 0 deletions clang/lib/Basic/Targets/AMDGPU.h
@@ -13,6 +13,7 @@
#ifndef LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H
#define LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H

#include "clang/Basic/AddressSpaces.h"
#include "clang/Basic/TargetID.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/TargetOptions.h"
10 changes: 7 additions & 3 deletions clang/lib/CodeGen/CGCall.cpp
@@ -2335,8 +2335,13 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
getLangOpts().Sanitize.has(SanitizerKind::Memory) ||
getLangOpts().Sanitize.has(SanitizerKind::Return);

+// Enable noundef attribute based on codegen options and
+// skip adding the attribute to HIP device functions.
+bool EnableNoundefAttrs = CodeGenOpts.EnableNoundefAttrs &&
+!(getLangOpts().HIP && getLangOpts().CUDAIsDevice);
+
// Determine if the return type could be partially undef
-if (CodeGenOpts.EnableNoundefAttrs && HasStrictReturn) {
+if (EnableNoundefAttrs && HasStrictReturn) {
if (!RetTy->isVoidType() && RetAI.getKind() != ABIArgInfo::Indirect &&
DetermineNoUndef(RetTy, getTypes(), DL, RetAI))
RetAttrs.addAttribute(llvm::Attribute::NoUndef);
@@ -2470,8 +2475,7 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
}

// Decide whether the argument we're handling could be partially undef
-if (CodeGenOpts.EnableNoundefAttrs &&
-DetermineNoUndef(ParamType, getTypes(), DL, AI)) {
+if (EnableNoundefAttrs && DetermineNoUndef(ParamType, getTypes(), DL, AI)) {
Attrs.addAttribute(llvm::Attribute::NoUndef);
}
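
Reviewer note: both call sites in this file now test one precomputed flag instead of `CodeGenOpts.EnableNoundefAttrs` directly. The predicate is small enough to tabulate; the sketch below restates it as a free function. `LangFlags` and `shouldEmitNoundef` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical subset of the language options consulted by the patch.
struct LangFlags {
  bool HIP;
  bool CUDAIsDevice;
};

// noundef is emitted only when the codegen option is on AND the compilation
// is not a HIP device compilation.
constexpr bool shouldEmitNoundef(bool CodeGenOptEnabled, LangFlags LO) {
  return CodeGenOptEnabled && !(LO.HIP && LO.CUDAIsDevice);
}

static_assert(shouldEmitNoundef(true, {false, false}), "plain host code");
static_assert(shouldEmitNoundef(true, {true, false}), "HIP host side");
static_assert(!shouldEmitNoundef(true, {true, true}), "HIP device side");
```

Under this predicate a non-HIP device compilation (`HIP == false`, `CUDAIsDevice == true`) still gets the attribute, since both language flags must be set for it to be suppressed.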
