Sync with main #3

Merged: 1,486 commits, Aug 20, 2024

Conversation

BaLiKfromUA
Owner

No description provided.

hazzlim and others added 30 commits August 9, 2024 13:25
Generate nuw GEPs for struct member accesses, as inbounds + non-negative
implies nuw.

Regression tests are updated using update scripts where possible, and by
find + replace where not.
…spose(shape_cast) (llvm#100731)" (llvm#102457)

This reverts commit 88accd9.

This change can be dropped in favor of just llvm#102017.
Modifying `auto` to `auto&` to avoid unnecessary copying
…78112)

https://cplusplus.github.io/CWG/issues/2627.html

It is no longer a narrowing conversion when converting a bit-field to a
type smaller than the field's declared type if the bit-field has a width
small enough to fit in the target type. This includes integral
promotions (`long long i : 8` promoted to `int` is no longer narrowing,
allowing `c.i <=> c.i`) and list-initialization (`int n{ c.i };`)

Also applies back to C++11 as this is a defect report.
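As a rough illustration of the rule described above, the sketch below models it in Python: a conversion from a bit-field counts as narrowing only if the target cannot hold every value of an integer with the bit-field's width and signedness. This is a simplified model, not the standard's wording; the function names are hypothetical, and a 32-bit `int` is assumed.

```python
def value_range(width, signed):
    """Inclusive range of a two's-complement integer with the given width."""
    if signed:
        return -(1 << (width - 1)), (1 << (width - 1)) - 1
    return 0, (1 << width) - 1


def bitfield_conversion_is_narrowing(field_width, field_signed,
                                     target_width, target_signed):
    """Simplified model: narrowing unless the target can hold every value
    an integer of the bit-field's width and signedness can take."""
    src_lo, src_hi = value_range(field_width, field_signed)
    dst_lo, dst_hi = value_range(target_width, target_signed)
    return not (dst_lo <= src_lo and src_hi <= dst_hi)


# `long long i : 8` converted to an (assumed 32-bit) `int`: every value fits.
print(bitfield_conversion_is_narrowing(8, True, 32, True))    # False
# A 40-bit signed field does not fit in a 32-bit `int`.
print(bitfield_conversion_is_narrowing(40, True, 32, True))   # True
# A signed field can be negative, so an unsigned target still narrows.
print(bitfield_conversion_is_narrowing(8, True, 32, False))   # True
```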
…lvm#102573)

Disables `vector.matrix_multiply` for scalable vectors. As per the docs:

>  This is the counterpart of llvm.matrix.multiply in MLIR

I'm not aware of any use of matrix-multiply intrinsics in the context of
scalable vectors, hence disabling.
…2535)

Commit cee594c added support to clang for multiple expressions in
`num_teams` clause. Add follow-up changes to flang.
As all the necessary information is encoded using attributes
nowadays, this test doesn't actually depend on the triple
anymore.
Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V
processor core.
Overview: https://syntacore.com/products/scr5

Scheduling model will be added in a subsequent PR.

Co-authored-by: Dmitrii Petrov <[email protected]>
Co-authored-by: Anton Afanasyev <[email protected]>
…lvm#96287)"

This reverts commit ccb2b01.

Causes buildbot failures, e.g. on ppc64le builders.
Follow up on 199d6f2 (LSV: document hang reported in llvm#37865) to fix the
build when omitting the AArch64 target. Add the missing lit.local.cfg.
We should handle allocator attributes not only on function
declarations, but also on the call-site. That way we can e.g.
also optimize cases where the allocator function is a virtual
function call.

This was already supported in some of the MemoryBuiltins helpers,
but not all of them. This adds support for allocsize, alloc-family
and allockind("free").
…perations. (llvm#102105)

The code-generator is currently not able to handle scalable vectors of
<vscale x 1 x eltty>. The usual "fix" for this until it is supported is
to mark the costs of loads/stores with an invalid cost, preventing the
vectorizer from vectorizing at those factors. But on rare occasions
loops do not contain load/stores, only reductions.

So, while this is still unsupported, return an invalid cost to avoid
selecting vscale x 1 VFs. The cost of a reduction is not currently used
by the vectorizer so this adds the cost to the add/mul/and/or/xor or
min/max that should feed the reduction. It includes reduction costs
too, for completeness. This change will be removed when code-generation
for these types is sufficiently reliable.

Fixes llvm#99760
If nobuiltin is set, directly return nullptr instead of using a
separate out parameter and having all callers check this.
This allows some tests relying on -stop-after=amdgpu-isel
to check -stop-after=finalize-isel instead, which
will more reliably pass the verifier.
I forgot to update the version info in the SDKSettings file when I
updated it to the real version relevant to the test.
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to
`llvm.func` operations.

- `spir_kernel`/`spir_func` calling conventions used for
kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>`
arguments.
- No attribute used to annotate kernels
- `reqd_work_group_size` attribute used to encode
`gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution
sizes. This is attached to the pointer argument that each workgroup
attribution is lowered to.

**Note**: A notable missing feature that will be addressed in a
follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace
MemRef arguments with bare pointers to the MemRef element types instead
of the current MemRef descriptor approach.

---------

Signed-off-by: Victor Perez <[email protected]>
This PR adds conversion patterns for MemRef to the `convert-to-spirv`
pass, introduced in llvm#95942. Conversions from MemRef memory space to
SPIR-V storage class were also included, and would run before the final
dialect conversion phase.

**Future Plans**
- Add tests for ops other than `memref.load` and `memref.store`

---------

Co-authored-by: Jakub Kuderski <[email protected]>
…#101407)

This patch adds code generation support for the multi-dim `num_teams`
clause when it is used with the `target teams ompx_bare` construct.
…ack for `insertelement` (llvm#82130)

Prior to this patch, SelectionDAG generated aligned moves onto the stack for
AVX registers when the function was marked as a no-realign-stack
function. This led to misalignment between the stack and the
instruction generated. This patch fixes the issue. There was a similar
issue reported for `extractelement`, which was fixed in
a6614ec

Co-authored-by: Manish Kausik H <[email protected]>
Make it possible to do things like the following, regardless of whether
the offload target is nvptx or amdgpu:

```
$ clang -O1 -g -fopenmp --offload-arch=native test.c                       \
    -Xoffload-linker -mllvm=-pass-remarks=inline                           \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline\
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>'           \
    -Xoffload-linker --lto-debug-pass-manager                              \
    -foffload-lto
```

To accomplish that:

- In clang-linker-wrapper, do not forward options via `-Wl` if they
might have literal commas. Use `-Xlinker` instead.
- In clang-nvlink-wrapper, accept `--lto-debug-pass-manager` and
`--lto-newpm-passes`.
- In clang-nvlink-wrapper, drop `-passes` because it's inconsistent with
the interface of `lld`, which is used instead of clang-nvlink-wrapper
when the target is amdgpu. Without this patch, `-passes` is passed to
`nvlink`, producing an error anyway.
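The reason for preferring `-Xlinker` can be sketched with a small model of how a driver hands arguments to the linker: the payload of `-Wl,` is split on commas, while each `-Xlinker` value is passed through as one argument. The helper names below are hypothetical and this is not the clang-linker-wrapper code, only an illustration of the comma problem.

```python
def forward_with_wl(linker_args):
    """Bundle arguments as one `-Wl,...` option (comma-separated)."""
    return ["-Wl," + ",".join(linker_args)]


def forward_with_xlinker(linker_args):
    """Forward each argument separately as `-Xlinker <arg>`."""
    out = []
    for arg in linker_args:
        out += ["-Xlinker", arg]
    return out


def received_by_linker(driver_args):
    """Model of what the linker sees: `-Wl,` payloads are split on commas,
    `-Xlinker` values are passed through untouched."""
    seen = []
    it = iter(driver_args)
    for arg in it:
        if arg.startswith("-Wl,"):
            seen += arg[len("-Wl,"):].split(",")
        elif arg == "-Xlinker":
            seen.append(next(it))
    return seen


args = ["--lto-newpm-passes=forceattrs,default<O1>"]
print(received_by_linker(forward_with_wl(args)))
# ['--lto-newpm-passes=forceattrs', 'default<O1>']
print(received_by_linker(forward_with_xlinker(args)))
# ['--lto-newpm-passes=forceattrs,default<O1>']
```

With `-Wl`, the pass pipeline is cut in two at its literal comma; with `-Xlinker`, it arrives intact.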

---------

Co-authored-by: Joseph Huber <[email protected]>
Without this, the doc string is put in a single line. These
scripts have multi-line docstrings, so this makes their --help
output look much nicer.

Otherwise, no behavior change.
Inspired by llvm#99418 (which hopefully we can replace this code with at some point)
bjope and others added 28 commits August 12, 2024 13:28
With opaque pointers we can just get the pointer type for the
resolver function by using PointerType::get, making the
GlobalIFunc::getResolverFunctionType function obsolete.
The others are already inline here.
…g. (llvm#102650)

Simplifies checks for AGPRs and VGPRs and makes them more explicit and
less fragile.
Mention the names of unavailable registers in error messages so that
the diagnostics for execz/vccz are no less rich than they were.

Clean up unnecessary name qualifications while there.

Part of <llvm#62629>.
llvm#102123)

…Type

This is needed to ensure we find a type if its definition is in a CU
that wasn't indexed. This can happen if the definition is in some
precompiled code (e.g. the c++ standard library) which was built with
different flags than the rest of the binary.
…ADS is not defined

When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which
makes this test case fail.

This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit
containers. So the test will get plenty of runs elsewhere.

In lldb's code it's not getting the current thread ID anyway, it's using
a value it got from ptrace. So even if that copy of lldb was built with
LLVM_ENABLE_THREADS off, it should still be able to debug threads.
On PlayStation, allow users to supply -static to the linker, via the
driver.

An initial step. Later changes will have the PS5 driver supply
additional options to the linker, if and when -static is passed.

SIE tracker: TOOLCHAIN-16704
We only need to see that 1 frame of the stack is in user code. No need
to carry on looking.

Doing so actually caused a test failure on Armv8 Ubuntu Jammy where
a libc function does not have a display name. I'm sure I'm going to
get stung by this elsewhere, but for this test, breaking early
sidesteps the problem.
The Mul factor was zero-extended here, resulting in incorrect
results for integers larger than 64-bit.

As we currently only multiply by 1 or -1, just split this into
two cases -- there's no need for a full multiplication here.

Fixes llvm#102597.
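A small arithmetic sketch of why zero-extending a factor of -1 goes wrong beyond 64 bits. It assumes the factor was held in a 64-bit value before being widened; the helper names are hypothetical and the code only models fixed-width integer arithmetic, not the LLVM code that was changed.

```python
def zext(value, from_bits):
    """Zero-extend: keep the low `from_bits` bits, fill the rest with zeros."""
    return value & ((1 << from_bits) - 1)


def wrap(value, bits):
    """Reduce to an unsigned `bits`-wide integer (two's-complement wraparound)."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide integer as signed."""
    return value - (1 << bits) if value >> (bits - 1) else value


def scale(value, negate, bits):
    """Handle the two cases (factor 1 or -1) without a multiplication."""
    return wrap(-value if negate else value, bits)


BITS = 128
x = 5

# The factor -1 held in 64 bits, then zero-extended to 128 bits.
bad_factor = zext(-1, 64)                      # 2**64 - 1, not -1
bad = to_signed(wrap(x * bad_factor, BITS), BITS)

good = to_signed(scale(x, True, BITS), BITS)

print(bad)    # 92233720368547758075
print(good)   # -5
```

At 64 bits the zero-extended factor would wrap around to the right answer; at 128 bits the upper bits are zero, so the product is a large positive number instead of `-x`.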
Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME
BFloat16 instructions. Remove the now redundant FEAT_B16B16 which has
been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see
llvm#101480 for the details and
reasoning of this change to LLVM.

FEAT_SME_B16B16 is documented under the latest Armv9.4 feature
documentation:

https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio

- Changes to Clang AArch64 frontend
  - Change target guard of SME2 ZA-targeting non-widening BFloat16
    intrinsics to 'sme-b16b16'

- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
    - Create FeatureSMEB16B16, which implies FeatureSME2 and
      FeatureSVEB16B16
    - Remove FeatureB16B16
    - Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
    - Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
    - Change predication of SME2 ZA-targeting non-widening BFloat16
      instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
    - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies
      FEAT_SME2)
  - llvm/lib/AArch64/AsmParser/AArch64AsmParser.cpp
    - Remove flag 'b16b16' mapping to removed FeatureB16B16
    - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16

- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
    - Add new sme-b16b16 flag to existing target parser tests
    - Add tests for the sme-b16b16 dependencies:
      - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
      - Remove 'b16b16' from bf16 dependency test

- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
    - To ensure that ZA-targeting multi-vector non-widening BFloat16
      instructions are enabled by +sme-b16b16, and that this feature is
      removed by +nosme-b16b16.

- Modified tests
  - All CodeGen, Semantic, and MC tests that are affected by the removal
    of 'b16b16' have been modified to supply and/or expect 'sme-b16b16'
    where appropriate.
Include chain of ops feeding inductions in cost precomputation for
inductions, not just the induction increment. In VPlan, those
instructions will be cleaned up, as both phi and increment are generated
by VPWidenIntOrFpInductionRecipe independently.

Fixes llvm#101337.
This PR fixes emission of valid OpLifetimeStart/OpLifetimeStop instructions.
According to
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart:
"Size must be 0 if Pointer is a pointer to a non-void type or the
Addresses
[capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability)
is not declared.". The `Size` argument is set from the corresponding
intrinsic's argument, so when Size is not zero we must ensure that Pointer
has the required type by inserting a bitcast if needed.
…01732)

This PR contains changes in virtual register processing aimed at improving
the correctness of emitted MIR between passes from the perspective of
MachineVerifier. This potentially helps to detect previously missed
flaws in code emission and harden the test suite. As a measure of the
correctness and usefulness of this PR, we can run the test suite with
expensive checks enabled and look at the problems MachineVerifier reports.

To satisfy MachineVerifier's requirements for MIR correctness, not
only is a rework of the usage of virtual registers' types and classes
required, but also corrections to the pre-legalizer and instruction
selection logic. Namely, the following changes are introduced:
* scalar virtual registers have proper bit width,
* detect register class by SPIR-V type,
* add a superclass for id virtual register classes,
* fix Tablegen rules used for instruction selection,
* fix minor existing issues (missed flag for proper representation
of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual
registers as a synonym of any non-type virtual register).
* rename CXXIndeterminateSpliceExpr in the readme too

Signed-off-by: delimbetov <[email protected]>

* make TryAnnotateOptionalCXXScopeToken work

Signed-off-by: delimbetov <[email protected]>

* make splice work in requires clause

Signed-off-by: delimbetov <[email protected]>

* add tests for splice in requires expr

Signed-off-by: delimbetov <[email protected]>

* add typename and newline at the end of the file

Signed-off-by: delimbetov <[email protected]>

* add comments

Signed-off-by: delimbetov <[email protected]>

---------

Signed-off-by: delimbetov <[email protected]>
Some work remains: In particular, if this is going to "work" (i.e.,
supported by P2996), we need to think carefully about reachability,
TU-local entities, etc. There probably need to be some constraints
around use of imported reflections, and possibly some 'is_reachable'
metafunction. Not entirely sure - need to experiment further.

Closes issue #4.
TBD whether to keep this, but adding it so it can be played around with.
* basic impl

Signed-off-by: delimbetov <[email protected]>

* add test for the new storage duration funcs

Signed-off-by: delimbetov <[email protected]>

* code style

Signed-off-by: delimbetov <[email protected]>

* run libcxx generators to pass CI

Signed-off-by: delimbetov <[email protected]>

* fix indentation

Signed-off-by: delimbetov <[email protected]>

---------

Signed-off-by: delimbetov <[email protected]>
@BaLiKfromUA BaLiKfromUA merged commit a8e6784 into BaLiKfromUA:clang-p2996/issues/5-refined Aug 20, 2024
103 of 105 checks passed
BaLiKfromUA pushed a commit that referenced this pull request Aug 26, 2024
…104523)

Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.

This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.
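The behavior described above can be sketched as a toy model: hidden frames are dropped from the printed backtrace and skipped by `up`, but every frame keeps its original number. This is plain Python with hypothetical names (`Frame`, `backtrace`, `up`), not the LLDB API, and which frames are hidden is chosen here to mirror the example output in this message.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int      # original frame number; never renumbered
    name: str
    hidden: bool = False


def backtrace(frames, filtered=True):
    """Return the frames to print plus whether the 'hidden' hint is needed."""
    shown = [f for f in frames if not (filtered and f.hidden)]
    return shown, len(shown) != len(frames)


def up(frames, current):
    """Move to the next older frame that is not hidden (stay put at the end)."""
    for f in frames[current + 1:]:
        if not f.hidden:
            return f.index
    return current


names = ["foo", "__invoke", "__call", "__alloc_func", "__func",
         "__value_func", "function::operator()", "main", "start"]
# Frames 3-5 are marked hidden (an assumption made for this sketch).
frames = [Frame(i, n, hidden=i in (3, 4, 5)) for i, n in enumerate(names)]

shown, hint = backtrace(frames)
print([f.index for f in shown])   # [0, 1, 2, 6, 7, 8]
print(hint)                       # True
print(up(frames, 2))              # 6
print(frames[4].name)             # hidden frames stay addressable by number
```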

My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.

rdar://126629381


Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).

before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb) 
```

after:

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```