forked from bloomberg/clang-p2996
Sync with main #3
Merged
BaLiKfromUA
merged 1,486 commits into
BaLiKfromUA:clang-p2996/issues/5-refined
from
bloomberg:p2996
Aug 20, 2024
Generate nuw GEPs for struct member accesses, as inbounds + non-negative implies nuw. Regression tests are updated using update scripts where possible, and by find + replace where not.
…spose(shape_cast) (llvm#100731)" (llvm#102457) This reverts commit 88accd9. This change can be dropped in favor of just llvm#102017.
Modifying `auto` to `auto&` to avoid unnecessary copying
…78112) https://cplusplus.github.io/CWG/issues/2627.html It is no longer a narrowing conversion when converting a bit-field to a type smaller than the field's declared type if the bit-field has a width small enough to fit in the target type. This includes integral promotions (`long long i : 8` promoted to `int` is no longer narrowing, allowing `c.i <=> c.i`) and list-initialization (`int n{ c.i };`) Also applies back to C++11 as this is a defect report.
…lvm#102573) Disables `vector.matrix_multiply` for scalable vectors. As per the docs: > This is the counterpart of llvm.matrix.multiply in MLIR I'm not aware of any use of matrix-multiply intrinsics in the context of scalable vectors, hence disabling.
As all the necessary information is encoded using attributes nowadays, this test doesn't actually depend on the triple anymore.
Syntacore SCR5 is an entry-level Linux-capable 32/64-bit RISC-V processor core. Overview: https://syntacore.com/products/scr5 Scheduling model will be added in a subsequent PR. Co-authored-by: Dmitrii Petrov <[email protected]> Co-authored-by: Anton Afanasyev <[email protected]>
…lvm#96287)" This reverts commit ccb2b01. Causes buildbot failures, e.g. on ppc64le builders.
Follow up on 199d6f2 (LSV: document hang reported in llvm#37865) to fix the build when omitting the AArch64 target. Add the missing lit.local.cfg.
We should handle allocator attributes not only on function declarations, but also on the call-site. That way we can e.g. also optimize cases where the allocator function is a virtual function call. This was already supported in some of the MemoryBuiltins helpers, but not all of them. This adds support for allocsize, alloc-family and allockind("free").
…perations. (llvm#102105) The code-generator is currently not able to handle scalable vectors of <vscale x 1 x eltty>. The usual "fix" for this until it is supported is to mark the costs of loads/stores with an invalid cost, preventing the vectorizer from vectorizing at those factors. But on rare occasions loops do not contain load/stores, only reductions. So whilst this is still unsupported return an invalid cost to avoid selecting vscale x 1 VFs. The cost of a reduction is not currently used by the vectorizer so this adds the cost to the add/mul/and/or/xor or min/max that should feed the reduction. It includes reduction costs too, for completeness. This change will be removed when code-generation for these types is sufficiently reliable. Fixes llvm#99760
If nobuiltin is set, directly return nullptr instead of using a separate out parameter and having all callers check this.
This allows some tests that rely on -stop-after=amdgpu-isel to check -stop-after=finalize-isel instead, which will more reliably pass the verifier.
I forgot to update the version info in the SDKSettings file when I updated it to the real version relevant to the test.
Add support in `-convert-gpu-to-llvm-spv` to convert `gpu.func` to `llvm.func` operations.
- `spir_kernel`/`spir_func` calling conventions used for kernels/functions.
- `workgroup` attributions encoded as additional `llvm.ptr<3>` arguments.
- No attribute used to annotate kernels
- `reqd_work_group_size` attribute used to encode `gpu.known_block_size`.
- `llvm.mlir.workgroup_attrib_size` used to encode workgroup attribution sizes. This will be attached to the pointer argument workgroup attributions lower to.

**Note**: A notable missing feature that will be addressed in a follow-up PR is a `-use-bare-ptr-memref-call-conv` option to replace MemRef arguments with bare pointers to the MemRef element types instead of the current MemRef descriptor approach.
---------
Signed-off-by: Victor Perez <[email protected]>
This PR adds conversion patterns for MemRef to the `convert-to-spirv` pass, introduced in llvm#95942. Conversions from MemRef memory space to SPIR-V storage class were also included, and would run before the final dialect conversion phase. **Future Plans** - Add tests for ops other than `memref.load` and `memref.store` --------- Co-authored-by: Jakub Kuderski <[email protected]>
…llvm#102616) There's no need for them to have different types. Part of <llvm#62629>.
…ack for `insertelement` (llvm#82130) Prior to this patch, SelectionDAG generated aligned moves onto the stack for AVX registers when the function was marked as a no-realign-stack function. This led to misalignment between the stack and the instruction generated. This patch fixes the issue. There was a similar issue reported for `extractelement` which was fixed in a6614ec Co-authored-by: Manish Kausik H <[email protected]>
Make it possible to do things like the following, regardless of whether the offload target is nvptx or amdgpu:
```
$ clang -O1 -g -fopenmp --offload-arch=native test.c \
    -Xoffload-linker -mllvm=-pass-remarks=inline \
    -Xoffload-linker -mllvm=-force-remove-attribute=g.internalized:noinline\
    -Xoffload-linker --lto-newpm-passes='forceattrs,default<O1>' \
    -Xoffload-linker --lto-debug-pass-manager \
    -foffload-lto
```
To accomplish that:
- In clang-linker-wrapper, do not forward options via `-Wl` if they might have literal commas. Use `-Xlinker` instead.
- In clang-nvlink-wrapper, accept `--lto-debug-pass-manager` and `--lto-newpm-passes`.
- In clang-nvlink-wrapper, drop `-passes` because it's inconsistent with the interface of `lld`, which is used instead of clang-nvlink-wrapper when the target is amdgpu. Without this patch, `-passes` is passed to `nvlink`, producing an error anyway.
---------
Co-authored-by: Joseph Huber <[email protected]>
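The reason for preferring `-Xlinker` over `-Wl` can be sketched in a few lines. This is only an illustration, not the clang-linker-wrapper code: `forwardLinkerArg` is a hypothetical helper, and it decides per argument by looking for a comma, whereas the patch is described as avoiding `-Wl` for any option that might contain one.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of clang-linker-wrapper): choose how to pass
// one linker option through the compiler driver.
std::vector<std::string> forwardLinkerArg(const std::string &arg) {
  // The driver splits the payload of "-Wl," at every comma, so a value such as
  // "forceattrs,default<O1>" would arrive at the linker as two arguments.
  if (arg.find(',') != std::string::npos)
    return {"-Xlinker", arg}; // passed through verbatim as a single argument
  return {"-Wl," + arg};
}
```

With this sketch, `--lto-newpm-passes=forceattrs,default<O1>` is forwarded as the pair `-Xlinker`, `--lto-newpm-passes=forceattrs,default<O1>`, while `--lto-debug-pass-manager` can still travel as `-Wl,--lto-debug-pass-manager`.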
Without this, the doc string is put in a single line. These scripts have multi-line docstrings, so this makes their --help output look much nicer. Otherwise, no behavior change.
Inspired by llvm#99418 (which hopefully we can replace this code with at some point)
With opaque pointers we can just get the pointer type for the resolver function by using PointerType::get, making the GlobalIFunc::getResolverFunctionType function obsolete.
The others are already inline here.
…g. (llvm#102650) Simplifies checks for AGPRs and VGPRs and makes them more explicit and less fragile.
Mention the names of unavailable registers in error messages so that the diagnostics for execz/vccz are no less rich than they were. Clean up unnecessary name qualifications while there. Part of <llvm#62629>.
llvm#102123) …Type This is needed to ensure we find a type if its definition is in a CU that wasn't indexed. This can happen if the definition is in some precompiled code (e.g. the c++ standard library) which was built with different flags than the rest of the binary.
…ADS is not defined When LLVM_ENABLE_THREADS is not defined, llvm::get_threadid returns 0 which makes this test case fail. This is a pretty niche setting, Linaro uses it to stop lld crashing our 32 bit containers. So the test will get plenty of runs elsewhere. In lldb's code it's not getting the current thread ID anyway, it's using a value it got from ptrace. So even if that copy of lldb was built with LLVM_ENABLE_THREADS off, it should still be able to debug threads.
On PlayStation, allow users to supply -static to the linker, via the driver. An initial step. Later changes will have the PS5 driver supply additional options to the linker, if and when -static is passed. SIE tracker: TOOLCHAIN-16704
We only need to see that 1 frame of the stack is in user code. No need to carry on looking. Doing so actually caused a test failure on Armv8 Ubuntu Jammy where a libc function does not have a display name. I'm sure I'm going to get stung by this elsewhere, but for this test, breaking early sidesteps the problem.
The Mul factor was zero-extended here, resulting in incorrect results for integers larger than 64-bit. As we currently only multiply by 1 or -1, just split this into two cases -- there's no need for a full multiplication here. Fixes llvm#102597.
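The failure mode can be reproduced at a smaller scale. The sketch below is an assumption-laden illustration, not the LLVM code: it uses 32-bit and 64-bit integers in place of the 64-bit and wider-than-64-bit values the commit talks about, and both function names are hypothetical.

```cpp
#include <cstdint>

// Widen a narrow factor the wrong way: zero-extension turns -1 into a large
// positive number instead of keeping it negative.
std::int64_t widenZeroExtended(std::int32_t factor) {
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(factor));
}

// Since the factor is only ever 1 or -1, no multiplication is needed at all:
// handle the two cases directly. (Assumes value is not INT64_MIN, whose
// negation overflows.)
std::int64_t applyUnitFactor(std::int64_t value, bool negate) {
  return negate ? -value : value;
}
```

Multiplying by `widenZeroExtended(-1)` scales the value by 4294967295 rather than negating it, which is the same class of bug the commit describes; splitting into two cases sidesteps the widening question entirely.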
Implement FEAT_SME_B16B16 to enable ZA-targeting non-widening SME BFloat16 instructions. Remove the now redundant FEAT_B16B16 which has been replaced by FEAT_SVE_B16B16 and FEAT_SME_B16B16 (this commit), see llvm#101480 for the details and reasoning of this change to LLVM. FEAT_SME_B16B16 is documented under the latest Armv9.4 feature documentation: https://developer.arm.com/documentation/109697/0100/Feature-descriptions/The-Armv9-4-architecture-extensio
- Changes to Clang AArch64 frontend
  - Change target guard of SME2 ZA-targeting non-widening BFloat16 intrinsics to 'sme-b16b16'
- Changes to LLVM AArch64 backend
  - llvm/lib/Target/AArch64/AArch64Features.td
    - Create FeatureSMEB16B16, which implies FeatureSME2 and FeatureSVEB16B16
    - Remove FeatureB16B16
    - Fix description of FeatureSVEB16B16
  - llvm/lib/Target/AArch64/AArch64InstrInfo.td
    - Create HasSMEB16B16 predicate
  - llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
    - Change predication of SME2 ZA-targeting non-widening BFloat16 instructions to new HasSMEB16B16
  - llvm/lib/Target/AArch64/AArch64.td
    - Add HasSMEB16B16 to SME2Unsupported (FEAT_SME_B16B16 implies FEAT_SME2)
  - llvm/lib/Target/AArch64/AsmParser/AArch64AsmParser.cpp
    - Remove flag 'b16b16' mapping to removed FeatureB16B16
    - Add flag 'sme-b16b16' mapping to new FeatureSMEB16B16
- Changes to LLVM unit tests
  - llvm/unittests/TargetParser/TargetParserTest.cpp
    - Add new sme-b16b16 flag to existing target parser tests
    - Add tests for the sme-b16b16 dependencies:
      - 'sme-b16b16' should enable 'sme2', 'sve-b16b16'.
    - Remove 'b16b16' from bf16 dependency test
- Added MC tests
  - llvm/test/MC/AArch64/SME2p1
    - To ensure that ZA-targeting multi-vector non-widening BFloat16 instructions are enabled by +sme-b16b16, and that this feature is removed by +nosme-b16b16.
- Modified tests
  - All CodeGen, Semantic, and MC tests that are affected by the removal of 'b16b16' have been modified to supply and/or expect 'sme-b16b16' where appropriate.
Include chain of ops feeding inductions in cost precomputation for inductions, not just the induction increment. In VPlan, those instructions will be cleaned up, as both phi and increment are generated by VPWidenIntOrFpInductionRecipe independently. Fixes llvm#101337.
This PR fixes the emission of OpLifetimeStart/OpLifetimeStop instructions so that they are valid. According to https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpLifetimeStart: "Size must be 0 if Pointer is a pointer to a non-void type or the Addresses [capability](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#Capability) is not declared.". The `Size` argument is set from the corresponding intrinsic's argument, so if Size is not zero we must ensure that Pointer has the required type, inserting a bitcast if needed.
…01732) This PR contains changes in virtual register processing aimed at improving the correctness of emitted MIR between passes from the perspective of MachineVerifier. This potentially helps to detect previously missed flaws in code emission and harden the test suite. As a measure of the correctness and usefulness of this PR we may use a mode with expensive checks turned on, in which MachineVerifier reports problems in the test suite. Satisfying MachineVerifier's requirements for MIR correctness takes not only a rework of how virtual registers' types and classes are used, but also corrections to the pre-legalizer and instruction selection logic. Namely, the following changes are introduced:
* scalar virtual registers have proper bit width,
* detect register class by SPIR-V type,
* add a superclass for id virtual register classes,
* fix Tablegen rules used for instruction selection,
* fixes for minor existing issues (missed flag for proper representation of a null constant for OpenCL vs. HLSL, wrong usage of integer virtual registers as a synonym of any non-type virtual register).
* rename CXXIndeterminateSpliceExpr in the readme too
  Signed-off-by: delimbetov <[email protected]>
* make TryAnnotateOptionalCXXScopeToken work
  Signed-off-by: delimbetov <[email protected]>
* make splice work in requires clause
  Signed-off-by: delimbetov <[email protected]>
* add tests for splice in requires expr
  Signed-off-by: delimbetov <[email protected]>
* add typename and newline at the end of the file
  Signed-off-by: delimbetov <[email protected]>
* add comments
  Signed-off-by: delimbetov <[email protected]>
---------
Signed-off-by: delimbetov <[email protected]>
Some work remains: In particular, if this is going to "work" (i.e., supported by P2996), we need to think carefully about reachability, TU-local entities, etc. There probably need to be some constraints around use of imported reflections, and possibly some 'is_reachable' metafunction. Not entirely sure - need to experiment further. Closes issue #4.
TBD whether to keep this, but adding it so it can be played around with.
* basic impl
  Signed-off-by: delimbetov <[email protected]>
* add test for the new storage duration funcs
  Signed-off-by: delimbetov <[email protected]>
* code style
  Signed-off-by: delimbetov <[email protected]>
* run libcxx generators to pass CI
  Signed-off-by: delimbetov <[email protected]>
* fix indentation
  Signed-off-by: delimbetov <[email protected]>
---------
Signed-off-by: delimbetov <[email protected]>
Closes issue #87.
BaLiKfromUA merged commit a8e6784 into BaLiKfromUA:clang-p2996/issues/5-refined on Aug 20, 2024
103 of 105 checks passed
BaLiKfromUA pushed a commit that referenced this pull request on Aug 26, 2024
…104523) Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`. This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden. My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB. rdar://126629381

Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example).

before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```
after:
```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
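The behaviour described in this commit message (hidden frames are skipped in the listing, but the remaining frames keep their original numbers) can be modelled with a small standalone sketch. None of this is LLDB's API: `Frame`, `isHidden`, and `filteredBacktrace` are hypothetical names, and the anchored regex is an assumption chosen to mimic why frames #1 and #2 above stay visible (their names start with a return type).

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical model of a stack frame; not an LLDB type.
struct Frame {
  std::string function;
};

// Toy recognizer: hide frames whose name starts with std::__1::__function::.
// A name that begins with a return type does not match the anchored pattern.
bool isHidden(const Frame &frame) {
  static const std::regex pattern(R"(^std::__1::__function::)");
  return std::regex_search(frame.function, pattern);
}

// Build the lines of a filtered backtrace. Hidden frames are skipped, but the
// visible ones keep their original index, so "frame #N" still means the same
// frame as in the unfiltered listing.
std::vector<std::string> filteredBacktrace(const std::vector<Frame> &frames) {
  std::vector<std::string> lines;
  bool hidAny = false;
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (isHidden(frames[i])) {
      hidAny = true;
      continue;
    }
    lines.push_back("frame #" + std::to_string(i) + ": " + frames[i].function);
  }
  if (hidAny)
    lines.push_back("Note: Some frames were hidden by frame recognizers");
  return lines;
}
```

Keeping the original index instead of renumbering is what lets a command that addresses a frame by number continue to reach a hidden frame.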
No description provided.