SIMDe 0.7.4 #347
Open
mr-c wants to merge 5 commits into soedinglab:master from mr-c:simde_0.7.4
02c7a67e sse: remove unbalanced HEDLEY_DIAGNOSTIC_PUSH b0b370a4 x86/sse: Add LoongArch LSX support 2338f175 arch: Add LoongArch LASX/LSX support 90d95fae avx512: define __mask64 & __mask32 if not yet defined 42a43fa5 sve/true,whilelt,cmplt,ld1,st1,sel,and: skip AVX512 native implementations on MSVC 2017 20f98da6 sve/whilelt: correct type-o in __mmask32 initialization 47a1500f sve/ptest: _BitScanForward64 and __builtin_ctzll is not available in MSVC 2017 cd93fcc9 avx512/knot,kxor: native calls not availabe on MSVC 2017 ba6324b6 avx512/loadu: _mm{,256}_loadu_epi{8,16,32,64} skip native impl on MSVC < 2019 2f6fe9c6 sse2/avx: move some native aliases around to satisfy MSVC 2017 /ARCH:AVX512 91fda2cc axv512/insert: unroll SIMDE_CONSTIFY for testing macro implemented functions a397b74b __builtin_signbit: add cast to double for old Clang versions e016050b clmul: _mm512_clmulepi64_epi128 implicitly requires AVX512F 7e353c00 Wasm q15mulr_sat_s: match Wasm spec ce375861 Wasm f32/f64 nearest: match Wasm spec 96d5e034 Wasm f32/f64 floor/ceil/trunc/sqrt: match Wasm spec 5676a1ba Wasm f32/f64 abs: match Wasm spec aa299c08 Wasm f32/f64 max: match Wasm spec 433d2b95 Wasm f32/f64 min: match Wasm spec cf1ac40b avx{,2}: some intrinsics are missing from older MSVC versions bff9b1b3 simd128: move unary minus to appease msvc native arm64 efc512a4 neon/ext: unroll SIMDE_CONSTIFY for testing macro implemented functions 091250e8 neon/addlv: disable SSSE3 impl of _vaddlvq_s16 for MSVC 4b305360 neon/ext: simde_*{to,from}_m64 reqs MMX_NATIVE 2dedbd9b skip many mm{,_mask,_maskz}_roundscale_round_{ss,sd} testing on MSVC + AVX a04ea7bc f16c: rounding not yet implemented for simde_mm{256,}_cvtps_ph e8ee041a ci appveyor: build tests with AVX{,2}, but don't run them 2188c972 arm/neon/add{l,}v: SSE2/SSSE3 opts _vadd{lvq_s8, lvq_s16, lvq_u8, vq_u8} 186f12f1 axv512: add simde_mm512_{cvtepi32_ps,extractf32x8_ps,_cmpgt_epi16_mask} 6a40fdeb arm/neon/rnd: use correct SVML function for simde_vrndq_f64 
9a0705b0 svml: simde_mm256_{clog,csqrt}_ps native reqs AVX not SSE c298a7ec msvc avx512/roundscale_round: quiet a false positive warning 01d9c5de sse: remove errant MMX requirement from simde_mm_movemask_ps c675aa08 x86/avx{,2}: use SIMDE_FLOAT{32,64}_C to fix warnings from msvc 097af509 msvc 2022: enable F16C if AVX2 present 91cd7b64 avx{,2}: fix maskload illegal mem access 2caa25b8 Fixed simde_mm_prefetch warnings 96bdf523 Fixed parameters to _mm_clflush 4d560e41 emscripten; don't use __builtin_roundeven{f,} even if defined 511a01e7 avx512/compress: Mitigate poor compressstore performance on AMD Zen 4 a22b63dc avx512/{knot,kxor,cmp,cmpeq,compress,cvt,loadu,shuffle,storeu} Additional AVX512{F,BW,VBMI2,VL} ops 3d87469f wasm simd128: correct trunc_sat _FAST_CONVERSION_RANGE target type 56ca5bd8 Suppress min/max macro definitions from windows.h f2cea4d3 arm/neon/qdmulh s390 gcc-12: __builtin_shufflevector is misbehaving 3698cef9 neon/cvt: clang bug 46844 was fixed in clang 12.0 9369cea4 simd128: clang 13 fixed bugs affecting simde_wasm_{v128_load8_lane,i64x2_load32x2} ce27bd09 gcc power: vec_cpsgn argument reversal fixed in 12.0 20fd5b94 gcc power: bugs 1007[012] fixed in GCC 12.1 5e25de13 gcc sse2: bug 99754 was fixed in GCC 12.1 e6979602 gcc i686 mm*_dpbf16_ps: skip vector ops due to rounding error 359c3ff4 clang wasm simde: add workaround to fix wasm_i64x2_shl bug b767f5ed arm/neon: workaround on ARM64 windows bug 599b1fbf mips/msa: fix for Windows ARM64 c6f4821e arm64 windows: fix simd128.h build error 782e7c73 prepare to release 0.7.4 6e9ac245 fix A32V7 version of _mm_test{nz,}c_si128 776f7a69 test with Debian default flags, also for armel a240d951 x86: fix AVX native → SSE4.2 native 5a73c2ce _mm_insert_ps: incorrect handling of the control 597a1c9e neon/ld1[q]_*_x2: initial implementation 4550faea wasm: f32x4 and f64x2 nearest roundeven 5e068645 Add missing `static const` in simde-math.h. 
NFC da02f2ce avx512/setzero: fix native aliases 89762e11 Fixed FMA detection macro on msvc b0fda5cf avx512/load_pd: initial implementation a61af077 avx512/load_ps: initial implementation 4126bde0 Properly map __mm functions to __simde_mm 2e76b7a6 neon ld2: gcc-12 fixes 604a53de fix wrong size e5e085ff AVX: add native calls for _mm256_insertf128_{pd,ps,si256} ee3bd005 aarch64 + clang-1[345] fix for "implicit conversion changes signedness" a060c461 wasm: load lane memcpy instead of cast to address UBSAN issues cbef1c15 avx512/permutex2var: hard-code types in casts instead of using typeof 71a65cbd gfni: add cast to work around -Wimplicit-int-conversion warning 10dd508b avx512/scalef: work around for GCC bug #101614 277b303b neon/cvt: fix compilation with -ffast-math 9ec8c259 avx512/scalef: _mm_mask_scalef_round_ss is still missing in GCC e821bee3 Wrap static assertions in code to disable -Wreserved-identifier 13cf2969 The fix for GCC bug #95483 wasn't in a release until 11.2 b66e3cb9 avx2: separate natural vector length for float, int, and double types dda31b76 Add -Wdeclaration-after-statement to the list of ignored warnings. 9af03cd0 Work around compound literal warning with clang 74a4aa59 neon/clt: Add SSE/AVX512 fallbacks 02ce512d neon/mlsl_high_n: initial implementation 6472321c neon/mlal_high_n: initial implementation 2632bbc1 neon/subl_high: initial implementation d1d2362d neon/types: remove duplicate NEON float16_t definitions 456812f8 sse: avoid including windows.h when possible 332dcc83 neon/reinterpret: change defines to work with templated callers e369cd0c neon/cge: Improve some of the SSE2 fallbacks 3397efe1 deal with WASM SIMD128 API changes. 
3aa4ae58 neon/rndn: Fix macros to workaround bugs 30b3607b neon/ld1: Fix macros in order to workaround bugs 8cac29c6 neon/cge: Implement f16 functions c96b3ae6 neon/cagt: Implement f16 functions f948d39a neon/bsl: Implement f16 functions d6e025bd neon/reinterpret: f16_u16 and u16_f16 implementations 5e763da5 neon/add: Implement f16 functions 5a7c9e13 neon/ceqz: Implement f16 functions 1ba94bc4 neon/dup_n: Implement f16 functions af26004a neon/ceq: Implement f16 functions e41944f3 neon/st1: Add f16 functions a660d577 neon/cvt: Implement f16 functions 412da5b3 neon/ld1: Implement f16 functions 068485c9 neon/cage: Initial f16 implementations 89fb99ee neon: Implement f16 types 50a56ef7 sse4.2: work around more warnings on old clang fa54e7b3 avx512/permutex2var: work around incorrect definition on old clang d20c7bf8 sse: use portable implementation to work around llvm bug #344589 371fd445 avx: work around incorrect maskload/store definitions on clang < 3.8 3bb373c8 Various fixes for -fno-lax-vector-conversions f26ad2d1 avx512/fixupimm: initial implementation f9182e3b Fix warnings with -fno-lax-vector-conversions 37c26d7f avx512/dpbusds: complete function family 0dc7eaf6 sse: replace _mm_prefetch implementation b7fd63d9 neon/ld1q: u8_x2, u8_x3, u8_x4 6427473b neon/mul: add improved SSE2 vmulq_s8 implementation b843d7e1 avx512/cvt: add _mm512_cvtepu32_ps 5df05510 simd128: improve many lt and gt implementation 495a0d2a neon/mul: implement unsigned multiplication using signed functions 2b087a1c neon/qadd: fix warning in ternarylogic call in vaddq_u32 f027c8da neon/qabs: add some faster implementations bf6667b4 simd128: add fast sqrt implementations d490ca7a simd128: add fast extmul_low/high implementations 2abd2cc0 simd128: add NEON and POWER shift implementations 3032eb33 simd128: add fast promote/demote implementations e92273a6 simd128: add dedicated functions for unsigned extract_lane 34c5733c sse2, sse4.1: pull in improved packs/packus implementations from WASM 1bfc221c 
simd128: add fast narrow implementations f333a089 simd128: add fast implementations of extend_low/extend_high b4e0d0cc msa/madd: initial implementation c09e6b0a neon/rndn: work around some missing functions in GCC on armv8 cc7afa77 avx512/4dpwssds: initial implementation a9cec6fe avx512/dpbf16: implement remaining functions 371da5f8 avx512/dpwssds: initial implementation ccef3bee common: Use AArch64 intrinsics if _M_ARM64EC is defined f79c08c3 xop: fix NEON implementation of maccs functions to use NEON types 9eb0a88d sse4.1: use NEON types instead of vector in insert implementations 0bbae5ff avx512/roundscale: don't assume IEEE 754 storage 77673258 fma: use NEON types in simde_mm_fnmadd_ps NEON implementation 865412e7 sse2: remove statement expr requirement for NEON srli/srai macros 573c0a24 sse4.1: replace NEON implementations with shuffle-based implementations 534794b2 sse4.1: remove statement expr dependency in blend functions a571ca8c fma: fix return value of simde_mm_fnmadd_ps on NEON df95ab8e sse, sse2: clean up several shuffle macros 44e25b30 sse2: add parenthesis around macro arguments 305ac0a8 avx512/set, avx512/popcnt: use _mm512_set_epi8 only when available 98de6621 relaxed-simd: add blend functions 974f83d5 relaxed-simd: add fms functions a46a04b7 relaxed-simd: add fma functions 54c62bf7 avx512/popcnt: implement remaining functions d4dc926f avx512/dpbf16: initial implementation b9a7904d avx512/4dpwssd: implement complete function family f54cc98a avx512/dpwssd: initial implementation 7e877d17 avx512/bitshuffle: initial implementation 9e96b711 avx512/dpbusd: implement remaining functions 423572d5 simd128: use vec_cmpgt instead of vec_cmplt in pmin 73b6978f sse, sse2: fix vec_cpsign order test 7c0bdbff gfni: remove unintentional dependency on vector extensions 26fcfdb1 simd128: add fast ceil implementations 85035430 Improve widening pairwise addition implementations 8f35dc1a simd128: add fast max/pmax implementations a8adeffc neon/cvt: disable some code on 
32-bit x86 which uses _mm_cvttsd_si64 29955848 avx512/shldv: limit shuffle-based version to little endian ae330dd9 simd128: add NEON, Altivec, & vector extension sub_sat implementations 9debe735 neon/cvt, relaxed-simd: add work-around for GCC bug #101614 eab383d9 avx512/dbsad: add vector extension impl. and improve scalar version 79c93ce0 sse, sse2: sync clang-12 changes for vec_cpsgn 7205c644 avx512/cvtt: _mm_cvttpd_epi64 is only available on x86_64 42538f0e simd128, sse2: more cvtpd_ps/f32x4_demote_f64x2_zero implementations 1bec285e simd128, sse2: add more madd_epi16 / i32x4_dot_i16x8 implementations 6dfdf3d2 simd128: vector extension implementation of floating-point abs 00c3b68b simd128, neon/neg: add VSX implementations of abs and neg functions 7f3a52d0 neon/cgt, simd128: improve some unsigned comparisons on x86 f5184634 neon/abd: add much better implementations 9b1974dd Add @aqrit's SSE2 min/max implementations 9caf5e6e simd128: add more pmin/pmax implementations dcd00397 neon/qrdmulh: steal WASM q15mulr_sat implementation for qrdmulhq_s16 34dee780 simd128: add SSE2 q15mulr_sat implementation fe3e623e neon/min: add SSE2 vminq_u32 implementation 4abbb4db neon/min: add SSE2 vqsubq_u32 implementation c1158835 simd128: add improved min implementations on several architectures c059f800 relaxed-simd: add trunc functions 0394e967 simd128: add several some AArch64 and Altivec trunc_sat implementations 3fa2026b Fix several places where we assumed NEON used vector extensions. 
6a183313 neon/qsub: add some SSE and vector extension implementations 313561fe msa/subv: initial implementation 8f1155e4 msa/andi: initial implementation d20bca47 msa/and: initial implementation 82e93303 gfni: work around clang bug #50932 3a27037f arch: set SIMDE_ARCH_ARM for AArch64 on MSVC d19a9d6a msa/adds: initial implementation 41f9ad33 neon/qadd: improve SSE implementation eb55cce3 avx512/shldv: initial implementation ee0a83e1 avx512/popcnt: initial implementation 48855d3a msa/adds_a: initial implementation 6133600b neon/qadd: add several improved x86 and vector extension versions 6b5814d9 avx512/ternarylogic: implement remaining functions 3fba9986 Add many fast floating point to integer conversion functions b2f01b98 neon/st4_lane: Implement remaining functions ccc9e2c8 neon/st3_lane: Implement remaining functions 3f0859be neon/st2_lane: Implement remaining functions e136dfe7 neon/ld1_dup: Add f64 function implementations 4a2ceb45 neon/cvt: add some faster x86 float->int/uint conversions b82b16ac neon/cvt: Add vcvt_f32_f64 and vcvt_f64_f32 implementations 477068c9 neon/st2: Implement remaining functions 3a93c5dd neon/ld4_lane: Implement remaining functions 75838c15 neon/qshlu_n: Add scalar function implementations 7d314092 simde/scalef: add scalef_ss/sd d3547dac msa/add_a: initial implementation 8ba8dc84 msa/addvi: initial implementation b1006161 Begin working on implementing MIPS MSA. 
38088d10 fma: use fma/fms instead of mla/mls on NEON 76c4b7cd neon/cle: add some x86 implementations d045a667 neon/cle: improve formatting of some x86 implementations 6fc12601 relaxed-simd: initial support for the WASM relaxed SIMD proposal 2d430eb4 neon/ld2: Implement remaining functions fc3aef94 neon/ld1_lane: Implement remaining functions 0ec9c9c9 neon/rsqrte: Implement remaining functions 92e72c44 neon/rsqrts: Add remaining function implementations e7cdccd0 neon/qdmulh_lane: Add remaining function implementations 905f1e4c neon/recpe: Add remaining function implementations 96cebc42 neon/recps: Add scalar function implementations 63ad6d0a neon/qrdmulh_lane: Add scalar function implementations f8dacd07 simde-diagnostic: Include simde-arch 4ad3f10f neon/mul_lane: Add mul_laneq functions 25d0fe82 neon/sri_n: Add scalar function implementations 6fb9fa3a neon/shl_n: Add scalar function implementations 5738564f neon/shl: Add scalar implementations fc2aed9b neon/rsra_n: Add scalar function implementations 7c7d8d80 neon/qshrn_n: Add scalar function implementations 76e65444 neon/qrshrn_n: Add scalar function implementations 25aa2124 neon/rshr_n: Add custom scalar function for utility 6d1c7aaf avx512/dbsad: initial implementation 4b1ba2ce avx512/dpbusd: initial implementation 02719bcc svml: remove some dead stores from cdfnorminv 803b29ac sse2: fix set but not used variable in _mm_cvtps_epi32 7ee622df Use SIMDE_HUGE_FUNCTION_ATTRIBUTES on several functions. 80439178 arch: fix SIMDE_ARCH_POWER_ALTIVEC_CHECK to include AltiVec check 604a90af neon/cvt: fix a couple of s390x implementations' NaN handling a0fe7651 simd128: work around bad diagnostic from clang < 7 cd742d66 f16c: use __ARM_FEATURE_FP16_VECTOR_ARITHMETIC to detect Arm support 4f39e4fc Fix an assortment of small bugs 4bf12875 Remove all `&& 0`s in preprocessor macros. 
8e0d0f93 simd128: remove stray `&& 0` d98f81cb simd128: add optimized f32x4.floor implementations b626266d simd128: add some Arm implementations of all_true 78957358 simd128: any_true implementations for Arm 20cd4d00 simd128: add improved add_sat implementations ea364550 wasm128, sse2: disable -Wvector-conversion when calling vgetq_lane_s64 4e09afb4 neon/zip1: add armv7 implementations f27932a7 simd128: add x86/Arm/POWER implementations 2bcd59bb avx512/conflict: implement missing functions 7da82adb avx512/multishift: initial implementation e7229088 various: correct PPC and z/Arch versions plus typo 005d39c8 simd128: fix portable fallback for wasm_i8x16_swizzle 860127a1 Add NEON, SSE3, and AltiVec implementations of wasm_i8x16_swizzle 0959466e simd128: add AltiVec implementations of any/all_true 7f38c52e simd128: add vec_abs implementation of wasm_i8x16_abs e2cb9632 simd128: work around clang bugs 50893 and 50901. 77e4f57d avx512/rol: implement remaining functions 1d60dc03 avx512/rolv: initial implementation 30681718 avx512: initial implementation 38f8ef8f avx512/ternarylogic: initial implementation 3efe186a Add constrained compilation mode 1faf7872 simd128: add simde_wasm_i64x2_ne 68616767 avx512/scalef: implement remaining functions 6ea919f8 avx512/conflict: implements mm_conflict_epi32 ad5d51c5 avx512/scalef: initial implementation 4f0f1e8f neon/qrshrun_n: Add scalar function implementations dc278de7 neon/rshr_n: Add scalar function implementations 86f73e1e neon/rndn: Add macro corrections 189d7762 neon/qshrun_n: Add scalar function implementations 1fc63065 neon/rshl: Add scalar function implementations 4ca2973e neon/rndn: Add scalar function implementation d78398c8 neon/qdmulh: Add scalar function implementations 7d43b7c9 neon/pmin: Add scalar function implementations 4dacfeff neon/pmax: Add scalar function implementations abccc767 neon/padd: Add scalar function implementations b3d97677 neon/neg: Complete implementation of function family 137afad7 neon/dup_lane: 
Complete implementation of function family ef93f1bb neon/fma_lane: Implement fmaq_lane functions e9dcfe8b neon/sra_n: Add scalar function implementations 44cf247c neon/shr_n: Add scalar function implementations ca78eb82 neon/sub: Implements the two remaining scalar functions 65d8d52f avx512/rorv: implement _mm{256,512}{,_mask,_maskz}_rorv_epi{32,64} 1afa8148 Many work-arounds for GCC with MSA, and support in the docker image. 8bf571ac neon/ext: clean up shuffle-based implementation 51790ff8 avx512/rorv: initial implementation of _mm_rorv_epi32 952dab89 neon/st3: Add shuffle vector implementations 2229f4ba sse, sse2: work around GCC bug #100927 e0b88179 neon/ld{2,3,4}: disable -Wmaybe-uninitialized on all recent GCC 76c76bfa neon/fma_lane: portable and native implementations 002b4066 neon/mul_lane: finish implementation of function family ae959e7e neon/;shlu_n: faster WASM implementations 7df8e3ab neon/qshlu_n: initial implementation 338eb083 neon/ld4: use conformant array parameters 049eaa9e neon/vld4: Wasm optimization of vld4q_u8 720db9ff neon/st3q_u8: Wasm optimization ccf235e1 neon/qdmull: add WASM implementations 06a64a94 neon/movl: improve WASM implementation e36a029e neon/tbl: add WASM implementation of vtbl1_u8 5debb615 neon/tst: implement scalar functions cef74f3b neon/hadd,hsub: optimization for Wasm 502243a2 neon/qrdmulh_lane: fix typo in undefs 6eb625d7 fma: drop weird high-priority implementation in _mm_fmadd_ps 47ba41d6 neon/qshrn_n: initial implementation b94e0298 neon/qrdmulh: native aliases for scalar functions should be A64 f27e9fcb neon/qrdmulh_lane: initial implementation 04e2ca66 neon/subhn: initial implementation 8b129a93 neon/sri_n: add 128-bit implementations 88dd65de neon/mull_lane: initial implementation 12c940ed neon/mlsl_lane: initial implementation abc8dacf neon/mlal_lane: initial implementation 9438ea43 neon/dup_lane: fix macro for simde_vdup_laneq_u16 36e2ce5b neon/{add,sub}w_high: use vmovl_high instead of vmovl + get_high d86492fa 
neon/sri_n: native and portable 60715735 neon/qshrun_n: native and portable implementations de84bcd0 neon/qdmulh_lane: native and portable 4581232f avx512/roundscale_round: implement remaining functions 76b19b97 avx512/range_rounnd,round: move range_round functions out of round 2ba2b7b8 neon/ld1_dup: native and portable (64-bit vectors) f6fd4b67 neon/dup_lane: implement vdupq_lane_f64 07b4a2b3 neon/shll_n: native and portable implementations 58a0188d neon/dupq_lane: native and portable 623f2207 neon/st4_lane: portable and native *_{s,u}{8,16,32} 322663be neon/st3_lane: portable and native *_{s,u}{8,16,32} 7700b2e5 neon/st2_lane: portable and native for _{u,s}{8,16,32} acc67df2 neon/cltz: Add scalar functions and natural vector fallbacks fcf6e88e neon/clt: Add implementations of scalar functions 799e1629 neon/clez: Add implementaions of scalar functions f22ae740 neon/addhn: initial implementation 8774393f avx512/cmp{g,l}e: AVX-512 implementations of non-mask functions 1eb57468 avx512/cmple: finish implementations of all cmple functions 9b60d826 avx512/cmpge: fix bad _mm512_cmpge_epi64_mask implementation 6849da33 avx: use internal symbols in clang fallbacks for cmp_ps/pd functions f2746208 avx512/cmpge: finish implementing all functions 135cbbf0 avx512/range: implement mm{,512}{,_mask,_maskz}_range_round* 6421a835 avx512/round, avx512/roundscale: add shorter vector fallbacks 5c6673f5 avx512/roundscale: implement simde_mm{256,512}_roundscale_ps 6fcb4433 neon/cle: Add implementations for remaining functions a49bdc1c neon/fma_n: the 32-bit functions are missing on GCC on arm 05172a08 neon/ld4: work around spurious warning on clang < 10 2fa3d1d8 neon/qdmulh: add shuffle-based implementations ea22a611 neon/qdmulh_n: native and portable implementations 5ef8e53d neon/qrshrn_n: native and portable implementations fda538d1 neon/ld1_lane: portable and native implementations 8f118bbd neon/cgtz: Add implementations of remaining functions 31d5048c neon/cgt: Add implementation of 
remaining functions 79274d8d neon/ld4_lane: move private type usage to inside loop bdcfccb7 neon/ld4_lane: native and portable implementations bbc35b65 avx512/range: don't used masked comparisons for 128/256-bit versions ef90404e avx512/range: fix fallback macros 5d00aa4c features: add z/arch to SIMDE_NATURAL_VECTOR_SIZE 83cab7c1 sve/cmplt: replace vec_and with & for s390 implementations a636d0ae Fix gcc-10 compilation on s/390x bb35d9f0 gfni: work around error with vec_bperm on clang-10 on POWER 2db3ba03 gfni: replace vec_and and vec_xor with & and ^ on z/arch cdb3f68c sse, mmx: fix clang-11 on POWER 233fef43 gfni: add many x86, ARM, z/Arch, PPC and WASM implementations
git-subtree-dir: lib/simde/simde
git-subtree-split: 02c7a67ed825018f9efdf2a7e4f39d8196f65337
Whoops, guess I'll have to make a 0.7.4.1 release of SIMDe! 😅