Feature/sycl #1168

Draft
wants to merge 375 commits into
base: develop
Commits
80d01e1
remove unnecessary asserts
jcosborn May 27, 2022
b7852c6
Merge branch 'develop' into feature/sycl
jcosborn May 27, 2022
a65d46a
update block reduction
jcosborn May 31, 2022
444ac0a
reverse sycl range order
jcosborn Jun 1, 2022
99e971d
update cmake
jcosborn Jun 6, 2022
833094e
use device mem for large kernel args
jcosborn Jun 7, 2022
91998ae
use device mem for more kernel args
jcosborn Jun 7, 2022
63352ac
use multiple arg buffers and track stream syncs
jcosborn Jun 8, 2022
0e50f0d
added some wasSynced
jcosborn Jun 8, 2022
572e987
use pinned memory for arg buffer
jcosborn Jun 8, 2022
133d4f9
put large args in managed mem
jcosborn Jun 10, 2022
686f9e2
fixed arg buf
jcosborn Jun 10, 2022
1d48879
use device mem for large kernel args
jcosborn Jun 10, 2022
6e9fd81
just use pinned mem for large args
jcosborn Jun 10, 2022
328bdc0
updates to atomics, warp shift and reductions
jcosborn Jun 15, 2022
f59db64
set warp size for multireduce
Jun 22, 2022
ab318cc
fix block size for reductions
jcosborn Jun 22, 2022
5099cd4
revert to high-level reductions
jcosborn Jun 23, 2022
5a371eb
start implementation of high-level multireduction
jcosborn Jul 12, 2022
5419bfb
Merge branch 'develop' into feature/sycl
jcosborn Jul 12, 2022
5d6eda3
add new math functions
jcosborn Jul 18, 2022
adc0f95
Merge branch 'develop' into feature/sycl
jcosborn Jul 18, 2022
d3191aa
Merge branch 'develop' into feature/sycl
jcosborn Jul 27, 2022
171c830
Merge branch 'develop' into feature/sycl
jcosborn Aug 24, 2022
be8bd7a
add buildtests target
jcosborn Aug 26, 2022
9be134f
small improvements to unitarization
jcosborn Aug 29, 2022
51ac9fb
add missing file
jcosborn Aug 29, 2022
ae9412e
add MKL ZGEMM
jcosborn Aug 30, 2022
a82a093
style changes to rng
jcosborn Sep 2, 2022
87cbcca
update rng code
jcosborn Sep 7, 2022
ca53507
Merge branch 'develop' into feature/sycl
jcosborn Sep 15, 2022
40d7cf9
Merge branch 'use_kernel_arg' into feature/sycl
jcosborn Sep 22, 2022
2243742
Merge branch 'develop' into feature/sycl
jcosborn Sep 22, 2022
2aa0fa4
update FFT
jcosborn Sep 23, 2022
24c4909
undo changes
jcosborn Sep 23, 2022
fa6e2b9
fix transform_reduce
jcosborn Sep 24, 2022
be00e0c
split tune_quda.h
jcosborn Oct 2, 2022
2be6509
fix kernel launch parameter adjustment
jcosborn Oct 5, 2022
5df1405
comment out warning on X globalSize
jcosborn Oct 5, 2022
6b02c4a
remove warnings
jcosborn Oct 6, 2022
793b71e
Merge branch 'develop' into feature/sycl
jcosborn Oct 11, 2022
6056657
make FFT optional
jcosborn Oct 13, 2022
1d6001d
report last error if no tuning candidates
jcosborn Oct 26, 2022
163396c
update to SYCL2020 APIs
Nov 8, 2022
9b92832
updates for new SDK
jcosborn Nov 9, 2022
1edf823
some updates to reductions
jcosborn Nov 16, 2022
62f2818
updates to reductions
Nov 16, 2022
56217a0
updates to block reductions
jcosborn Nov 16, 2022
fea2f85
fix special case
jcosborn Nov 16, 2022
d245b30
Merge branch 'develop' into feature/sycl
jcosborn Dec 1, 2022
aa2ea41
fixes for clang++
jcosborn Dec 9, 2022
b739985
some fixes for intel clang++ on CUDA
jcosborn Dec 15, 2022
725cdca
Merge branch 'develop' into feature/sycl
jcosborn Dec 15, 2022
ef578ad
allow kernel3D to have any thread size
jcosborn Jan 11, 2023
7890e0d
use local_accessor for SharedMemoryCache
jcosborn Jan 13, 2023
27d9b48
make multireduce use local_accessor
jcosborn Jan 13, 2023
9649bb2
make multiblas work with all threads
jcosborn Jan 13, 2023
0cdf6d9
remove all range adjustments and checks
jcosborn Jan 13, 2023
1e9bd5b
make purely dynamic shared memory cache
jcosborn Jan 16, 2023
b5d78e8
fix BlockSync
jcosborn Jan 16, 2023
27f0a60
kernel3d reject when local X size isn't divisible by warp size
jcosborn Jan 16, 2023
6ca5df8
make all kernels reject local x not divisible by warp size
jcosborn Jan 16, 2023
4d419eb
add SpecialOps handling
jcosborn Jan 22, 2023
85ed70e
add shared memory to large argument size kernel
jcosborn Jan 24, 2023
ee10b7c
fix special ops detection
jcosborn Jan 25, 2023
69ed694
revert warp_combine to work on real and imag parts separately
jcosborn Jan 25, 2023
9ef0a15
small additions
jcosborn Jan 25, 2023
6226329
Merge branch 'develop' into feature/sycl
jcosborn Jan 26, 2023
449f567
remove old code
jcosborn Jan 26, 2023
c4b5107
update all SharedMemoryCache to use new interface
jcosborn Jan 28, 2023
3628db4
Merge branch 'develop' into feature/sycl
jcosborn Jan 30, 2023
4a80cbe
small udpates
jcosborn Feb 2, 2023
cf5488d
update BlockReduce uses to use SpecialOps
jcosborn Feb 3, 2023
1c19295
fix laplace and covDev
jcosborn Feb 4, 2023
8845fe2
disable gauge fix FFT tests if no FFT
jcosborn Feb 4, 2023
63b2543
fix FFT check in tests
jcosborn Feb 6, 2023
824ae2c
fix SpecialOps propagation in dslash
Feb 8, 2023
18215db
updates to special ops
jcosborn Feb 8, 2023
4dd491c
fix compilation
jcosborn Feb 9, 2023
df249ea
allow all block dimensions
jcosborn Feb 9, 2023
3a6a625
Merge branch 'develop' into feature/sycl
jcosborn Feb 9, 2023
2b27851
don't do schwartz tests if no MMA
jcosborn Feb 9, 2023
eef0c4d
remove return in dslash_functor
jcosborn Feb 17, 2023
f57101c
fix allthreads for ndeg twisted clover
jcosborn Feb 17, 2023
b4761f7
make thread_array use SLM
jcosborn Feb 21, 2023
7921af3
fix shared memory cache sync
jcosborn Feb 21, 2023
1df2425
fix thread array
jcosborn Feb 22, 2023
3457de7
update special ops, revert thread_array
jcosborn Feb 23, 2023
98f4c19
Merge branch 'develop' into feature/sycl
jcosborn Feb 23, 2023
eb525b7
fix allthreads in dslash functor
jcosborn Feb 23, 2023
61c3c4c
update SpecialOps interface
jcosborn Feb 23, 2023
15248d2
update SpecialOps
jcosborn Feb 24, 2023
0a773b0
fix needsSharedMemory
jcosborn Feb 24, 2023
fe30b4d
fix and equals
jcosborn Feb 24, 2023
9b1f4eb
test some workarounds
jcosborn Feb 24, 2023
44ab85b
test reduce_over_group reductions
jcosborn Feb 24, 2023
19e4217
restore gauge fix ovr
jcosborn Feb 25, 2023
f598b79
update local atomics
jcosborn Feb 25, 2023
fe7b7d4
Merge branch 'develop' into feature/sycl
jcosborn Feb 25, 2023
219ea62
Merge branch 'develop' into feature/sycl
jcosborn Mar 13, 2023
e429fbe
move header
jcosborn Mar 14, 2023
6d15363
Merge branch 'develop' into feature/sycl
jcosborn Mar 17, 2023
30e4ae4
Merge branch 'develop' into feature/sycl
jcosborn Mar 23, 2023
9a80cfe
fixes for merge
jcosborn Mar 24, 2023
f9997d3
fix device list string creation
jcosborn Mar 27, 2023
95e1267
add local atomic_add
jcosborn Apr 14, 2023
a8d54c9
Merge branch 'develop' into feature/sycl
jcosborn Apr 17, 2023
c0fcf7a
add missing file
jcosborn Apr 17, 2023
ee3fbcb
update SharedMemoryCache usage
jcosborn Apr 17, 2023
a2e2000
reorganize thread local cache
jcosborn May 16, 2023
0f0e4fa
added missing NoSpecialOps
jcosborn Jun 15, 2023
38c8032
remove deprecated SYCL check for managed memory
jcosborn Aug 16, 2023
0cff772
Merge branch 'develop' into feature/sycl
jcosborn Aug 16, 2023
ad4e270
added SharedMemory target object
jcosborn Aug 17, 2023
c050c45
small update to shared memory helper
jcosborn Aug 22, 2023
2438954
fix use of SharedMemoryCache
jcosborn Aug 23, 2023
fe4009d
check shared mem size consistency before launch
jcosborn Sep 6, 2023
590578c
Merge branch 'develop' into feature/sycl
jcosborn Sep 10, 2023
206c4c6
add tagging for thread_array
jcosborn Sep 13, 2023
93a03f3
update CUDA target
jcosborn Sep 14, 2023
08af61e
fixes to CUDA target
jcosborn Sep 15, 2023
67caa46
fix shared memory usages
jcosborn Sep 16, 2023
5d2542b
update SYCL target
jcosborn Sep 17, 2023
f8e16b7
Merge branch 'develop' into feature/sycl
jcosborn Sep 17, 2023
e8c16b6
fix wilson dslash
jcosborn Sep 17, 2023
2cd916a
update HIP target
jcosborn Sep 18, 2023
af8d555
add missing file
jcosborn Sep 18, 2023
20847d2
add missing file
jcosborn Sep 18, 2023
d4ae385
fix HIP include
jcosborn Sep 18, 2023
11bcd04
fix generic reduce helper
jcosborn Sep 18, 2023
7cbffc8
change active handling in ndeg dslash
jcosborn Sep 19, 2023
bebf8db
don't use warp splitting in multiblas
jcosborn Sep 19, 2023
d827beb
restore needsFullBlock for SharedMemoryCache
jcosborn Sep 19, 2023
d1405a1
update interface for block transpose
jcosborn Sep 19, 2023
ec0bea9
update twisted clover dslash
jcosborn Sep 19, 2023
ba9aab8
more work on early thread exit
jcosborn Sep 19, 2023
8dbf79f
update mobius dslash
jcosborn Sep 19, 2023
718cba8
fix mobius dslash for CUDA
jcosborn Sep 21, 2023
e9cd06d
make thread_array and ThreadLocalCache use shared memory again
jcosborn Sep 21, 2023
a5d3463
make special ops initialization mandatory at construction time
jcosborn Oct 4, 2023
c17ae76
fix CUDA build
jcosborn Oct 5, 2023
25e2f05
Merge branch 'develop' into feature/sycl
jcosborn Oct 19, 2023
464dc12
Merge branch 'develop' into feature/sycl
jcosborn Oct 19, 2023
1ebbbe3
update quda api
jcosborn Oct 20, 2023
86fc1ce
fix needsFullBlock
jcosborn Oct 25, 2023
b680199
cleanup special ops
jcosborn Nov 2, 2023
922692d
fix issue with CMake
jcosborn Nov 2, 2023
2918ed9
remove queue submits from cpp files
jcosborn Nov 2, 2023
e0afb86
fix handling of user specified SYCL link flags
jcosborn Nov 2, 2023
0106b98
fix some shared memory offsets
jcosborn Nov 14, 2023
1348d90
fix some warnings
jcosborn Nov 17, 2023
51d75ae
Merge branch 'develop' into feature/sycl
jcosborn Nov 30, 2023
92ca04d
fix some shared bytes amounts
jcosborn Nov 30, 2023
eebfcce
sync hisq_paths_force with sycl-merge branch
jcosborn Nov 30, 2023
a8de82a
fix DW shared mem bytes
jcosborn Dec 1, 2023
a65d36b
don't use ext_oneapi_submit_barrier
jcosborn Dec 7, 2023
b38fca1
Merge branch 'develop' into feature/sycl
jcosborn Dec 7, 2023
48ddc2c
Merge branch 'develop' into feature/sycl
jcosborn Jan 4, 2024
90f4356
fix HIP build
jcosborn Jan 4, 2024
35407fe
rename SpecialOps to KernelOps
jcosborn Jan 4, 2024
1c56c08
update include
jcosborn Jan 4, 2024
a63765a
cleanup KernelOps
jcosborn Jan 5, 2024
ae7072d
fix domain wall shared bytes
jcosborn Jan 6, 2024
fd1393e
cleanup kernel ops
jcosborn Jan 8, 2024
59bcd6e
add missing file
jcosborn Jan 8, 2024
1214b1c
Merge branch 'develop' into feature/sycl
jcosborn Jan 17, 2024
f2d94e1
fix SYCL on NVIDIA by replacing std math library calls with SYCL vers…
jcosborn Jan 25, 2024
4844105
add missing headers
jcosborn Jan 25, 2024
ccbba91
detect clang CUDA too in math_helper.h
jcosborn Jan 25, 2024
ef04c5c
fix shared bytes for restrictor
jcosborn Jan 29, 2024
40ddc38
fix shared memory consistency checks
jcosborn Feb 6, 2024
927c75f
fix unused parameter errors
jcosborn Feb 6, 2024
d83c453
fix MG with tuning
jcosborn Feb 6, 2024
90e1742
improve shared bytes checking
jcosborn Feb 13, 2024
b26abb3
prepare to test all comms directions together
jcosborn Feb 21, 2024
8a1d011
add support for MPI_Testsome
jcosborn Feb 22, 2024
86765f2
fix compilation
jcosborn Feb 22, 2024
76b639c
add missing comm function
jcosborn Feb 22, 2024
b977d4f
revert comms changes
jcosborn Apr 22, 2024
273edef
Merge branch 'develop' into feature/sycl
jcosborn Apr 23, 2024
a36a0fd
fix some tests
jcosborn Apr 25, 2024
3e323b9
Merge branch 'develop' into feature/sycl
jcosborn Apr 25, 2024
26a327c
fix laplace tests
jcosborn Apr 28, 2024
b97d72c
fix shared bytes count
jcosborn Apr 28, 2024
a0c2836
Merge branch 'develop' into feature/sycl
jcosborn Apr 29, 2024
68079f7
Fix buffer overflow in Unpack
jcosborn Apr 30, 2024
3c6cbb7
fix clang issue
jcosborn May 17, 2024
374b39c
Merge branch 'develop' into feature/sycl
jcosborn May 19, 2024
93f3a3c
fix CUDA and HIP builds
jcosborn May 19, 2024
269107d
Merge branch 'develop' into feature/sycl
jcosborn May 28, 2024
da6b5d5
replace some std math calls in device code
jcosborn Jun 6, 2024
c2c01c9
add link flag
jcosborn Jun 7, 2024
d72614b
Merge branch 'develop' into feature/sycl
jcosborn Jun 21, 2024
0f42b26
don't cast void** to float*
jcosborn Jul 19, 2024
f0a99f3
remove some reinterpret casts
jcosborn Jul 20, 2024
26ed86c
add missing elem
jcosborn Jul 20, 2024
77c0937
remove VLAs
jcosborn Jul 21, 2024
7e7116f
remove more casts
jcosborn Jul 21, 2024
43afaa6
fix staggered ghost
jcosborn Jul 21, 2024
cff3d5a
fix typo
jcosborn Jul 21, 2024
58c5ce2
add missing include
jcosborn Aug 14, 2024
81ed75d
make cufft optional on CUDA
jcosborn Aug 27, 2024
23c3cf9
add space in message
jcosborn Aug 28, 2024
0b2a93d
Merge branch 'develop' into feature/sycl
jcosborn Aug 28, 2024
364e587
add get_state stub
jcosborn Aug 28, 2024
4736876
replace punning with memcpy
jcosborn Sep 13, 2024
8f31078
fix type
jcosborn Sep 13, 2024
e8a23a9
fix some issues with newer SDK
jcosborn Sep 13, 2024
352aeb5
fix race with reduction functors using shared mem
jcosborn Sep 17, 2024
61c542d
fix missing ndi
jcosborn Sep 17, 2024
5392c5e
Merge branch 'develop' into feature/sycl
jcosborn Sep 17, 2024
fdb3567
fix page fault in ndeg twisted clover
jcosborn Sep 18, 2024
78e48b1
fix hang in ndeg twisted clover
jcosborn Sep 18, 2024
03da754
fix QDPJITOrder
jcosborn Sep 28, 2024
edec34b
add comms checksum and hang check
jcosborn Sep 29, 2024
ccef93b
switch off comms checking by default
jcosborn Sep 29, 2024
e0aaa14
skip comm checksum for strided
jcosborn Sep 29, 2024
f8f8cac
don't wait on comm checksum in start
jcosborn Sep 30, 2024
b6ad267
improve hang detection
jcosborn Sep 30, 2024
9289b34
improve hang output
jcosborn Sep 30, 2024
3f5fc11
improve hang message
jcosborn Oct 1, 2024
0c00270
disable hang warning
jcosborn Oct 1, 2024
6e6562e
Merge branch 'develop' into feature/sycl
jcosborn Oct 15, 2024
cd36a1d
add some extra CG output
jcosborn Oct 19, 2024
81f8175
add more output to CG
jcosborn Oct 19, 2024
aec8207
add more output to CG
jcosborn Oct 19, 2024
67027d2
more output in CG
jcosborn Oct 19, 2024
13c13be
avoid some casts
jcosborn Oct 21, 2024
fc1fed2
Merge branch 'develop' into feature/sycl
jcosborn Oct 21, 2024
bc86978
update CG output
jcosborn Oct 22, 2024
3623826
fix uninitialized struct field
jcosborn Oct 23, 2024
b6f827d
restore QDPJIT memory allocation
jcosborn Oct 28, 2024
f24fc2d
fix bug in gauge copy
jcosborn Oct 31, 2024
1a356c1
fix offset in qudaMemset2DAsync
jcosborn Nov 1, 2024
78cae9c
switch qudaEventRecord to use ext_oneapi_submit_barrier
jcosborn Nov 2, 2024
11bfaf1
revert erroneous fix
jcosborn Nov 3, 2024
ceb334c
allow more block sizes in block orthogonalize when not tuning
jcosborn Nov 6, 2024
f996369
fix CI
jcosborn Nov 7, 2024
3b1154e
Fix bug with fused Mobius kernel when using multiple RHS
maddyscientist Oct 24, 2024
c573936
Merge branch 'develop' into feature/sycl
jcosborn Nov 8, 2024
1e0d0d3
Merge branch 'develop' into feature/sycl
jcosborn Nov 18, 2024
81f2f7d
Merge branch 'develop' into feature/sycl
jcosborn Nov 18, 2024
ddfc127
Merge branch 'develop' into feature/sycl
jcosborn Nov 22, 2024
c4536cb
clean up test changes
jcosborn Nov 22, 2024
6237fea
fix compilation
jcosborn Nov 22, 2024
3c3d80a
Merge branch 'develop' into feature/sycl
jcosborn Nov 22, 2024
7c24446
Merge branch 'develop' into feature/sycl
jcosborn Nov 23, 2024
94a0c38
Merge branch 'develop' into feature/sycl
jcosborn Dec 10, 2024
31a28bd
small updates
jcosborn Dec 18, 2024
1668a3c
Merge branch 'develop' into feature/sycl
jcosborn Dec 20, 2024
revert erroneous fix
jcosborn committed Nov 3, 2024
commit 11bfaf18006de8f53be629945b70a04284cdc75e
2 changes: 1 addition & 1 deletion lib/copy_gauge_helper.hpp
@@ -46,7 +46,7 @@ namespace quda
"gauge fields");
}

- resizeVector(vector_length_z, (is_ghost ? in.Ndim() : in.Geometry()) * 2); // only resizing z component
+ resizeVector(vector_length_y, (is_ghost ? in.Ndim() : in.Geometry()) * 2); // only resizing z component
}

void apply(const qudaStream_t &stream) override