is it possible c-ocl_*_win64 #363
Comments
I'm afraid I don't currently have such an OCL-based Windows platform. Can you explain more about your OCL environment? E.g., for NVIDIA/AMD GPUs, do they link to the same libocl.dll and use shared OCL header files? |
BTW, we recommend you try hlsl_win64, since it is DirectX based and works for AMD GPUs, NVIDIA GPUs, and even Intel GPUs. |
BACKEND=c-hlsl_win64 antares
To be honest, I know a little about DirectX; I used it a long time ago. You need the "old" OpenCL SDK from NVIDIA (~2012) and/or the last AMD OpenCL SDK (~2013-2014), as far as I remember. The one from AMD linked here should work for both NVIDIA and AMD GPUs. You will find the headers and the libraries inside. I haven't touched the latest one from AMD; here is just the release info: https://github.com/GPUOpen-LibrariesAndSDKs/OCL-SDK/releases |
Wow, that is definitely not expected. Firstly, you need a fully updated Windows 10 or Windows 11 (64-bit). Then you may suffer from a broken download of |
BTW, if you installed the latest AMDGPU drivers, why doesn't your system have |
OK, we will be distracted by two different parallel problems; maybe the DirectX part can be handled in another thread. On WSL, with Ubuntu 18 or the newer Ubuntu 20, I installed OpenCL but it doesn't work. I am a scientist/mathematician, and I am calculating trillions of terms. I can offload to my CPU and NVIDIA GPU. So I have AMDGPU working in Windows 11, and I am offloading to my CPU/GPU (NVIDIA) over WSL. I want to let them talk to each other. PS: if you convert to nvc++/NVIDIA, you can offload to the CPU and NVIDIA cards simultaneously with the same C code. Thank you very much for antares; making it work for OpenCL will solve a lot of problems for many different people. PS: it was a nice trick to use mingw/WSL/Ubuntu to call .dll libraries such as amdhip64.dll. I like it; I didn't know it. |
For HLSL, I applied your reg file antares_hlsl_tdr_v0.1.reg |
I did it. By the way, you have assumed that there is one WSL. There can be many so for antares to work one should |
Congratulations. Many of our previous investigations show that HLSL can work as efficiently as OpenCL, and it has a standard interface defined by Windows that covers all graphics GPUs, as long as you install the graphics drivers. If your machine has both AMD/Renoir and NVIDIA/1660TI, and HLSL works for AMD, a feasible way to make it use the NVIDIA device instead is to disable the AMD graphics device in "Windows Device Manager", although this may not be what you want if you intend to use them simultaneously. |
Can you give this a try?
$ pip3 install --upgrade antares==0.3.20.12
$ antares clean
$ DEVICE_ID=0 STEP=100 antares # this should use one of the GPUs, maybe AMD
$ DEVICE_ID=1 STEP=100 antares # this should use the other GPU, maybe NVIDIA |
Very good. OK, I am learning HLSL now.
========================================================================================================================
STEP[100 / 100] Current Best Config = {"Foutput0:D0": [-1, 2, 4, 4], "Foutput0:D1": [-1, 1, 16, 1], "Foutput0:O": [1, 0], "Foutput0:S": 2, "Foutput0:R": 0}, Perf = 3.01407e-05 sec / op (17.3947 Gflops), MemRatio = -1 %, Occur Step = 29;
========================================================================================================================
[Best Config] CONFIG='{"Foutput0:D0": [-1, 2, 4, 4], "Foutput0:D1": [-1, 1, 16, 1], "Foutput0:O": [1, 0], "Foutput0:S": 2, "Foutput0:R": 0}' ==> Performance is up to 17.394686 Gflops, occurred at step 29 / 100; time per run = 3.01407e-05 sec.
DEVICE_ID=1 STEP=100 antares # it uses the AMD
========================================================================================================================
STEP[100 / 100] Current Best Config = {"Foutput0:D0": [-1, 2, 4, 32], "Foutput0:D1": [-1, 1, 16, 2], "Foutput0:O": [1, 0], "Foutput0:S": 4, "Foutput0:R": 1}, Perf = 0.000184122 sec / op (2.8475 Gflops), MemRatio = -1 %, Occur Step = 90;
========================================================================================================================
[Best Config] CONFIG='{"Foutput0:D0": [-1, 2, 4, 32], "Foutput0:D1": [-1, 1, 16, 2], "Foutput0:O": [1, 0], "Foutput0:S": 4, "Foutput0:R": 1}' ==> Performance is up to 2.847503 Gflops, occurred at step 90 / 100; time per run = 0.000184122 sec.
Just out of curiosity I tried:
$ DEVICE_ID=2 STEP=2 antares
[Antares] Incorrect compute kernel from evaluator.
[Antares] Incorrect compute kernel from evaluator.
Maybe it would be better to report "Incorrect DEVICE_ID". |
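The suggestion above (report an out-of-range device index explicitly instead of a generic kernel error) could look roughly like the sketch below. This is not how antares handles `DEVICE_ID` internally; `resolve_device_id` and its `device_count` argument are hypothetical names used only for illustration, and the caller is assumed to know how many devices the backend exposes.

```python
import os


def resolve_device_id(device_count, env=None):
    """Read DEVICE_ID from the environment and validate it early.

    Hypothetical helper: `device_count` is assumed to come from the backend.
    """
    env = os.environ if env is None else env
    raw = env.get("DEVICE_ID", "0")  # default to the first device
    try:
        device_id = int(raw)
    except ValueError:
        raise ValueError(f"Incorrect DEVICE_ID: {raw!r} is not an integer")
    if not 0 <= device_id < device_count:
        raise ValueError(
            f"Incorrect DEVICE_ID: {device_id} "
            f"(valid range: 0..{device_count - 1})"
        )
    return device_id


# With two GPUs, DEVICE_ID=2 fails with a clear message before any kernel runs.
try:
    resolve_device_id(2, {"DEVICE_ID": "2"})
except ValueError as err:
    print(err)
```

Validating the index before tuning starts means the user sees the real cause once, rather than a repeated evaluator error at every step.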
But I am afraid that is a very low GFlops figure; usually I can get several TFlops. |
That is because the default computation is elementwise, which is a memory-bound operation. If you want to test how high it can reach in TFlops, you'd better try a large GEMM. |
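A rough way to see why an elementwise op stays in the GFlops range while a large GEMM can reach TFlops is to compare arithmetic intensity (flops per byte moved). The sketch below is only a back-of-the-envelope estimate under stated assumptions: float32 data, a binary elementwise op (two reads and one write per element), and an ideal GEMM that touches each matrix once. The flop count of 524288 per run is inferred from the logged time and Gflops values in this thread, not taken from antares itself.

```python
def gflops(flop_count, seconds):
    """Throughput in Gflops for one kernel run."""
    return flop_count / seconds / 1e9


def elementwise_intensity(bytes_per_elem=4):
    """Flops per byte for c[i] = a[i] op b[i]: 1 flop, 2 reads + 1 write."""
    return 1.0 / (3 * bytes_per_elem)


def gemm_intensity(n, bytes_per_elem=4):
    """Flops per byte for an n x n x n GEMM, assuming ideal data reuse."""
    flops = 2.0 * n ** 3                         # one multiply + one add per term
    traffic = 3.0 * n * n * bytes_per_elem       # read A and B, write C once
    return flops / traffic


# Assumed flop count per run (inferred from the logs above).
print("NVIDIA run:", gflops(524288, 3.01407e-05), "Gflops")
print("AMD run:   ", gflops(524288, 0.000184122), "Gflops")
print("elementwise flops/byte:", elementwise_intensity())
print("GEMM n=4096 flops/byte:", gemm_intensity(4096))
```

With so few flops per byte, the elementwise kernel is limited by memory bandwidth and launch overhead, so the measured Gflops says little about peak compute capability.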
OK, the main power of the GPU is parallel arrays. This is how I discovered it: something that can take hours in Mathematica (Parallel) can be done in milliseconds on the GPU. I initialize an empty array, copy it to the GPU, then compute and time my zeta on the GPU. The main point is that generating the whole zeta at once on the GPU takes milliseconds, sometimes even less. Here we go:
=============================
import pycuda.driver as cuda
n = 32
mod = SourceModule("""
nn=32
print("last element =", a[ nn * gd-1, nn * gd-1])
tt4=_time()
=================================
$ python3 ztc2.py
|
I hope the editor didn't mess up my code. I corrected it 3 times. |
Hi, thank you for your nice work.
For me and many others, OpenCL doesn't work on WSL2:
microsoft/WSL#6372
microsoft/WSL#6951
So I am wondering whether your c-rocm_win64 works for you but not for me, as in
#269
#284
I am wondering whether it is possible to extend antares with opencl_*_win64
to use OpenCL from Windows, which definitely works fine for my AMD and NVIDIA GPUs.
If possible, it would be great and would help many people work around the OpenCL/WSL2 issues.