
REPL display happens in different task, breaking synchronization #831

Closed
clintonTE opened this issue Apr 12, 2021 · 6 comments · Fixed by #837
Labels: bug, needs information

Comments


clintonTE commented Apr 12, 2021

Describe the bug

Lots of what looks like undefined behavior in array conversions: values sometimes come back as all zeros, sometimes as garbage.

To reproduce

# for instance
julia> m = rand(5,2)
5×2 Matrix{Float64}:
 0.651199  0.711706
 0.908387  0.5976
 0.786686  0.537776
 0.598196  0.249163
 0.81549   0.707133

julia> dm = m |> CuMatrix{Float64}
5×2 CuArray{Float64, 2}:
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0

# sometimes this works
julia> dm = m|>CuMatrix{Float32}
5×2 CuArray{Float32, 2}:
 0.651199  0.711706
 0.908387  0.5976
 0.786686  0.537776
 0.598196  0.249163
 0.81549   0.707133

#And sometimes not
julia> dm = m|>CuMatrix{Float32}
5×2 CuArray{Float32, 2}:
  3.56885f26   1.82167
  1.7878       1.29699f-18
 -1.25462f8    1.77455
  1.8521      -5.63526f-5
 -1.68402f19   1.82887

#Other bizarre behavior (probably related)
julia> CUDA.rand(5,2)
5×2 CuArray{Float32, 2}:
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0

This is on CUDA.jl 3.0.1, though I also tried master. 2.6.2 works fine.

Version info

Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_CUDA_USE_BINARYBUILDER = true

Details on CUDA:

CUDA toolkit 11.2.2, artifact installation
CUDA driver 11.2.0
NVIDIA driver 461.33.0

Libraries:
- CUBLAS: 11.4.1
- CURAND: 10.2.3
- CUFFT: 10.4.1
- CUSOLVER: 11.1.0
- CUSPARSE: 11.4.1
- CUPTI: 14.0.0
- NVML: 11.0.0+461.33
- CUDNN: 8.10.0 (for CUDA 11.2.0)
- CUTENSOR: 1.2.2 (for CUDA 11.1.0)

Toolchain:
- Julia: 1.6.0
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

Environment:
- JULIA_CUDA_USE_BINARYBUILDER: true

1 device:
  0: GeForce RTX 2070 with Max-Q Design (sm_75, 6.743 GiB / 8.000 GiB available)
clintonTE added the bug label on Apr 12, 2021

maleadt commented Apr 13, 2021

Can't reproduce. Could you check if the correct dm results 'appear' some time later? Maybe after a device_synchronize()?

maleadt added the needs information label on Apr 13, 2021
clintonTE (issue author) commented:

Waiting doesn't seem to do anything; even checking after a full minute shows the same garbage.

On the other hand, the results look correct after calling device_synchronize().
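
For reference, the check just described, as a REPL sketch (requires a CUDA-capable GPU; device_synchronize is the CUDA.jl function mentioned above):

```julia
julia> dm = m |> CuMatrix{Float64}   # initially displays zeros or garbage

julia> device_synchronize()          # block until all outstanding GPU work finishes

julia> dm                            # now displays the correct values
```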


maleadt commented Apr 14, 2021

Well, that's interesting, and a little disconcerting. Could you dev CUDA.jl, open lib/cudadrv/libcuda.jl, replace all ccalls with @debug_ccall and re-run a minimal set of operations that shows the failure? The output should show all API calls, so that we can spot if we're using the wrong streams.

An alternative to debugging is to trace your application under nsight-systems (where we can similarly see the streams being used), but that requires some set-up on your side.
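
For completeness, checking out the package for local edits can be done with Pkg (the file and macro names are as given above; `Pkg.develop` is the standard API behind the `dev` command):

```julia
using Pkg
Pkg.develop("CUDA")   # checks out CUDA.jl under ~/.julia/dev/CUDA for editing
# then edit lib/cudadrv/libcuda.jl, replacing `ccall` with `@debug_ccall`,
# restart Julia, and re-run the failing operations to capture the API trace
```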

maleadt changed the title from "Array conversions lead to undefined behavior" to "New synchronization does not work on Windows" on Apr 14, 2021

maleadt commented Apr 14, 2021

Sigh, this seems like a Windows issue:

julia> a = CUDA.ones(2,2)
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuMemAllocAsync(Base.RefValue{CuPtr{Nothing}}, 16, CuStream(0x000000005f7f5ff0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
 1: CuPtr{Nothing}(0x0000000303c00200)
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuMemsetD32Async(CuPtr{UInt32}(0x0000000303c00200), 1065353216, 4, CuStream(0x000000005f7f5ff0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
2×2 CuArray{Float32, 2}:
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuPointerGetAttribute(Base.RefValue{UInt32}, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, CuPtr{Nothing}(0x000000001a037710)) = CUDA_ERROR_INVALID_VALUE
 1: 3851
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuStreamQuery(CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuMemcpyDtoHAsync_v2(Ptr{Float32} @0x000000001a037710, CuPtr{Float32}(0x0000000303c00200), 16, CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuStreamQuery(CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
 0.0  0.0
 0.0  0.0

julia> device_synchronize()
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuCtxSynchronize() = CUDA_SUCCESS

julia> a
2×2 CuArray{Float32, 2}:
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuPointerGetAttribute(Base.RefValue{UInt32}, CU_POINTER_ATTRIBUTE_MEMORY_TYPE, CuPtr{Nothing}(0x000000001a0a9a90)) = CUDA_ERROR_INVALID_VALUE
 1: 456176240
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuStreamQuery(CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuMemcpyDtoHAsync_v2(Ptr{Float32} @0x000000001a0a9a90, CuPtr{Float32}(0x0000000303c00200), 16, CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
cuCtxGetCurrent(Base.RefValue{Ptr{Nothing}}) = CUDA_SUCCESS
 1: Ptr{Nothing} @0x0000000044daae30
cuStreamQuery(CuStream(0x000000005f7f60e0, CuContext(0x0000000044daae30, instance 85ada79074e039f4))) = CUDA_SUCCESS
 1.0  1.0
 1.0  1.0

Somehow we're synchronizing a different stream?
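
Context for the trace above: CUDA.jl 3.x associates a stream with each Julia task, so code running on a different task sees a different stream (note the two stream handles, ...5ff0 and ...60e0, in the log). A minimal pure-Julia sketch of that mechanism (no GPU needed; `get_stream` and the `:stream` key are illustrative, not CUDA.jl API):

```julia
# Each task keeps its own "stream" in task-local storage.
get_stream() = get!(task_local_storage(), :stream) do
    rand(UInt64)   # stand-in for allocating a new CuStream
end

s_here  = get_stream()                # the current task's stream
s_other = fetch(@async get_stream())  # a different task gets a different one
@assert s_here == get_stream()        # stable within a task
@assert s_here != s_other             # but not across tasks
```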


maleadt commented Apr 14, 2021

Oh no

julia> struct Foo
         task::Task
       end

julia> Base.show(io::IO, ::MIME"text/plain", foo::Foo) =
         println(io, "Foo was constructed with task $(foo.task), but display happens in task $(current_task())")

julia> foo = Foo(current_task())
Foo was constructed with task Task @0x00000000191d0010, but display happens in task Task @0x00000000191d36c0
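
A general way around this class of bug (a sketch of the pattern only, not necessarily what #837 implements): capture any task-dependent state at construction time, so it no longer matters which task ends up running `show`.

```julia
struct TaskAware
    owner::Task
    TaskAware() = new(current_task())   # capture the constructing task
end

# Display reads the captured task, not whichever task happens to run `show`.
Base.show(io::IO, ::MIME"text/plain", x::TaskAware) =
    println(io, "constructed in ", x.owner, ", displayed in ", current_task())
```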

maleadt changed the title from "New synchronization does not work on Windows" to "REPL display happens in different task, breaking synchronization" on Apr 14, 2021
clintonTE (issue author) commented:

Confirmed that #837 fixed the issue. Thank you!
