Unstable result in the GenericTensorNetwork example #10

Closed · GiggleLiu opened this issue Sep 27, 2023 · 4 comments

@GiggleLiu
Member

I see. I notice that although the code is much faster and all tests pass, the current version still cannot produce the correct result in the following test case.

The output is nondeterministic, which strongly suggests that threads are not being synchronized after some computation. Could you please help me make the following code produce the correct result?

using GenericTensorNetworks, GenericTensorNetworks.Graphs
using CUDA
g = Graphs.random_regular_graph(200, 3)
optimizer = TreeSA(ntrials=3)
gp = IndependentSet(g; optimizer=optimizer)
contraction_complexity(gp)
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
using CuTropicalGEMM
# If you run the following line multiple times, the result changes.
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)

Originally posted by @GiggleLiu in #9 (comment)
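For context only (this is not CuTropicalGEMM's actual kernel): in a shared-memory tiled GEMM, a sync_threads() barrier is needed both after loading each tile and after consuming it; dropping either barrier produces exactly this kind of run-to-run nondeterminism. A minimal max-plus (tropical) sketch in CUDA.jl, with the two barriers marked:

using CUDA

# Generic tiled max-plus matmul sketch on Float32 payloads; NOT the CuTropicalGEMM kernel.
function tropical_tile_kernel!(C, A, B)
    TA = CuStaticSharedArray(Float32, (16, 16))
    TB = CuStaticSharedArray(Float32, (16, 16))
    tx, ty = threadIdx().x, threadIdx().y
    row = (blockIdx().x - 1) * 16 + tx
    col = (blockIdx().y - 1) * 16 + ty
    acc = -Inf32                                   # tropical "zero"
    for t in 0:cld(size(A, 2), 16) - 1
        ka, kb = t * 16 + ty, t * 16 + tx
        # cooperative load of one tile of A and one tile of B, padded with -Inf32
        TA[tx, ty] = (row <= size(A, 1) && ka <= size(A, 2)) ? A[row, ka] : -Inf32
        TB[tx, ty] = (kb <= size(B, 1) && col <= size(B, 2)) ? B[kb, col] : -Inf32
        sync_threads()        # barrier 1: tiles must be fully loaded before use
        for k in 1:16
            acc = max(acc, TA[tx, k] + TB[k, ty])
        end
        sync_threads()        # barrier 2: tiles must not be overwritten while still in use
    end
    if row <= size(C, 1) && col <= size(C, 2)
        @inbounds C[row, col] = acc
    end
    return nothing
end

A = CUDA.rand(Float32, 256, 512); B = CUDA.rand(Float32, 512, 320); C = CUDA.fill(-Inf32, 256, 320)
@cuda threads=(16, 16) blocks=(cld(256, 16), cld(320, 16)) tropical_tile_kernel!(C, A, B)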

@ArrogantGao
Collaborator

ArrogantGao commented Sep 28, 2023

I fixed the bug in a lazy way: I switched to a 1D grid of blocks to avoid overflowing the grid-dimension limit (see the sketch below).
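Roughly, the workaround looks like this sketch (illustrative only, not the actual CuTropicalGEMM code): launch one linear grid and decode the 2D tile coordinates from blockIdx().x, since the x grid dimension has a much larger limit than y and z.

using CUDA

# Sketch of the 1D-grid idea: linear block index -> 2D tile coordinates.
function mark_tiles_1d!(C, tiles_m)
    bid = blockIdx().x                      # linear block index, 1-based
    tile_i = (bid - 1) % tiles_m            # tile row, 0-based
    tile_j = (bid - 1) ÷ tiles_m            # tile column, 0-based
    row = tile_i * blockDim().x + threadIdx().x
    col = tile_j * blockDim().y + threadIdx().y
    if row <= size(C, 1) && col <= size(C, 2)
        @inbounds C[row, col] = bid         # record which block touched this entry
    end
    return nothing
end

C = CUDA.zeros(Float32, 4096, 4096)
tiles_m, tiles_n = cld(size(C, 1), 16), cld(size(C, 2), 16)
@cuda threads=(16, 16) blocks=tiles_m * tiles_n mark_tiles_1d!(C, tiles_m)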

This simple test shows that the revision does not affect performance much:

julia> A = TropicalF32.(CUDA.rand(4096, 4096));

julia> B = TropicalF32.(CUDA.rand(4096, 4096));

julia> C = TropicalF32.(CUDA.zeros(4096, 4096));

julia> @benchmark CUDA.@sync mul!($C, $A, $B)
BenchmarkTools.Trial: 63 samples with 7 evaluations.
 Range (min … max):   4.613 μs … 13.549 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.545 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.339 ms ±  4.983 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                            █
  ▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.61 μs         Histogram: frequency by time        13.5 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> 4096^3 * 2 / 11.339 / 1e9
12.120906029808625
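(That is 2 × 4096³ tropical operations divided by the ≈ 11.3 ms mean time, i.e. roughly 12 × 10¹² tropical operations per second.)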

Here are the results from GenericTensorNetworks using the revised version of the package:

julia> using GenericTensorNetworks, GenericTensorNetworks.Graphs

julia> using CUDA

julia> using Random; Random.seed!(6)
TaskLocalRNG()

julia> g = Graphs.random_regular_graph(200, 3)
{200, 300} undirected simple Int64 graph

julia> item(x::AbstractArray) = Array(x)[]
item (generic function with 1 method)

julia> optimizer = TreeSA(ntrials=1)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.01:0.05:14.96, 1, 50, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))

julia> gp = IndependentSet(g; optimizer=optimizer)
┌ Warning: target space complexity not found, got: 28.0, with time complexity 33.47258510488839, read-write complexity 30.100720957270713.
└ @ OMEinsumContractionOrders ~/.julia/packages/OMEinsumContractionOrders/WpwIz/src/treesa.jl:229
IndependentSet{OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}, NoWeight}(OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}(Int64[], 18, 18 ->
├─ 18∘72, 18∘72 -> 18
│  ├─ 9∘72, 18∘9∘72 -> 18∘72
│  │  ├─ 9∘72
│  │  └─ 7∘18∘9, 7∘72 -> 18∘9∘72
│  │     ├─ 41∘22∘7, 41∘18∘22∘9 -> 7∘18∘9
│  │     │  ├─ 7∘41, 7∘22 -> 41∘22∘7
│  │     │  │  ⋮
│  │     │  │
│  │     │  └─ 22∘176∘9∘90, 9∘41∘22∘18∘176∘90 -> 41∘18∘22∘9
│  │     │     ⋮
│  │     │
│  │     └─ 72, 7∘72 -> 7∘72
│  │        ├─ 72
│  │        └─ 7∘72
│  └─ 18∘72
└─ 18
), SimpleGraph{Int64}(300, [[44, 158, 182], [66, 126, 167], [67, 85, 113], [10, 76, 105], [145, 148, 171], [56, 104, 180], [22, 41, 72], [149, 166, 173], [72, 90, 174], [4, 69, 129]  …  [55, 63, 137], [49, 86, 158], [15, 127, 176], [36, 102, 176], [120, 134, 178], [32, 94, 118], [30, 96, 113], [28, 81, 165], [45, 98, 189], [71, 75, 108]]), NoWeight(), Dict{Int64, Int64}())

julia> contraction_complexity(gp)
Time complexity: 2^33.47258510488839
Space complexity: 2^28.0
Read-write complexity: 2^30.100720957270713

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
 31.197158 seconds (54.31 M allocations: 3.654 GiB, 2.21% gc time, 93.08% compilation time: <1% of which was recompilation)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> using CuTropicalGEMM

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.502034 seconds (539.53 k allocations: 32.931 MiB, 80.44% compilation time: 87% of which was recompilation)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.110021 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.116131 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.110667 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.214065 seconds (222.74 k allocations: 10.002 MiB, 33.76% gc time, 2.98% compilation time)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> VERSION
v"1.10.0-beta2"

@ArrogantGao
Collaborator

I made some efforts to improve this package's performance for multiplication involving thin matrices, but ran into some strange bugs in CUDA. I will file those as a separate issue and try to improve this in the future.

@ArrogantGao
Collaborator

Since this bug is fixed by #15, this issue is now closed.

@ArrogantGao
Collaborator

Solved by #15, which has already been merged.
