Unstable result in the GenericTensorNetwork example #10

Closed · GiggleLiu opened this issue Sep 27, 2023 · 4 comments

@GiggleLiu
Member

I see. I notice that although the code is much faster and all tests pass, the current version still cannot produce the correct result in the following test case.

The output is nondeterministic, which strongly suggests that threads are not being synchronized after some computation. Could you please help me make the following code produce the correct result?

using GenericTensorNetworks, GenericTensorNetworks.Graphs
using CUDA
g = Graphs.random_regular_graph(200, 3)
optimizer = TreeSA(ntrials=3)
gp = IndependentSet(g; optimizer=optimizer)
contraction_complexity(gp)
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
using CuTropicalGEMM
# If you run the following line multiple times, the result changes.
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)

Originally posted by @GiggleLiu in #9 (comment)
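For context only (this is not CuTropicalGEMM's actual kernel): in a shared-memory tiled GEMM, a sync_threads() barrier is needed both after loading each tile and after consuming it; dropping either barrier produces exactly this kind of run-to-run nondeterminism. A minimal max-plus (tropical) sketch in CUDA.jl, with the two barriers marked:

using CUDA

# Generic tiled max-plus matmul sketch on Float32 payloads; NOT the CuTropicalGEMM kernel.
function tropical_tile_kernel!(C, A, B)
    TA = CuStaticSharedArray(Float32, (16, 16))
    TB = CuStaticSharedArray(Float32, (16, 16))
    tx, ty = threadIdx().x, threadIdx().y
    row = (blockIdx().x - 1) * 16 + tx
    col = (blockIdx().y - 1) * 16 + ty
    acc = -Inf32                                   # tropical "zero"
    for t in 0:cld(size(A, 2), 16) - 1
        ka, kb = t * 16 + ty, t * 16 + tx
        # cooperative load of one tile of A and one tile of B, padded with -Inf32
        TA[tx, ty] = (row <= size(A, 1) && ka <= size(A, 2)) ? A[row, ka] : -Inf32
        TB[tx, ty] = (kb <= size(B, 1) && col <= size(B, 2)) ? B[kb, col] : -Inf32
        sync_threads()        # barrier 1: tiles must be fully loaded before use
        for k in 1:16
            acc = max(acc, TA[tx, k] + TB[k, ty])
        end
        sync_threads()        # barrier 2: tiles must not be overwritten while still in use
    end
    if row <= size(C, 1) && col <= size(C, 2)
        @inbounds C[row, col] = acc
    end
    return nothing
end

A = CUDA.rand(Float32, 256, 512); B = CUDA.rand(Float32, 512, 320); C = CUDA.fill(-Inf32, 256, 320)
@cuda threads=(16, 16) blocks=(cld(256, 16), cld(320, 16)) tropical_tile_kernel!(C, A, B)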

@ArrogantGao
Collaborator

ArrogantGao commented Sep 28, 2023

I fixed the bug in a lazy way: I switched to a 1D grid of blocks to avoid overflowing the grid-dimension limit (see the sketch below).
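Roughly, the workaround looks like this sketch (illustrative only, not the actual CuTropicalGEMM code): launch one linear grid and decode the 2D tile coordinates from blockIdx().x, since the x grid dimension has a much larger limit than y and z.

using CUDA

# Sketch of the 1D-grid idea: linear block index -> 2D tile coordinates.
function mark_tiles_1d!(C, tiles_m)
    bid = blockIdx().x                      # linear block index, 1-based
    tile_i = (bid - 1) % tiles_m            # tile row, 0-based
    tile_j = (bid - 1) ÷ tiles_m            # tile column, 0-based
    row = tile_i * blockDim().x + threadIdx().x
    col = tile_j * blockDim().y + threadIdx().y
    if row <= size(C, 1) && col <= size(C, 2)
        @inbounds C[row, col] = bid         # record which block touched this entry
    end
    return nothing
end

C = CUDA.zeros(Float32, 4096, 4096)
tiles_m, tiles_n = cld(size(C, 1), 16), cld(size(C, 2), 16)
@cuda threads=(16, 16) blocks=tiles_m * tiles_n mark_tiles_1d!(C, tiles_m)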

This simple test shows that the revision does not affect performance much:

julia> A = TropicalF32.(CUDA.rand(4096, 4096));

julia> B = TropicalF32.(CUDA.rand(4096, 4096));

julia> C = TropicalF32.(CUDA.zeros(4096, 4096));

julia> @benchmark CUDA.@sync mul!($C, $A, $B)
BenchmarkTools.Trial: 63 samples with 7 evaluations.
 Range (min … max):   4.613 μs … 13.549 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     13.545 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   11.339 ms ±  4.983 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                            █
  ▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.61 μs         Histogram: frequency by time        13.5 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> 4096^3 * 2 / 11.339 / 1e9
12.120906029808625
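(That is 2 × 4096³ tropical operations divided by the ≈ 11.3 ms mean time, i.e. roughly 12 × 10¹² tropical operations per second.)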

Here are the results from GenericTensorNetworks using the revised version of the package:

julia> using GenericTensorNetworks, GenericTensorNetworks.Graphs

julia> using CUDA

julia> using Random; Random.seed!(6)
TaskLocalRNG()

julia> g = Graphs.random_regular_graph(200, 3)
{200, 300} undirected simple Int64 graph

julia> item(x::AbstractArray) = Array(x)[]
item (generic function with 1 method)

julia> optimizer = TreeSA(ntrials=1)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.01:0.05:14.96, 1, 50, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))

julia> gp = IndependentSet(g; optimizer=optimizer)
┌ Warning: target space complexity not found, got: 28.0, with time complexity 33.47258510488839, read-write complexity 30.100720957270713.
└ @ OMEinsumContractionOrders ~/.julia/packages/OMEinsumContractionOrders/WpwIz/src/treesa.jl:229
IndependentSet{OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}, NoWeight}(OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}(Int64[], 18, 18 ->
├─ 18∘72, 18∘72 -> 18
│  ├─ 9∘72, 18∘9∘72 -> 18∘72
│  │  ├─ 9∘72
│  │  └─ 7∘18∘9, 7∘72 -> 18∘9∘72
│  │     ├─ 41∘22∘7, 41∘18∘22∘9 -> 7∘18∘9
│  │     │  ├─ 7∘41, 7∘22 -> 41∘22∘7
│  │     │  │  ⋮
│  │     │  │
│  │     │  └─ 22∘176∘9∘90, 9∘41∘22∘18∘176∘90 -> 41∘18∘22∘9
│  │     │     ⋮
│  │     │
│  │     └─ 72, 7∘72 -> 7∘72
│  │        ├─ 72
│  │        └─ 7∘72
│  └─ 18∘72
└─ 18
), SimpleGraph{Int64}(300, [[44, 158, 182], [66, 126, 167], [67, 85, 113], [10, 76, 105], [145, 148, 171], [56, 104, 180], [22, 41, 72], [149, 166, 173], [72, 90, 174], [4, 69, 129]  …  [55, 63, 137], [49, 86, 158], [15, 127, 176], [36, 102, 176], [120, 134, 178], [32, 94, 118], [30, 96, 113], [28, 81, 165], [45, 98, 189], [71, 75, 108]]), NoWeight(), Dict{Int64, Int64}())

julia> contraction_complexity(gp)
Time complexity: 2^33.47258510488839
Space complexity: 2^28.0
Read-write complexity: 2^30.100720957270713

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
 31.197158 seconds (54.31 M allocations: 3.654 GiB, 2.21% gc time, 93.08% compilation time: <1% of which was recompilation)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> using CuTropicalGEMM

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.502034 seconds (539.53 k allocations: 32.931 MiB, 80.44% compilation time: 87% of which was recompilation)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.110021 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.116131 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.110667 seconds (192.73 k allocations: 9.086 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.214065 seconds (222.74 k allocations: 10.002 MiB, 33.76% gc time, 2.98% compilation time)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> VERSION
v"1.10.0-beta2"

@ArrogantGao
Collaborator

I made some efforts to improve this package's performance for multiplication involving thin matrices, but ran into some strange bugs in CUDA. I will file those as a separate issue and try to improve this in the future.

@ArrogantGao
Collaborator

Since this bug is fixed by #15, this issue is now closed.

@ArrogantGao
Collaborator

Solved by #15, which has already been merged.
