
fixed a sync problem #15

Merged (5 commits into main on Sep 28, 2023)

Conversation

@ArrogantGao (Collaborator)

The main change is forcing a synchronization in the C code; with that, the problem mentioned in #10 is fixed (a sketch of the idea is shown after the benchmark output below). The results are:

julia> using GenericTensorNetworks, GenericTensorNetworks.Graphs

julia> using CUDA

julia> g = Graphs.random_regular_graph(200, 3)
{200, 300} undirected simple Int64 graph

julia> optimizer = TreeSA(ntrials=3)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.01:0.05:14.96, 3, 50, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))

julia> gp = IndependentSet(g; optimizer=optimizer)
┌ Warning: target space complexity not found, got: 23.0, with time complexity 30.29749478594516, read-write complexity 25.905654474039626.
└ @ OMEinsumContractionOrders ~/.julia/packages/OMEinsumContractionOrders/WpwIz/src/treesa.jl:229
IndependentSet{OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}, NoWeight}(OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}(Int64[], 5, 5 -> 
├─ 147, 147∘5 -> 5
│  ├─ 147
│  └─ 147∘5, 5∘147 -> 147∘5
│     ├─ 182∘143∘147, 182∘143∘5 -> 147∘5
│     │  ├─ 147∘182, 143∘147 -> 182∘143∘147
│     │  │  ├─ 147∘182
│     │  │  └─ 143∘147
│     │  └─ 182∘136∘192∘143, 192∘136∘5 -> 182∘143∘5
│     │     ├─ 151∘178∘198∘33∘78∘146∘143∘196∘182, 33∘136∘78∘192∘146∘143∘198∘196∘178∘151 -> 182∘136∘192∘143
│     │     │  ⋮
│     │     │  
│     │     └─ 5∘192, 5∘136 -> 192∘136∘5
│     │        ⋮
│     │        
│     └─ 5∘147
└─ 5
), SimpleGraph{Int64}(300, [[40, 52, 126], [43, 113, 170], [10, 17, 97], [17, 96, 117], [136, 147, 192], [70, 75, 172], [26, 50, 144], [56, 139, 179], [177, 179, 192], [3, 56, 93]  …  [110, 122, 179], [5, 9, 142], [36, 45, 69], [75, 80, 127], [98, 102, 137], [54, 57, 119], [90, 125, 161], [15, 79, 163], [65, 77, 174], [18, 35, 47]]), NoWeight(), Dict{Int64, Int64}())

julia> contraction_complexity(gp)
Time complexity: 2^30.29749478594516
Space complexity: 2^23.0
Read-write complexity: 2^25.905654474039626

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  4.446523 seconds (7.53 M allocations: 502.525 MiB, 3.03% gc time)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
88.0ₜ

julia> using CuTropicalGEMM

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.048345 seconds (121.74 k allocations: 6.030 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
88.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.053084 seconds (121.74 k allocations: 6.030 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
88.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.052982 seconds (121.74 k allocations: 6.030 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
88.0ₜ

The result is now stable, and the @time macro works properly.
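
For context, the synchronization pattern behind the fix looks roughly like the following from the Julia side. This is only a minimal sketch: the actual change is in the C code of CuTropicalGEMM, and the function double! here is a made-up example.

using CUDA

function double!(y::CuVector{Float32})
    y .*= 2f0            # the broadcast launches a GPU kernel asynchronously
    CUDA.synchronize()   # force sync: block until the device has finished
    return y
end

y = CUDA.fill(1f0, 1024)
@time CUDA.@sync double!(y)   # timing is meaningful because the GPU work has completed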

I also removed the unused files .travis.yml and Artifacts.toml, as mentioned in #12.

@GiggleLiu (Member)

Please try this test case; I am afraid it is not fully fixed:

using GenericTensorNetworks, GenericTensorNetworks.Graphs
using CUDA
using Random; Random.seed!(6)
g = Graphs.random_regular_graph(200, 3)
item(x::AbstractArray) = Array(x)[]
optimizer = TreeSA(ntrials=1)
gp = IndependentSet(g; optimizer=optimizer)
contraction_complexity(gp)
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
using CuTropicalGEMM
@time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
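
For reference, the item helper above can be used to compare the GPU result against a CPU run; a minimal sketch, assuming solve also accepts T=Float32 without usecuda:

res_gpu = item(solve(gp, SizeMax(); usecuda=true, T=Float32))   # scalar Tropical{Float32} from the GPU
res_cpu = item(solve(gp, SizeMax(); T=Float32))                 # assumed CPU call with the same element type
@assert res_gpu == res_cpu                                      # both runs should report the same maximum size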

@ArrogantGao (Collaborator, Author)

> Please try this test case; I am afraid it is not fully fixed:
>
> using GenericTensorNetworks, GenericTensorNetworks.Graphs
> using CUDA
> using Random; Random.seed!(6)
> g = Graphs.random_regular_graph(200, 3)
> item(x::AbstractArray) = Array(x)[]
> optimizer = TreeSA(ntrials=1)
> gp = IndependentSet(g; optimizer=optimizer)
> contraction_complexity(gp)
> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
> using CuTropicalGEMM
> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
julia> using GenericTensorNetworks, GenericTensorNetworks.Graphs

julia> using CUDA

julia> using Random; Random.seed!(6)
TaskLocalRNG()

julia> g = Graphs.random_regular_graph(200, 3)
{200, 300} undirected simple Int64 graph

julia> item(x::AbstractArray) = Array(x)[]
item (generic function with 1 method)

julia> optimizer = TreeSA(ntrials=1)
TreeSA{Int64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}, Any}(20, 0.01:0.05:14.96, 1, 50, 1.0, 0.2, :greedy, 0, Any[], GreedyMethod{OMEinsumContractionOrders.MinSpaceOut}(OMEinsumContractionOrders.MinSpaceOut(), 1))

julia> gp = IndependentSet(g; optimizer=optimizer)
┌ Warning: target space complexity not found, got: 24.0, with time complexity 31.121747025566243, read-write complexity 26.325278911340753.
└ @ OMEinsumContractionOrders ~/.julia/packages/OMEinsumContractionOrders/WpwIz/src/treesa.jl:229
IndependentSet{OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}, NoWeight}(OMEinsum.SlicedEinsum{Int64, OMEinsum.DynamicNestedEinsum{Int64}}(Int64[], 86, 86 -> 
├─ 86∘192, 192∘86 -> 86
│  ├─ 86∘192
│  └─ 158∘49∘192, 86∘158∘49 -> 192∘86
│     ├─ 192∘158, 192∘49 -> 158∘49∘192
│     │  ├─ 158∘192, 158 -> 192∘158
│     │  │  ├─ 158∘192, 192 -> 158∘192
│     │  │  │  ⋮
│     │  │  │  
│     │  │  └─ 158
│     │  └─ 49∘192, 49 -> 192∘49
│     │     ├─ 49∘192
│     │     └─ 49
│     └─ 157∘86∘158∘106, 157∘106∘49 -> 86∘158∘49
│        ├─ 158∘106∘151∘157∘199∘125∘86∘21∘74∘138∘68∘30∘60∘58∘57, 138∘60∘125∘30∘58∘68∘74∘151∘158∘57∘199∘21∘106 -> 157∘86∘158∘106
│        │  ├─ 102∘57∘44∘158∘47, 106∘47∘151∘157∘102∘199∘125∘86∘57∘21∘74∘138∘68∘30∘60∘58∘44 -> 158∘106∘151∘157∘199∘125∘86∘21∘74∘138∘68∘30∘60∘58∘57
│        │  │  ⋮
│        │  │  
│        │  └─ 138∘60∘125∘30∘58∘68∘74∘151∘158∘57∘35∘199, 21∘106∘35 -> 138∘60∘125∘30∘58∘68∘74∘151∘158∘57∘199∘21∘106
│        │     ⋮
│        │     
│        └─ 49∘157, 49∘106 -> 157∘106∘49
│           ├─ 49∘157
│           └─ 49∘106
└─ 86
), SimpleGraph{Int64}(300, [[44, 158, 182], [66, 126, 167], [67, 85, 113], [10, 76, 105], [145, 148, 171], [56, 104, 180], [22, 41, 72], [149, 166, 173], [72, 90, 174], [4, 69, 129]  …  [55, 63, 137], [49, 86, 158], [15, 127, 176], [36, 102, 176], [120, 134, 178], [32, 94, 118], [30, 96, 113], [28, 81, 165], [45, 98, 189], [71, 75, 108]]), NoWeight(), Dict{Int64, Int64}())

julia> contraction_complexity(gp)
Time complexity: 2^31.121747025566243
Space complexity: 2^24.0
Read-write complexity: 2^26.325278911340753

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
 36.767120 seconds (65.42 M allocations: 4.263 GiB, 4.67% gc time, 0.13% compilation time)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> using CuTropicalGEMM
[ Info: Precompiling CuTropicalGEMM [c2b282c3-c9c2-431d-80f7-a1a0561ebe55]

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.470510 seconds (526.07 k allocations: 33.198 MiB, 6.24% gc time)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.042993 seconds (115.21 k allocations: 5.698 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

julia> @time CUDA.@sync solve(gp, SizeMax(); usecuda=true, T=Float32)
  0.065605 seconds (115.21 k allocations: 5.698 MiB)
0-dimensional CuArray{Tropical{Float32}, 0, CUDA.Mem.DeviceBuffer}:
89.0ₜ

The test passed, and I also double-checked the result on another server.

Did you rebuild the binary after pulling?
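
For reference, a minimal sketch of rebuilding after a pull, assuming the package is dev'ed locally and ships a build step:

using Pkg
Pkg.build("CuTropicalGEMM")   # re-run the build step so the compiled binary matches the pulled C sources
Pkg.precompile()              # optionally precompile dependents against the rebuilt binary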

@GiggleLiu (Member) left a review:

I just verified the correctness. Good job! After fixing the Project.toml and CI, we can release v0.1.

Project.toml Outdated
@@ -6,6 +6,7 @@ version = "1.0.0-DEV"
[deps]
ArtifactUtils = "8b73e784-e7d8-4ea5-973d-377fed4e3bce"
Artifacts = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"

@GiggleLiu (Member) commented:

Please remove all dependencies that are not directly used in src from the Project.toml.
If you want a test environment, please add a Project.toml file to the test folder; see this example: https://github.com/TensorBFS/TensorInference.jl/tree/main/test
TestEnv.jl can help you easily start a test environment for debugging.

Packages like BenchmarkTools should not be included in the package's main environment.
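
A minimal sketch of this workflow, assuming the test-only dependencies (such as BenchmarkTools) live in test/Project.toml rather than in the top-level Project.toml:

using TestEnv
TestEnv.activate("CuTropicalGEMM")   # activates an environment that also includes the deps from test/Project.toml
using BenchmarkTools                 # available here without being listed in the package's [deps]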

@GiggleLiu (Member) commented on Sep 28, 2023:

We need to get this PR merged before closing issue #10

Approval means you have permission to merge this PR directly.

@ArrogantGao (Collaborator, Author)

> We need to get this PR merged before closing issue #10

Oops, sorry about that, I have reopened issue #10.

@ArrogantGao merged commit fe574dd into main on Sep 28, 2023
@GiggleLiu mentioned this pull request on Sep 28, 2023