Slow interaction with DataLoader #141
When using Flux.DataLoader, loading graph batches is slower than expected, which really slows everything down when dealing with large training sets. Here is a comparison against plain vectors holding the same amount of data as each graph:
using Flux
using GraphNeuralNetworks
using BenchmarkTools

f(x) = 1  # trivial work per batch, so the timing is dominated by the loader itself

function test(g)
    loader = Flux.DataLoader(g, batchsize=100, shuffle=true)
    s = 0
    for d in loader
        s += f(d)
    end
    return s
end

n = 5000
s = 10
x1 = Flux.batch([rand_graph(s, s, ndata = rand(1, s)) for i in 1:n])  # one big batched GNNGraph
x2 = Flux.batch([rand(s + s + s + s) for i in 1:n])  # plain vectors: source + target + data + extra
@btime test(x1); # 1.296 s    (2502 allocations: 6.17 MiB)
@btime test(x2); # 400.778 μs (152 allocations: 1.61 MiB)
Comments
A way around this is to store in the graph another vector of length …
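The comment above is truncated in this copy, so the exact workaround cannot be recovered. One plausible reading, sketched below purely as an illustration (names and sizes are invented), is to keep the big batched graph intact, let the DataLoader shuffle a plain vector of graph indices, and slice the batch once per minibatch with getgraph:

using Flux
using GraphNeuralNetworks

# Build one big batched graph, as in the benchmark above.
big_g = Flux.batch([rand_graph(10, 10, ndata = rand(Float32, 1, 10)) for _ in 1:5000])

# Shuffle cheap integer indices instead of the GNNGraph itself.
idx_loader = Flux.DataLoader(collect(1:big_g.num_graphs), batchsize=100, shuffle=true)

for idxs in idx_loader
    mini = getgraph(big_g, idxs)  # extract only the selected component graphs
    # ... run the model on `mini` ...
end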
Thanks to #143, the recommended way to interact with the DataLoader is now:

data = [rand_graph(10, 20, ndata=rand(Float32, 2, 10)) for _ in 1:1000]
train_loader = DataLoader(data, batchsize=10)
for g in train_loader
    # ...
end

@casper2002casper is this fast enough for your usecase?
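For completeness, a minimal end-to-end sketch of this pattern (model, loss, and sizes are invented for illustration; each minibatch is assumed to come out of the loader as a plain vector of GNNGraphs, so it is collated with Flux.batch before the forward pass):

using Flux
using GraphNeuralNetworks
using Statistics

# Toy graph-level setup; all names and sizes are illustrative.
data = [rand_graph(10, 20, ndata = rand(Float32, 2, 10)) for _ in 1:1000]
train_loader = Flux.DataLoader(data, batchsize=10, shuffle=true)

model = GNNChain(GCNConv(2 => 16, relu),
                 GlobalPool(mean),   # one 16-dimensional vector per graph
                 Dense(16, 1))

opt = Flux.setup(Adam(1e-3), model)

for gs in train_loader
    g = Flux.batch(gs)               # collate the minibatch into one batched GNNGraph
    grads = Flux.gradient(model) do m
        ŷ = m(g, g.ndata.x)          # predictions of size (1, num_graphs)
        mean(abs2, ŷ)                # dummy loss, just to make the example run
    end
    Flux.update!(opt, model, grads[1])
end

If the installed MLUtils version collates minibatches itself (e.g. DataLoader's collate=true keyword), the explicit Flux.batch call inside the loop can be dropped.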
Thank you! I will let you know as soon as I have access to my PC again in a couple of days.
This has sped up my training 20 times, very much appreciated.