Make `gpu(::DataLoader)` work or error loudly if it doesn't #1974
Comments
Is … ?

I don't think … is right. Suggesting that the data be moved prior to constructing the DataLoader should allow users to express their intent, whether they want to move the whole dataset or not. Beyond that, allowing DataLoader to accept …
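As a sketch of that suggestion (array names like `xtrain`/`ytrain` are illustrative, not from the thread): moving the data up front makes the intent explicit, since everything the loader then yields is already on the GPU.

```julia
using Flux

# Move the whole dataset to GPU memory first (only sensible if it fits),
# then construct the DataLoader over the already-moved arrays.
xtrain_gpu = gpu(xtrain)
ytrain_gpu = gpu(ytrain)
loader = Flux.DataLoader((xtrain_gpu, ytrain_gpu), batchsize = 64, shuffle = true)
```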
When I created this post I knew that my data would fit in GPU memory (it was not a very big dataset), but I didn't really understand why the data was not being loaded onto the GPU. I thought that if the whole … IMO, loading the whole object with the corresponding data onto the GPU with …

The real issue, though, was that I couldn't find any good examples of GPU usage in Flux until I decided to dig into the model zoo for more practical usage scenarios. It would have been easier to solve this issue if the "GPU Support" page in the docs presented some workflows, similar to the ones in the model zoo, for how to transfer artifacts between CPU and GPU. Common operations such as transferring data and models, exemplified and explained, would be extremely helpful! Some of the examples could include, for instance:
```julia
using Flux, BSON

train_loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64, shuffle = true)
# ... model, optimizer and loss definitions

for epoch in 1:nepochs
    for (xtrain_batch, ytrain_batch) in train_loader
        # Move each batch to the GPU as it is consumed
        x = gpu(xtrain_batch)
        y = gpu(ytrain_batch)
        gradients = gradient(() -> loss(x, y), parameters)
        Flux.Optimise.update!(optimizer, parameters, gradients)
    end
end

# Bring the model back to the CPU before serializing it
let model = cpu(model)
    BSON.@save modelpath model epoch
end
```
I think a big argument for not exposing `gpu(::DataLoader)` is that you can't tell whether or not it worked. `gpu` is necessarily a no-op for any types it doesn't understand, and the internals of `DataLoader` are completely opaque to the user. Therefore, it's much easier to ask that data be moved before it is passed to, or after it is pulled from, the data loader. If I had to guess, this is also part of why PyTorch exposes `.to(device)` on Datasets but not DataLoaders.

But as you pointed out, the first-order issue is a lack of good docs. If you have a rough idea of how you'd like them to look, please do drop a PR and we'll make sure it gets in. One reason I think the status quo has persisted for so long is that the most efficient method we have for GPU data movement is https://cuda.juliagpu.org/stable/usage/memory/#Batching-iterator, but it's not quite general enough for all datasets people want to use. However, that can be remedied by just having slightly longer documentation which presents multiple alternatives.
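For context, the batching iterator linked above is CUDA.jl's `CuIterator`, which uploads one batch at a time and eagerly frees the previous one. A minimal sketch of wrapping a `DataLoader` with it (assuming `xtrain`/`ytrain` are CPU arrays; the names are illustrative):

```julia
using CUDA, Flux

loader = Flux.DataLoader((xtrain, ytrain), batchsize = 64)

# CuIterator adapts each batch to the GPU as it is requested and
# frees the previous batch's GPU memory once it has been consumed,
# so only one batch lives on the device at a time.
for (x, y) in CuIterator(loader)
    # x and y are CuArrays holding just the current batch
end
```

The catch mentioned above is that `CuIterator` only knows how to adapt batches whose elements it understands (e.g. arrays and tuples of arrays), which is why it isn't quite general enough for every dataset people want to use.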
Not that we should do it, but you're asking for … `DataLoader(data,)`.
This has come up a couple of times, most recently in https://discourse.julialang.org/t/training-with-flux-jl-on-the-gpu-causes-argumenterror-cannot-take-the-cpu-address-of-a-cuarray/81751/2. RFC on what the desired approach should be.