Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reverse lookup (Ptr->Tuple) for CUDNN descriptors. #1948

Merged
merged 4 commits into from
Aug 19, 2023

Conversation

RomeoV
Copy link
Contributor

@RomeoV RomeoV commented Jun 11, 2023

This fixes #1947.

@RomeoV
Copy link
Contributor Author

RomeoV commented Jun 11, 2023

Here's a small script that shows the loading and saving

import Pkg;
Pkg.activate("@CUDAPerfCachingTest")
# Setup:
# I don't think you can link to the cuDNN.jl module within CUDA.jl directly, so
# you'll have to clone github.com/romeov/CUDA.jl and then link
# Pkg.develop(path="<local>/romeov/CUDA.jl/lib/cudnn")
# Also
# Pkg.add("Flux")
# Pkg.add("JLD2")
#
# Execute e.g. with `julia caching_test.jl save` or `julia caching_test.jl load` or just `julia caching_test.jl`
using Flux, JLD2
import Flux.Zygote: gradient

function load_conv_caches!(; cudnn_mod::Module=Flux.cuDNN, filename="/tmp/conv_cache.jld2")
    @info "Loading conv_cache."
    conv_data_cache = JLD2.load(filename, "conv_data_cache");
    push!(cudnn_mod.cudnnConvolutionBwdDataAlgoPerfCache,
          conv_data_cache...)
    conv_filter_cache = JLD2.load(filename, "conv_filter_cache");
    push!(cudnn_mod.cudnnConvolutionBwdFilterAlgoPerfCache,
          conv_filter_cache...)
end

function save_conv_caches(; cudnn_mod::Module=Flux.cuDNN, filename="/tmp/conv_cache.jld2")
    @info "Storing conv_cache."
    JLD2.save(filename,
              "conv_data_cache", cudnn_mod.cudnnConvolutionBwdDataAlgoPerfCache,
              "conv_filter_cache", cudnn_mod.cudnnConvolutionBwdFilterAlgoPerfCache,
    )
end



if "load" in ARGS
    load_conv_caches!()
end

model = Chain(Conv((3, 3), 3=>64, relu; pad=SamePad()),
              Conv((3, 3), 64=>32, relu),
              GlobalMeanPool(),
              Flux.flatten,
              Dense(32=>1))

x = rand(Float32, 32, 32, 3, 7);

let x = gpu(x),
    model = gpu(model),
    ps = Flux.params(model)

  t0 = time()
  gradient(ps) do
      model(x) |> sum
  end
  @info "done in $(time() - t0) seconds :)"
end;
("load" in ARGS) && @show length(Flux.cuDNN.cudnnConvolutionBwdDataAlgoPerfCache)

if "save" in ARGS
    save_conv_caches()
end

Comment on lines 172 to 176
# Helper fct to recover cudnn descriptor tuples from cudnn descriptor pointers
# so that we can cache algorithms based on data descriptors.
# Actually just reverses the cache dict and returns the descriptor as a tuple.
map_cudnn_ptr_to_jl_tuple(cache_dict, desc_ptr) = Dict(zip(values(cache_dict),
keys(cache_dict)))[desc_ptr]
Copy link
Contributor

@ToucheSir ToucheSir Jun 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of recreating the cache in reversed form and searching it every time (expensive!), CUDA.jl provides functions for pulling out the info from a descriptor (cheap!). See

function cudnnGetTensorDescriptor(d::cudnnTensorDescriptor)
nbDimsRequested = CUDNN_DIM_MAX
dataType = Ref{cudnnDataType_t}(CUDNN_DATA_FLOAT)
nbDims = Ref{Cint}(0)
dimA = Array{Cint}(undef, CUDNN_DIM_MAX)
strideA = Array{Cint}(undef, CUDNN_DIM_MAX)
cudnnGetTensorNdDescriptor(d, nbDimsRequested, dataType, nbDims, dimA, strideA)
T = juliaDataType(dataType[])
D = (dimA[nbDims[]:-1:1]...,)
S = (strideA[nbDims[]:-1:1]...,)
return T,D,S
end
function cudnnGetFilterDescriptor(d::cudnnFilterDescriptor)
nbDimsRequested = CUDNN_DIM_MAX
dataType = Ref{cudnnDataType_t}(CUDNN_DATA_FLOAT)
format = Ref{cudnnTensorFormat_t}(CUDNN_TENSOR_NCHW)
nbDims = Ref{Cint}(0)
dimA = Array{Cint}(undef, CUDNN_DIM_MAX)
cudnnGetFilterNdDescriptor(d, nbDimsRequested, dataType, format, nbDims, dimA)
T = juliaDataType(dataType[])
D = (dimA[nbDims[]:-1:1]...,)
return T,D,format[]
end
. You'll have to write equivalent functions for some of the conv-specific descriptors, but it should be quite straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The descriptors as they are still have some Cenum types in them, which we could convert to julia Ints or something if we run into serialization trouble.

RomeoV added 2 commits June 12, 2023 02:49
There is already `cudnnGetTensorDescriptor` and
`cudnnGetFilterDescriptor`, so now we have everything to cache algorithm performances.
However, there's still a few `CUDNN_xyz_t` datatypes, which are Cenums.
We could still map those to Julia integers if serialization is difficult otherwise.
@RomeoV RomeoV force-pushed the master branch 2 times, most recently from a765151 to 9da9e11 Compare June 12, 2023 10:02
lib/cudnn/src/libcudnn.jl Outdated Show resolved Hide resolved
@maleadt
Copy link
Member

maleadt commented Aug 18, 2023

This is still marked WIP; anything to do here @RomeoV @ToucheSir?

@maleadt maleadt added cuda libraries Stuff about CUDA library wrappers. enhancement New feature or request labels Aug 18, 2023
@ToucheSir
Copy link
Contributor

From my end no, didn't even notice the PR title still had WIP.

@maleadt maleadt changed the title WIP: Implement reverse lookup (Ptr->Tuple) for cudnn descriptors. Implement reverse lookup (Ptr->Tuple) for CUDNN descriptors. Aug 19, 2023
@maleadt maleadt merged commit 4b87ec0 into JuliaGPU:master Aug 19, 2023
dyDesc_native = cudnnGetTensorDescriptor(dyDesc)
convDesc_native = cudnnGetConvolutionDescriptor(convDesc)

key = (xDesc_native, dyDesc_native, convDesc_native)
val = lock(cudnnConvolutionBwdFilterAlgoPerfCacheLock) do
get(cudnnConvolutionBwdFilterAlgoPerfCache, (xDesc, dyDesc, convDesc), nothing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomeoV whoops, I think I missed this line. It should be get(cudnnConvolutionBwdFilterAlgoPerfCache, key, nothing), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. Opened another PR with that one-line change.

RomeoV added a commit to RomeoV/CUDA.jl that referenced this pull request Aug 20, 2023
This is a follow up to JuliaGPU#1948.
RomeoV added a commit to RomeoV/CUDA.jl that referenced this pull request Aug 20, 2023
This is a follow up to JuliaGPU#1948.
@RomeoV RomeoV mentioned this pull request Aug 20, 2023
maleadt pushed a commit that referenced this pull request Aug 21, 2023
This is a follow up to #1948.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda libraries Stuff about CUDA library wrappers. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cuDNN: Store convolution algorithm choice to disk.
3 participants