CUDA 2nd order AD with MaxPool and logsoftmax #1150

avik-pal · 2024-12-31T16:18:12Z

using Lux, CUDA, cuDNN, Random, OneHotArrays, Zygote
using Functors, Optimisers, Printf

model = Chain(
    Conv((5, 5), 1 => 6, relu),
    MaxPool((2, 2)),
    Conv((5, 5), 6 => 16, relu),
    MaxPool((2, 2)),
    FlattenLayer(3),
    Chain(
        Dense(256 => 128, relu),
        Dense(128 => 84, relu),
        Dense(84 => 2)
    )
)

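dev = gpu_device(; force=true)  # throw instead of silently falling back to the CPU

ps, st = Lux.setup(Random.default_rng(), model) |> dev;

x = randn(Float32, 28, 28, 1, 32) |> dev  # batch of 32 single-channel 28×28 inputs
δ = randn(Float32, 28, 28, 1, 32) |> dev  # target for the input gradient ∂x
y = onehotbatch(rand((1, 2), 32), 1:2) |> dev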

const celoss = CrossEntropyLoss(;logits=true)
const regloss = MSELoss()

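# Plain first-order loss: cross-entropy on the logits
function loss_function(model, ps, st, x, y)
    pred, _ = model(x, ps, st)
    return celoss(pred, y)
end

# Inner (first-order) AD: gradient of the loss w.r.t. the input x,
# penalized against δ and added to the plain loss
function ∂xloss_function(model, ps, st, x, δ, y)
    smodel = StatefulLuxLayer{true}(model, ps, st)
    ∂x = only(Zygote.gradient(Base.Fix2(celoss, y) ∘ smodel, x))
    return regloss(∂x, δ) + loss_function(model, ps, st, x, y)
end

# Outer AD: gradient of ∂xloss_function w.r.t. the parameters ps,
# i.e. differentiating through the inner gradient (nested, 2nd-order AD)
function ∂∂xloss_function(model, ps, st, x, δ, y)
    return only(Zygote.gradient(ps -> ∂xloss_function(model, ps, st, x, δ, y), ps))
end

∂∂xloss_function(model, ps, st, x, δ, y)  # 2nd-order AD through MaxPool and logsoftmax on CUDA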
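For reference, the quantity `∂∂xloss_function` asks for can be written out as below. This is a sketch in our own notation, not taken from the Lux docs: ℓ stands for `celoss` applied to the model output f_θ(x), θ for `ps`, and n for the number of elements of `x` (`MSELoss` averages over all elements). The outer gradient contains the mixed second derivative of ℓ with respect to x and θ, so reverse mode has to differentiate through the pullbacks of every layer, including `MaxPool` and the `logsoftmax` applied by `CrossEntropyLoss(; logits=true)`.

```math
\begin{aligned}
g(\theta) &= \nabla_x\, \ell\big(f_\theta(x), y\big), \\
L(\theta) &= \frac{1}{n}\sum_{i=1}^{n}\big(g_i(\theta) - \delta_i\big)^2 + \ell\big(f_\theta(x), y\big), \\
\nabla_\theta L &= \frac{2}{n}\sum_{i=1}^{n}\big(g_i(\theta) - \delta_i\big)\,\nabla_\theta g_i(\theta) + \nabla_\theta\, \ell\big(f_\theta(x), y\big),
\qquad \nabla_\theta g_i = \nabla_\theta \frac{\partial \ell}{\partial x_i}.
\end{aligned}
```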

Mostly a duplicate of #1007, but since this is needed, I will prioritize implementing it.

cc @pevnak so that you are in the loop

avik-pal added the bug (Something isn't working) and nested-ad labels on Dec 31, 2024

avik-pal self-assigned this Dec 31, 2024

avik-pal mentioned this issue in "feat: more nested AD rules" #1151 (merged) on Dec 31, 2024

avik-pal closed this as completed in #1151 Jan 1, 2025
