DimensionMismatch("array could not be broadcast to match destination") #1457

Closed
krishvishal opened this issue Jan 8, 2021 · 8 comments

@krishvishal

I'm trying to build a neural ODE using conv layers. After I build the model, the forward pass works fine, but when I try to get the gradient using g = gradient(() -> loss(x, y), params(model)), I get a DimensionMismatch("array could not be broadcast to match destination").

To reproduce the error:

using DiffEqFlux, OrdinaryDiffEq, Flux, NNlib, MLDataUtils, Printf
using Flux: logitcrossentropy
using Flux.Data: DataLoader
using MLDatasets
using CUDA
using Random: seed!
CUDA.allowscalar(false)

function loadmnist(batchsize = bs, train_split = 0.9)
    # Use MLDataUtils LabelEnc for natural onehot conversion
    onehot(labels_raw) = convertlabel(LabelEnc.OneOfK, labels_raw,
                                      LabelEnc.NativeLabels(collect(0:9)))
    # Load MNIST
    imgs, labels_raw = MNIST.traindata();
    # Process images into (H,W,C,BS) batches
    x_data = Float32.(reshape(imgs, size(imgs,1), size(imgs,2), 1, size(imgs,3)))
    y_data = onehot(labels_raw)
    (x_train, y_train), (x_test, y_test) = stratifiedobs((x_data, y_data),
                                                         p = train_split)
    return (
        # Use Flux's DataLoader to automatically minibatch and shuffle the data
        DataLoader(gpu.(collect.((x_train, y_train))); batchsize = batchsize,
                   shuffle = true),
        # Don't shuffle the test data
        DataLoader(gpu.(collect.((x_test, y_test))); batchsize = batchsize,
                   shuffle = false)
    )
end

# Main
const bs = 128
const train_split = 0.9
train_dataloader, test_dataloader = loadmnist(bs, train_split);

function DiffEqArray_to_Array(x)
    xarr = gpu(x)
    return reshape(xarr, size(xarr)[1:end-1])
end

c1 = Chain(Conv((3,3), 1=>64, pad=(0,0), relu)) |> gpu
convode_base = Chain(Conv((3,3), 64=>64, stride=1, pad=1, relu),
                     BatchNorm(64)) |> gpu
convode = NeuralODE(convode_base, (0.f0, 1.f0), Tsit5(),
           save_everystep = false,
           reltol = 1e-3, abstol = 1e-3,
           save_start = false) |> gpu;
fc = Chain(Dense(43264,10)) |> gpu;

# Model
model = Chain(c1,
        convode,
        DiffEqArray_to_Array,
        flatten,
        fc) |> gpu;

loss(x, y) = logitcrossentropy(model(x), y)

# Grab one (img, lab) batch from the train loader to test the forward and backward passes
img, lab = first(train_dataloader)

loss(img, lab)

g = gradient(() -> loss(img, lab), params(model))

Running the above code results in a DimensionMismatch error.

Full stack trace: https://pastebin.com/P0iV2ihP

Further investigation:

  1. Removing BatchNorm from convode_base gets rid of the error.
  2. Using GroupNorm in convode_base also results in the same error.
  3. Taking the pullback of convode with the output of c1(img) works (see the code below).

Code for pullback:

x1 = rand(Float32, size(c1(img)));   # random array with the same shape as c1(img)
x2 = rand(Float32, 64, 1);
a = deepcopy(c1(img));
using Zygote
Zygote.pullback(convode, a)

Taking the pullback with x1 or x2 (instead of a) doesn't work.

I was not able to narrow down what is causing the original error.

@ToucheSir
Member

I presume this is an adaptation of the code at SciML/DiffEqFlux.jl#387? The pullback code likely works because it's not actually differentiating through the loss function. This would be the proper equivalent:

l, back = Zygote.pullback(() -> loss(img, lab), params(model))
back(one(l))

I would try running this model on CPU first and verifying there is no dimension mismatch there.
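
For example, a rough sketch of that check (assuming cpu(...) round-trips the NeuralODE correctly; a CPU variant of DiffEqArray_to_Array is substituted because the original calls gpu internally):

# hypothetical CPU variant of the conversion step
function DiffEqArray_to_Array_cpu(x)
    xarr = Array(x)
    return reshape(xarr, size(xarr)[1:end-1])
end

model_cpu = Chain(cpu(c1), cpu(convode), DiffEqArray_to_Array_cpu, flatten, cpu(fc))
loss_cpu(x, y) = logitcrossentropy(model_cpu(x), y)
img_cpu, lab_cpu = cpu(img), cpu(lab)
loss_cpu(img_cpu, lab_cpu)                                     # forward pass on CPU
gradient(() -> loss_cpu(img_cpu, lab_cpu), params(model_cpu))  # backward pass on CPU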

@DhairyaLGandhi
Member

Likely not Flux related, since BatchNorm and friends shouldn't change the output dimensions. Could you test by checking the input and output dims at each stage?
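
For instance, a quick shape check of each stage (a sketch, reusing the img/lab batch from the example above):

h1 = c1(img);                   @show size(h1)   # expect (26, 26, 64, batchsize)
h2 = convode(h1);               @show size(h2)
h3 = DiffEqArray_to_Array(h2);  @show size(h3)
h4 = flatten(h3);               @show size(h4)   # expect (43264, batchsize)
@show size(fc(h4)) size(lab)                     # both should have the batch dim last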

@CarloLucibello
Member

CarloLucibello commented Jan 8, 2021

You did a reshape that seems really wrong.

return reshape(xarr, size(xarr)[1:end-1])

A reshape should preserve the total length of the array.
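
For example, if the trailing dimension of the ODE output is just the singleton time dimension, a length-preserving version (a sketch, not a tested fix) would drop it explicitly:

function DiffEqArray_to_Array(x)
    xarr = gpu(x)
    # dropdims only removes size-1 dimensions, so the total length is preserved
    return dropdims(xarr, dims = ndims(xarr))
end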

@denglerchr

I receive the same error without using a reshape function. Funnily enough, the error only happens for the recurrent layers GRU and RNN, but not for LSTM. Maybe it's a different bug from the one above; I'm not sure, but at least the error message is the same.
To reproduce:

using Flux, Statistics

# some settings
nT = 100
ndata = 20
batchsize = 5
ninputs = 3
noutputs = 1

# create artificial data
struct SeqData
    x::AbstractVector 
    y::AbstractVector
end

data = Vector{SeqData}(undef, 0)
for i = 1:ndata
    input = [randn(Float32, ninputs, batchsize) for i = 1:nT]
    output = [randn(Float32, noutputs, batchsize) for i = 1:nT]
    push!(data, SeqData(input, output) )
end
train_loader = Flux.Data.DataLoader(data)

# Create a model
model = Chain(GRU(ninputs, ninputs), Dense(ninputs, noutputs)) # broken for GRU and RNN, works for LSTM

# Loss function
function loss(x, y)
    Flux.reset!(model)
    y_model = model.(x)
    diff = [mean(abs2, y[i] .- y_model[i]) for i = 1:length(y) ]
    return mean(diff)
end
loss(data::SeqData) = loss(data.x, data.y)
loss(data::Vector{SeqData}) = mean( loss(seq) for seq in data )

# Evaluate the loss and try training the model
loss(data) # This works for all types of rnn
Flux.train!(loss, params(model), train_loader, ADAM()) # This does not work for GRU and RNN

I used Flux version 0.11.3 on Windows with Julia 1.5.3
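
A small diagnostic sketch (using the loss and model defined above) that compares parameter and gradient shapes before the optimiser update, where the error is raised:

gs = gradient(() -> loss(data), params(model))
for p in params(model)
    gs[p] === nothing && continue
    # the optimiser's apply! broadcasts the gradient into state arrays shaped like p,
    # so any shape difference here will trigger the DimensionMismatch
    size(p) == size(gs[p]) || @warn "shape mismatch" size(p) size(gs[p])
end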

@denglerchr

I should probably provide the full error message for my code above:

ERROR: LoadError: DimensionMismatch("cannot broadcast array to have fewer dimensions")
Stacktrace:
 [1] check_broadcast_shape(::Tuple{}, ::Tuple{Base.OneTo{Int64}}) at .\broadcast.jl:518
 [2] check_broadcast_shape(::Tuple{Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at .\broadcast.jl:521
 [3] check_broadcast_axes at .\broadcast.jl:523 [inlined]
 [4] check_broadcast_axes at .\broadcast.jl:527 [inlined]
 [5] instantiate at .\broadcast.jl:269 [inlined]
 [6] materialize! at .\broadcast.jl:848 [inlined]
 [7] materialize!(::Array{Float32,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Float64,Array{Float32,1}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(-),Tuple{Int64,Float64}},Array{Float32,2}}}}}) at .\broadcast.jl:845
 [8] apply!(::ADAM, ::Array{Float32,1}, ::Array{Float32,2}) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\optimisers.jl:175
 [9] update!(::ADAM, ::Array{Float32,1}, ::Array{Float32,2}) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:23
 [10] update!(::ADAM, ::Zygote.Params, ::Zygote.Grads) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:29
 [11] macro expansion at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:105 [inlined]
 [12] macro expansion at C:\Users\christian.dengler\.julia\packages\Juno\n6wyj\src\progress.jl:134 [inlined]
 [13] train!(::Function, ::Zygote.Params, ::Flux.Data.DataLoader{Array{SeqData,1}}, ::ADAM; cb::Flux.Optimise.var"#16#22") at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:100
 [14] train!(::Function, ::Zygote.Params, ::Flux.Data.DataLoader{Array{SeqData,1}}, ::ADAM) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:98
 [15] top-level scope at d:\User\CDE\Tapping_Pred_Maint\test.jl:39
 [16] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091
 [17] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [18] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [19] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:185
 [20] (::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:144
 [21] withpath(::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams}, ::String) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:124
 [22] (::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:142
 [23] hideprompt(::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams}) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:36
 [24] (::VSCodeServer.var"#59#63"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:110
 [25] with_logstate(::Function, ::Any) at .\logging.jl:408
 [26] with_logger at .\logging.jl:514 [inlined]
 [27] (::VSCodeServer.var"#58#62"{VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:109
 [28] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [29] invokelatest(::Any) at .\essentials.jl:709
 [30] macro expansion at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:27 [inlined]
 [31] (::VSCodeServer.var"#56#57")() at .\task.jl:356

@DhairyaLGandhi
Member

As @CarloLucibello pointed out, layers in Flux expect the last dim to be the batch dimension, and the reshape above seems to drop it. Also note that normalisation with a batch size of 1 is not meaningful.
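
For reference, a minimal standalone illustration of the batch-last convention (not taken from the code above):

using Flux
x = rand(Float32, 28, 28, 1, 16)   # (H, W, C, batch): batch comes last
y = Conv((3, 3), 1 => 8)(x)
size(y)                            # (26, 26, 8, 16): the batch dim is preserved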

@schlichtanders

I am also running into this issue with Optim BFGS and Optim LBFGS.

I think this issue is related: https://discourse.julialang.org/t/optimization-with-lbfgs-gives-dimensionmismatch-dimensions-must-match/22167

@ToucheSir
Member

The DimensionMismatch error could come from a great many places and Optim is not a FluxML package, so perhaps it would be better to seek help there? If things are only reproducible with Flux + Optim, then a separate issue + MWE would be very much appreciated.

In the meantime, I think this thread is safe to close because both the original and follow-up example have answers.
