DimensionMismatch("array could not be broadcast to match destination") #1457

Closed
krishvishal opened this issue Jan 8, 2021 · 8 comments

@krishvishal

I'm trying to build a neural ODE using conv layers. After I build the model, the forward pass works fine, but when I try to get the gradient using g = gradient(() -> loss(x, y), params(model)), I get a DimensionMismatch("array could not be broadcast to match destination").

To reproduce the error:

using DiffEqFlux, OrdinaryDiffEq, Flux, NNlib, MLDataUtils, Printf
using Flux: logitcrossentropy
using Flux.Data: DataLoader
using MLDatasets
using CUDA
using Random: seed!
CUDA.allowscalar(false)

function loadmnist(batchsize = bs, train_split = 0.9)
    # Use MLDataUtils LabelEnc for natural onehot conversion
    onehot(labels_raw) = convertlabel(LabelEnc.OneOfK, labels_raw,
                                      LabelEnc.NativeLabels(collect(0:9)))
    # Load MNIST
    imgs, labels_raw = MNIST.traindata();
    # Process images into (H,W,C,BS) batches
    x_data = Float32.(reshape(imgs, size(imgs,1), size(imgs,2), 1, size(imgs,3)))
    y_data = onehot(labels_raw)
    (x_train, y_train), (x_test, y_test) = stratifiedobs((x_data, y_data),
                                                         p = train_split)
    return (
        # Use Flux's DataLoader to automatically minibatch and shuffle the data
        DataLoader(gpu.(collect.((x_train, y_train))); batchsize = batchsize,
                   shuffle = true),
        # Don't shuffle the test data
        DataLoader(gpu.(collect.((x_test, y_test))); batchsize = batchsize,
                   shuffle = false)
    )
end

# Main
const bs = 128
const train_split = 0.9
train_dataloader, test_dataloader = loadmnist(bs, train_split);

function DiffEqArray_to_Array(x)
    xarr = gpu(x)
    return reshape(xarr, size(xarr)[1:end-1])
end

c1 = Chain(Conv((3,3), 1=>64, pad=(0,0), relu)) |> gpu
convode_base = Chain(Conv((3,3), 64=>64, stride=1, pad=1, relu),
                     BatchNorm(64)) |> gpu
convode = NeuralODE(convode_base, (0.f0, 1.f0), Tsit5(),
           save_everystep = false,
           reltol = 1e-3, abstol = 1e-3,
           save_start = false) |> gpu;
fc = Chain(Dense(43264,10)) |> gpu;

# Model
model = Chain(c1,
        convode,
        DiffEqArray_to_Array,
        flatten,
        fc) |> gpu;

loss(x, y) = logitcrossentropy(model(x), y)

# Grab one (img, lab) batch from the train loader to test the forward and backward passes
img, lab = first(train_dataloader)

loss(img, lab)

g = gradient(() -> loss(img, lab), params(model))

Running the above code results in a DimensionMismatch error.

Full stack trace: https://pastebin.com/P0iV2ihP

Further investigation:

  1. Removing BatchNorm from convode_base gets rid of the error.
  2. Using GroupNorm in convode_base also results in the same error.
  3. Taking the pullback of convode with the output of c1(img) works (see the code below).

Code for pullback:

x1 = rand(Float32, size(c1(img)));   # random array with the same shape as c1(img)
x2 = rand(Float32, 64, 1);
a = deepcopy(c1(img));
using Zygote
Zygote.pullback(convode, a)

Taking the pullback with x1 or x2 (instead of a) doesn't work.

I was not able to narrow down what is causing the original error.

@ToucheSir
Member

I presume this is an adaptation of the code at SciML/DiffEqFlux.jl#387? The pullback code likely works because it's not actually differentiating through the loss function. This would be the proper equivalent:

l, back = Zygote.pullback(() -> loss(img, lab), params(model))
back(one(l))

I would try running this model on CPU first and verifying there is no dimension mismatch there.
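
For example, a rough sketch of that check (assuming cpu(...) round-trips the NeuralODE correctly; a CPU variant of DiffEqArray_to_Array is substituted because the original calls gpu internally):

# hypothetical CPU variant of the conversion step
function DiffEqArray_to_Array_cpu(x)
    xarr = Array(x)
    return reshape(xarr, size(xarr)[1:end-1])
end

model_cpu = Chain(cpu(c1), cpu(convode), DiffEqArray_to_Array_cpu, flatten, cpu(fc))
loss_cpu(x, y) = logitcrossentropy(model_cpu(x), y)
img_cpu, lab_cpu = cpu(img), cpu(lab)
loss_cpu(img_cpu, lab_cpu)                                     # forward pass on CPU
gradient(() -> loss_cpu(img_cpu, lab_cpu), params(model_cpu))  # backward pass on CPU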

@DhairyaLGandhi
Member

Likely not Flux related, since BatchNorm and friends shouldn't change the output dimensions. Could you test by checking the input and output dims at each stage?
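
For instance, a quick shape check of each stage (a sketch, reusing the img/lab batch from the example above):

h1 = c1(img);                   @show size(h1)   # expect (26, 26, 64, batchsize)
h2 = convode(h1);               @show size(h2)
h3 = DiffEqArray_to_Array(h2);  @show size(h3)
h4 = flatten(h3);               @show size(h4)   # expect (43264, batchsize)
@show size(fc(h4)) size(lab)                     # both should have the batch dim last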

@CarloLucibello
Member

CarloLucibello commented Jan 8, 2021

You did a reshape that seems really wrong.

return reshape(xarr, size(xarr)[1:end-1])

A reshape should preserve the total length of the array.
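
For example, if the trailing dimension of the ODE output is just the singleton time dimension, a length-preserving version (a sketch, not a tested fix) would drop it explicitly:

function DiffEqArray_to_Array(x)
    xarr = gpu(x)
    # dropdims only removes size-1 dimensions, so the total length is preserved
    return dropdims(xarr, dims = ndims(xarr))
end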

@denglerchr

I receive the same error without using a reshape function. Funnily enough, the error only happens for the recurrent layers GRU and RNN, but not for LSTM. Maybe it's a different bug from the one above; I'm not sure, but at least the error message is the same.
To reproduce:

using Flux, Statistics

# some settings
nT = 100
ndata = 20
batchsize = 5
ninputs = 3
noutputs = 1

# create artificial data
struct SeqData
    x::AbstractVector 
    y::AbstractVector
end

data = Vector{SeqData}(undef, 0)
for i = 1:ndata
    input = [randn(Float32, ninputs, batchsize) for i = 1:nT]
    output = [randn(Float32, noutputs, batchsize) for i = 1:nT]
    push!(data, SeqData(input, output) )
end
train_loader = Flux.Data.DataLoader(data)

# Create a model
model = Chain(GRU(ninputs, ninputs), Dense(ninputs, noutputs)) # broken for GRU and RNN, works for LSTM

# Loss function
function loss(x, y)
    Flux.reset!(model)
    y_model = model.(x)
    diff = [mean(abs2, y[i] .- y_model[i]) for i = 1:length(y) ]
    return mean(diff)
end
loss(data::SeqData) = loss(data.x, data.y)
loss(data::Vector{SeqData}) = mean( loss(seq) for seq in data )

# Evaluate the loss and try training the model
loss(data) # This works for all types of rnn
Flux.train!(loss, params(model), train_loader, ADAM()) # This does not work for GRU and RNN

I used Flux version 0.11.3 on Windows with Julia 1.5.3
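
A small diagnostic sketch (using the loss and model defined above) that compares parameter and gradient shapes before the optimiser update, where the error is raised:

gs = gradient(() -> loss(data), params(model))
for p in params(model)
    gs[p] === nothing && continue
    # the optimiser's apply! broadcasts the gradient into state arrays shaped like p,
    # so any shape difference here will trigger the DimensionMismatch
    size(p) == size(gs[p]) || @warn "shape mismatch" size(p) size(gs[p])
end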

@denglerchr

I should probably provide the full error message for my code above:

ERROR: LoadError: DimensionMismatch("cannot broadcast array to have fewer dimensions")
Stacktrace:
 [1] check_broadcast_shape(::Tuple{}, ::Tuple{Base.OneTo{Int64}}) at .\broadcast.jl:518
 [2] check_broadcast_shape(::Tuple{Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at .\broadcast.jl:521
 [3] check_broadcast_axes at .\broadcast.jl:523 [inlined]
 [4] check_broadcast_axes at .\broadcast.jl:527 [inlined]
 [5] instantiate at .\broadcast.jl:269 [inlined]
 [6] materialize! at .\broadcast.jl:848 [inlined]
 [7] materialize!(::Array{Float32,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Float64,Array{Float32,1}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(-),Tuple{Int64,Float64}},Array{Float32,2}}}}}) at .\broadcast.jl:845
 [8] apply!(::ADAM, ::Array{Float32,1}, ::Array{Float32,2}) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\optimisers.jl:175
 [9] update!(::ADAM, ::Array{Float32,1}, ::Array{Float32,2}) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:23
 [10] update!(::ADAM, ::Zygote.Params, ::Zygote.Grads) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:29
 [11] macro expansion at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:105 [inlined]
 [12] macro expansion at C:\Users\christian.dengler\.julia\packages\Juno\n6wyj\src\progress.jl:134 [inlined]
 [13] train!(::Function, ::Zygote.Params, ::Flux.Data.DataLoader{Array{SeqData,1}}, ::ADAM; cb::Flux.Optimise.var"#16#22") at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:100
 [14] train!(::Function, ::Zygote.Params, ::Flux.Data.DataLoader{Array{SeqData,1}}, ::ADAM) at C:\Users\christian.dengler\.julia\packages\Flux\sY3yx\src\optimise\train.jl:98
 [15] top-level scope at d:\User\CDE\Tapping_Pred_Maint\test.jl:39
 [16] include_string(::Function, ::Module, ::String, ::String) at .\loading.jl:1091
 [17] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at .\essentials.jl:710
 [18] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at .\essentials.jl:709
 [19] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:185
 [20] (::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:144
 [21] withpath(::VSCodeServer.var"#61#65"{String,Int64,Int64,String,Module,Bool,VSCodeServer.ReplRunCodeRequestParams}, ::String) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:124
 [22] (::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:142
 [23] hideprompt(::VSCodeServer.var"#60#64"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams}) at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\repl.jl:36
 [24] (::VSCodeServer.var"#59#63"{String,Int64,Int64,String,Module,Bool,Bool,VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:110
 [25] with_logstate(::Function, ::Any) at .\logging.jl:408
 [26] with_logger at .\logging.jl:514 [inlined]
 [27] (::VSCodeServer.var"#58#62"{VSCodeServer.ReplRunCodeRequestParams})() at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:109
 [28] #invokelatest#1 at .\essentials.jl:710 [inlined]
 [29] invokelatest(::Any) at .\essentials.jl:709
 [30] macro expansion at c:\Users\christian.dengler\.vscode\extensions\julialang.language-julia-1.0.10\scripts\packages\VSCodeServer\src\eval.jl:27 [inlined]
 [31] (::VSCodeServer.var"#56#57")() at .\task.jl:356

@DhairyaLGandhi
Member

As @CarloLucibello pointed out, layers in Flux expect the last dim to be the batch dimension, and the reshape above seems to drop it. Also note that normalisation with a batch size of 1 is not meaningful.
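
For reference, a minimal standalone illustration of the batch-last convention (not taken from the code above):

using Flux
x = rand(Float32, 28, 28, 1, 16)   # (H, W, C, batch): batch comes last
y = Conv((3, 3), 1 => 8)(x)
size(y)                            # (26, 26, 8, 16): the batch dim is preserved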

@schlichtanders

I am also running into this issue with Optim BFGS and Optim LBFGS.

I think this issue is related: https://discourse.julialang.org/t/optimization-with-lbfgs-gives-dimensionmismatch-dimensions-must-match/22167

@ToucheSir
Member

The DimensionMismatch error could come from a great many places and Optim is not a FluxML package, so perhaps it would be better to seek help there? If things are only reproducible with Flux + Optim, then a separate issue + MWE would be very much appreciated.

In the meantime, I think this thread is safe to close because both the original and follow-up example have answers.
