Why does DataLoader not throw an error when fed with a 1D vector for the target? #1599

Closed
HTofi opened this issue May 13, 2021 · 6 comments · Fixed by #1636

Comments


HTofi commented May 13, 2021

Suppose you have the following simple example:

using Flux, Flux.Data
using Flux: mse, @epochs

# a data set of two training examples [0, 0] and [1, 1], with respective output targets of 0 and 1:
train_input = [0 1; 0 1]
train_target = [0, 1]

We generate a machine learning model as follows:

# define the model
model = Chain(Dense(2, 1, σ))

# define parameters and cost function
parameters = params(model)
loss(x, y) = mse(model(x), y)

# set optimization algorithm
η = 0.1
optimizer = Descent(η)

loader = DataLoader((train_input, train_target), batchsize=2)

@epochs 1000 train!(loss, parameters, loader, optimizer)

The code runs without throwing any errors, but the model converges to a constant output value and fails to capture the variation in the data.

If, however, one defines train_target as

train_target = [0 1]

which is a 1×2 matrix, the model behaves as expected, and we get a good result.

The documentation of DataLoader says that train_input and train_target have to be tensors, with the last dimension of each being the observation dimension. Are 1D vectors not included in this definition of "tensors"? If so, why doesn't DataLoader throw an error?

DhairyaLGandhi (Member) commented

One thing to check would be the behaviour with reshape(target, length(target), 1). I would want to avoid doing that silently, but I also wonder what the correct error to throw would look like. Something that assumes a vector is just one sample doesn't sound great, because a model may well have a real number as its output.

darsnack (Member) commented May 13, 2021

In the original code, the label is a 2D vector but the model returns a scalar. I think the "bug" here is with mse, which should have thrown an error when you tried to compute the difference between a column vector and a row vector. I don't think DataLoader should be making assumptions and altering the data to fit (as @DhairyaLGandhi points out, reshaping into a single vector is not always the right thing).

That said, it is possible, using Flux.outputsize, for the DataLoader to do a quick shape check when beginning iteration.
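
A minimal sketch of what such a check might look like (check_target_shape is a hypothetical helper written for this thread, not Flux API; it assumes the last dimension of both arrays is the observation dimension):

using Flux

# Hypothetical helper: compare the model's output shape, computed with
# Flux.outputsize, against the target's shape, ignoring the trailing
# observation/batch dimension.
function check_target_shape(model, x, y)
    out = Flux.outputsize(model, size(x))   # e.g. (1, 2) for Dense(2, 1) on a 2×2 batch
    Base.front(out) == Base.front(size(y)) ||
        error("model output size $out is incompatible with target size $(size(y))")
end

check_target_shape(Chain(Dense(2, 1, σ)), [0 1; 0 1], [0, 1])  # throws: (1,) vs ()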

HTofi (Author) commented May 13, 2021

reshape(target, length(target), 1) creates a 2×1 matrix, which causes DataLoader to throw an error saying that the number of observations in the two arrays input and target is not the same. (DataLoader treats the last dimension as the observation dimension, so the reshaped target has one observation while the input has two.)

I think @darsnack is right about mse. It only returns a good result when given two 1×2 matrices:

julia> mse([0 1], [0 1])
0.0

though when the second argument is a 1D vector, it computes something else (not sure what it is computing):

julia> mse([0 1], [0, 1])
0.5
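
(For reference, this is broadcasting at work: Flux's mse is essentially mean(abs2.(ŷ .- y)), and broadcasting a 1×2 row against a 2-element column vector expands to a 2×2 matrix, so the mean is taken over four entries:

julia> [0 1] .- [0, 1]   # 1×2 row broadcast against 2-element column → 2×2
2×2 Matrix{Int64}:
  0  1
 -1  0

julia> using Statistics; mean(abs2.([0 1] .- [0, 1]))   # same value as mse above
0.5
)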

Still, I think this issue has to be handled by DataLoader, the same way it takes care of consistency in the number of observations.

ToucheSir (Member) commented

I think a good example is how the dataloader handles a smaller batch:

julia> loader = DataLoader((train_input, train_target), batchsize=1);

julia> collect(loader)
2-element Vector{Tuple{Matrix{Int64}, Matrix{Int64}}}:
 ([0; 0], [0])
 ([1; 1], [1])

So the consistency in the number of observations is preserved. I would personally be worried if the dataloader added extra dimensions to the arrays I give it, because a) reshape may not preserve the original structure, and b) it is not always appropriate, depending on how the loss function is formulated. I do agree that MSE and other loss functions should be better about failing fast given inputs with mismatched dims.

darsnack (Member) commented

> Still, I think this issue has to be handled by DataLoader, the same way it takes care of consistency in the number of observations.

I think we can and should do both. There are two things to address here:

  1. Bugs with MSE (and potentially other losses) silently failing when the dims don't match
  2. DataLoader can add a feature with Flux.outputsize that checks whether the target dims (sans batch) and model output size match

(1) makes (2) redundant for now, but (2) lets us have friendlier errors and catch cases like (1) for future functions/losses.

CarloLucibello (Member) commented May 14, 2021

PyTorch restricts losses to prediction and target of the same shape, which seems safer than what we do now, accepting any broadcast-compatible pair. I don't think we would lose much functionality: most losses are trivial for users to reimplement themselves if needed, and we would keep a lot of people from falling into this trap.
Should we add a check on the shapes for the next breaking release? Maybe just a warning?
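
A rough sketch of what such a check could look like (illustrative only; _check_sizes and checked_mse are hypothetical names, not existing Flux API):

using Statistics: mean

# Hypothetical size guard: warn (or error, in a breaking release) when
# prediction and target shapes differ, instead of silently broadcasting.
function _check_sizes(ŷ, y)
    size(ŷ) == size(y) ||
        @warn "prediction and target have mismatched sizes" size(ŷ) size(y)
end

function checked_mse(ŷ, y)
    _check_sizes(ŷ, y)
    return mean(abs2.(ŷ .- y))
end

checked_mse([0 1], [0, 1])  # warns about (1, 2) vs (2,), then returns 0.5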
