Why does DataLoader not throw an error when fed with a 1D vector for the target? #1599

Closed
HTofi opened this issue May 13, 2021 · 6 comments · Fixed by #1636

Comments


HTofi commented May 13, 2021

Suppose you have the following simple example:

using Flux, Flux.Data
using Flux: mse, @epochs

# a data set of two training examples [0, 0] and [1, 1], with respective output targets of 0 and 1:
train_input = [0 1; 0 1]
train_target = [0, 1]

We generate a machine learning model as follows:

# define the model
model = Chain(Dense(2, 1, σ))

# define parameters and cost function
parameters = params(model)
loss(x, y) = mse(model(x), y)

# set optimization algorithm
η = 0.1
optimizer = Descent(η)

loader = DataLoader((train_input, train_target), batchsize=2)

@epochs 1000 train!(loss, parameters, loader, optimizer)

The code runs without throwing any errors, but the model converges to a constant output value and fails to capture the variation in the data.

If, however, one defines train_target as

train_target = [0 1]

which is a 1×2 matrix, the model behaves as expected, and we get a good result.

The documentation of DataLoader says that train_input and train_target have to be tensors, with the last dimension of each being the observation dimension. Are 1D vectors not included in this definition of "tensors"? If so, why doesn't DataLoader throw an error?

DhairyaLGandhi (Member) commented

One thing to check would be the behaviour with reshape(target, length(target), 1). I would want to avoid doing that silently, but I also wonder what the correct error to throw would look like. Something that assumes a vector is just one sample doesn't sound great, because a model may well have a real number as its output.

darsnack (Member) commented May 13, 2021

In the original code, the label is a 2D vector but the model returns a scalar. I think the "bug" here is with mse, which should have thrown an error when you tried to compute the difference between a column vector and a row vector. I don't think DataLoader should be making assumptions and altering the data to fit (as @DhairyaLGandhi points out, reshaping into a single vector is not always the right thing).

That said, it is possible, using Flux.outputsize, for the DataLoader to do a quick shape check when beginning iteration.
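
A minimal sketch of what such a check might look like (check_target_shape is a hypothetical helper written for this thread, not Flux API; it assumes the last dimension of both arrays is the observation dimension):

using Flux

# Hypothetical helper: compare the model's output shape, computed with
# Flux.outputsize, against the target's shape, ignoring the trailing
# observation/batch dimension.
function check_target_shape(model, x, y)
    out = Flux.outputsize(model, size(x))   # e.g. (1, 2) for Dense(2, 1) on a 2×2 batch
    Base.front(out) == Base.front(size(y)) ||
        error("model output size $out is incompatible with target size $(size(y))")
end

check_target_shape(Chain(Dense(2, 1, σ)), [0 1; 0 1], [0, 1])  # throws: (1,) vs ()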

HTofi (Author) commented May 13, 2021

reshape(target, length(target), 1) creates a 2×1 matrix, which causes DataLoader to throw an error saying that the number of observations in the two arrays input and target is not the same. (DataLoader treats the last dimension as the observation dimension, so the reshaped target has one observation while the input has two.)

I think @darsnack is right about mse. It only returns a good result when given two 1×2 matrices:

julia> mse([0 1], [0 1])
0.0

though when the second argument is a 1D vector, it computes something else (not sure what it is computing):

julia> mse([0 1], [0, 1])
0.5
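
(For reference, this is broadcasting at work: Flux's mse is essentially mean(abs2.(ŷ .- y)), and broadcasting a 1×2 row against a 2-element column vector expands to a 2×2 matrix, so the mean is taken over four entries:

julia> [0 1] .- [0, 1]   # 1×2 row broadcast against 2-element column → 2×2
2×2 Matrix{Int64}:
  0  1
 -1  0

julia> using Statistics; mean(abs2.([0 1] .- [0, 1]))   # same value as mse above
0.5
)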

Still, I think this issue has to be handled by DataLoader, the same way it takes care of consistency in the number of observations.

ToucheSir (Member) commented

I think a good example is how the dataloader handles a smaller batch:

julia> loader = DataLoader((train_input, train_target), batchsize=1);

julia> collect(loader)
2-element Vector{Tuple{Matrix{Int64}, Matrix{Int64}}}:
 ([0; 0], [0])
 ([1; 1], [1])

So the consistency in the number of observations is preserved. I would personally be worried if the dataloader added extra dimensions to the arrays I give it, because a) reshape may not preserve the original structure, and b) it is not always appropriate, depending on how the loss function is formulated. I do agree that MSE and other loss functions should be better about failing fast given inputs with mismatched dims.

darsnack (Member) commented

> Still, I think this issue has to be handled by DataLoader, the same way it takes care of consistency in the number of observations.

I think we can and should do both. There are two things to address here:

  1. Bugs with MSE (and potentially other losses) silently failing when the dims don't match
  2. DataLoader can add a feature with Flux.outputsize that checks whether the target dims (sans batch) and model output size match

(1) makes (2) redundant for now, but (2) lets us have friendlier errors and catch cases like (1) for future functions/losses.

CarloLucibello (Member) commented May 14, 2021

PyTorch restricts losses to prediction and target of the same shape, which seems safer than what we do now, accepting any broadcast-compatible pair. I don't think we would lose much functionality: most losses are trivial for users to reimplement themselves if needed, and we would keep a lot of people from falling into this trap.
Should we add a check on the shapes for the next breaking release? Maybe just a warning?
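
A rough sketch of what such a check could look like (illustrative only; _check_sizes and checked_mse are hypothetical names, not existing Flux API):

using Statistics: mean

# Hypothetical size guard: warn (or error, in a breaking release) when
# prediction and target shapes differ, instead of silently broadcasting.
function _check_sizes(ŷ, y)
    size(ŷ) == size(y) ||
        @warn "prediction and target have mismatched sizes" size(ŷ) size(y)
end

function checked_mse(ŷ, y)
    _check_sizes(ŷ, y)
    return mean(abs2.(ŷ .- y))
end

checked_mse([0 1], [0, 1])  # warns about (1, 2) vs (2,), then returns 0.5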
