
EMD loss function may be wrong #2

Closed
qzchenwl opened this issue Jan 4, 2018 · 11 comments

qzchenwl commented Jan 4, 2018

I found the EMD definition here https://github.com/titu1994/neural-image-assessment/blob/master/train_mobilenet.py#L49

def earth_mover_loss(y_true, y_pred):
    return K.sqrt(K.mean(K.square(K.abs(y_true - y_pred))))

You are missing the CDF step. According to the paper (with r = 2): EMD(y_true, y_pred) = sqrt(mean(square(CDF(y_true) - CDF(y_pred))))

y_true  = [0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]
y_pred1 = [0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]
y_pred2 = [0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]

y_pred1's loss should be less than y_pred2's, since y_pred1 puts its mass only one bucket away from the true distribution while y_pred2 puts it seven buckets away.
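
For reference, the cumulative sums of those three vectors are:

CDF(y_true)  = [0, 0, 0, 0, 0, 0, 0,   0.9, 1.0, 1.0]
CDF(y_pred1) = [0, 0, 0, 0, 0, 0, 0.9, 0.9, 1.0, 1.0]
CDF(y_pred2) = [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 1.0, 1.0]

CDF(y_pred1) disagrees with CDF(y_true) in only one bucket, while CDF(y_pred2) disagrees in seven, so the CDF-based loss separates the two predictions. The raw difference |y_true - y_pred| is 0.9 in exactly two buckets for both predictions, so the current loss cannot tell them apart.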

titu1994 commented Jan 4, 2018

And how would I compute the CDF inside the loss function? It's a tensor, not a numpy array.

titu1994 commented Jan 4, 2018

Turns out there is K.cumsum, with which I can compute the CDF quite easily. Yeesh. And it does give the correct answer for the loss.

The following script demonstrates it; the output is shown after the code:

import numpy as np

y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]])
y_pred1 = np.array([[0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]])
y_pred2 = np.array([[0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]])

# EMD with the CDF step (per the paper, r = 2)
def emd_1(y_true, y_pred):
    return np.sqrt(np.mean(np.square(np.abs(np.cumsum(y_true, axis=-1) - np.cumsum(y_pred, axis=-1)))))

# EMD without the CDF step (the current loss in the repo)
def emd_2(y_true, y_pred):
    return np.sqrt(np.mean(np.square(np.abs(y_true - y_pred))))

print("EMD 1")
print("Loss 1: ", emd_1(y_true, y_pred1))
print("Loss 2: ", emd_1(y_true, y_pred2))

print("EMD 2")
print("Loss 1: ", emd_2(y_true, y_pred1))
print("Loss 2: ", emd_2(y_true, y_pred2))

Output:

EMD 1
Loss 1:  0.284604989415
Loss 2:  0.752994023881

EMD 2
Loss 1:  0.40249223595
Loss 2:  0.40249223595
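
Applied to the Keras loss, the fix might look roughly like this (a sketch, not necessarily the exact code that ends up in the repo):

from keras import backend as K

def earth_mover_loss(y_true, y_pred):
    # CDFs of the true and predicted score distributions (cumulative sum over the 10 buckets)
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # squared EMD (r = 2): mean squared CDF difference per sample, then square root
    emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))
    # average over the batch
    return K.mean(emd)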

qzchenwl commented Jan 4, 2018

There is also the scan function in TensorFlow:

import tensorflow as tf

def cumsum(tensor):
    # running sum along the first axis of the tensor
    return tf.scan(lambda a, b: tf.add(a, b), tensor)

titu1994 commented Jan 4, 2018

Well, since K.cumsum already calls tf.cumsum in the backend, it's good enough for the loss calculation.

titu1994 commented Jan 4, 2018

It will take roughly 16 hours to train for 10 epochs again. Yeesh. At least my laptop is free for today anyway.

titu1994 closed this as completed Jan 4, 2018
tfriedel commented Jan 5, 2018

@titu1994 I noticed you are only training the top layer (whereas in the paper they also train the inner layers, with a 10x lower learning rate). I guess you are doing that for performance reasons. You know the trick where you build a new network consisting only of the fully connected layer + dropout + softmax, and feed it the features you got from the other layers as input? That's a LOT faster.
See an example here:
https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson3.ipynb
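
A minimal sketch of that idea, assuming a MobileNet base with global average pooling; `images` (preprocessed 224x224x3 arrays) and `y_scores` (normalized 10-bin score distributions) are placeholders for the actual data:

from keras.applications.mobilenet import MobileNet
from keras.layers import Dense, Dropout
from keras.models import Sequential

# 1) Run the frozen convolutional base once to cache the bottleneck features
base = MobileNet(input_shape=(224, 224, 3), include_top=False, pooling='avg', weights='imagenet')
features = base.predict(images, batch_size=200)  # shape: (num_images, 1024)

# 2) Train only the small head on the cached features
head = Sequential([
    Dropout(0.75, input_shape=(1024,)),
    Dense(10, activation='softmax'),
])
head.compile(optimizer='adam', loss=earth_mover_loss)
head.fit(features, y_scores, epochs=20, batch_size=200)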

titu1994 commented Jan 5, 2018

@tfriedel Yes, I am training only the final dense layer, since I don't have the memory to train the full MobileNet model at an image size of 224x224x3 with a batch size of 200 on a 4 GB laptop GPU.

I know about that "trick" you mentioned. Under ordinary circumstances, I would think about applying it. However, this is a dataset of 255,000 images, taking roughly 13 GB of disk space. On top of that, I am doing random horizontal flips on the train set, so make that 510,000 images. At 7 x 7 spatial size x 1024 filters x 4 bytes per image, that is 510,000 * 7 * 7 * 1024 * 4 bytes ≈ 102.4 GB.

Edit: If you take the output of the global average pooled features instead, you would need only about 2.1 GB of disk space (510,000 * 1024 * 4 bytes). Hmm, perhaps this can be done after all. However, I won't have time to improve this codebase after I finish finetuning the current model.

Computing a forward pass for that many images would take roughly 3.5 hours. Of course, after that, training the single fully connected layer would be blazingly fast, if I were able to load that large a numpy array into my 16 GB of RAM (which I can't). Now, if there were some way to chunk the numpy arrays into separate files and load them via the TF Dataset API, it would be more tractable.

Edit: I forgot to mention that this isn't an ordinary classification problem, where you can simply save the class number in a file, load it later, and one-hot encode it to get the final classification output. For each image, you need an array of size 10 (its normalized score distribution) to feed to the network in order to minimize the earth mover distance loss. Saving and loading such an aligned set of image features and score distributions would require even more space and make the data loading even more unwieldy.

Simply put, it would require significant engineering of the entire codebase to do it the "fast" way. The method you suggest works for toy datasets (for which you can save and load the feature arrays quickly), or for those who have dedicated supercomputers and enough time to engineer such a training framework.

Given the significant challenges, the only plus side I can see is that by doing something like this, I could possibly train larger NIMA classifiers (using a NASNet or an Inception-ResNet-v2 model as the base classifier).

tfriedel commented Jan 5, 2018

I think the 7 x 7 in your calculation is before average pooling, but you would take the values after it, so it really only takes 4 KB per image, or about 2 GB of RAM. So it would fit into RAM.
But yeah, it's a problem with the image augmentation if you are not only doing flipping but also cropping.
The chunking of the numpy arrays can be done with bcolz, like here for example:
https://github.com/fastai/courses/blob/master/deeplearning2/imagenet_process.ipynb
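
The save/load helpers in that style are roughly as follows (a sketch, assuming bcolz is installed; the fastai notebooks use the same pattern):

import bcolz

def save_array(fname, arr):
    # write the array to disk in compressed, chunked form
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    # read the chunked array back into memory
    return bcolz.open(fname)[:]

A BcolzArrayIterator can then stream the chunks during training instead of loading the whole array at once.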

I'm currently trying to finetune the whole network with code based on yours, but with random cropping and different learning rates for different layers. Will keep you updated!

titu1994 commented Jan 5, 2018

@tfriedel Make sure you are using the updated calculation of the loss that I posted a few hours back. The difference is slight, but by finetuning the whole network you may see more of a difference.

tfriedel commented Jan 5, 2018

Yeah, I've already incorporated the new loss, thanks!
I'm not using the TF Dataset API; I adapted code I once wrote for a Kaggle competition. It's based on ImageDataGenerator, which I modified to use a BcolzArrayIterator (so I don't have to keep these huge numpy arrays in RAM) and a preprocessing function that does random cropping/flipping using the torchvision transforms API.
That said, I looked into what TF has to offer in that regard, and there are functions like tf.random_crop, tf.image.crop_and_resize, and so on.
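
For example, a TF-side crop/flip step might look like this (a sketch, assuming images have already been resized to 256x256):

import tensorflow as tf

def augment(image):
    # image: a [256, 256, 3] float tensor; take a random 224x224 crop
    image = tf.random_crop(image, [224, 224, 3])
    # random horizontal flip
    image = tf.image.random_flip_left_right(image)
    return image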

titu1994 commented Jan 5, 2018

Ah, got it. Seems I was looking in the wrong namespace. tf.random_crop is what I needed, and I was searching for it under tf.image.* (semantic mistake, I guess?). Anyway, I am just about done finetuning 5 epochs on the new loss, and it seems somewhat promising.

I'm now gonna continue the next 15 epochs using random crops. Hopefully it yields even better results.
