
EMD loss function may be wrong #2

Closed
qzchenwl opened this issue Jan 4, 2018 · 11 comments

qzchenwl commented Jan 4, 2018

I found the EMD definition here https://github.com/titu1994/neural-image-assessment/blob/master/train_mobilenet.py#L49

def earth_mover_loss(y_true, y_pred):
    return K.sqrt(K.mean(K.square(K.abs(y_true - y_pred))))

You are missing the CDF step. According to the paper (with r = 2): EMD(y_true, y_pred) = sqrt(mean(square(CDF(y_true) - CDF(y_pred))))

y_true  = [0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]
y_pred1 = [0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]
y_pred2 = [0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]

y_pred1's loss should be less than y_pred2's, since y_pred1 puts its mass only one bucket away from the true distribution while y_pred2 puts it seven buckets away.
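
For reference, the cumulative sums of those three vectors are:

CDF(y_true)  = [0, 0, 0, 0, 0, 0, 0,   0.9, 1.0, 1.0]
CDF(y_pred1) = [0, 0, 0, 0, 0, 0, 0.9, 0.9, 1.0, 1.0]
CDF(y_pred2) = [0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 1.0, 1.0]

CDF(y_pred1) disagrees with CDF(y_true) in only one bucket, while CDF(y_pred2) disagrees in seven, so the CDF-based loss separates the two predictions. The raw difference |y_true - y_pred| is 0.9 in exactly two buckets for both predictions, so the current loss cannot tell them apart.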

titu1994 commented Jan 4, 2018

And how would I compute the CDF inside the loss function? It's a tensor, not a numpy array.

titu1994 commented Jan 4, 2018

Turns out there is K.cumsum, with which I can compute the CDF quite easily. Yeesh. And it does give the correct answer for the loss.

The following script demonstrates it; the output is shown after the code:

import numpy as np

y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]])
y_pred1 = np.array([[0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]])
y_pred2 = np.array([[0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]])

# EMD with the CDF step (per the paper, r = 2)
def emd_1(y_true, y_pred):
    return np.sqrt(np.mean(np.square(np.abs(np.cumsum(y_true, axis=-1) - np.cumsum(y_pred, axis=-1)))))

# EMD without the CDF step (the current loss in the repo)
def emd_2(y_true, y_pred):
    return np.sqrt(np.mean(np.square(np.abs(y_true - y_pred))))

print("EMD 1")
print("Loss 1: ", emd_1(y_true, y_pred1))
print("Loss 2: ", emd_1(y_true, y_pred2))

print("EMD 2")
print("Loss 1: ", emd_2(y_true, y_pred1))
print("Loss 2: ", emd_2(y_true, y_pred2))

Output:

EMD 1
Loss 1:  0.284604989415
Loss 2:  0.752994023881

EMD 2
Loss 1:  0.40249223595
Loss 2:  0.40249223595
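
Applied to the Keras loss, the fix might look roughly like this (a sketch, not necessarily the exact code that ends up in the repo):

from keras import backend as K

def earth_mover_loss(y_true, y_pred):
    # CDFs of the true and predicted score distributions (cumulative sum over the 10 buckets)
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # squared EMD (r = 2): mean squared CDF difference per sample, then square root
    emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))
    # average over the batch
    return K.mean(emd)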

qzchenwl commented Jan 4, 2018

There is also the scan function in TensorFlow:

import tensorflow as tf

def cumsum(tensor):
    # running sum along the first axis of the tensor
    return tf.scan(lambda a, b: tf.add(a, b), tensor)

titu1994 commented Jan 4, 2018

Well, since K.cumsum already calls tf.cumsum in the backend, it's good enough for the loss calculation.

titu1994 commented Jan 4, 2018

It will take roughly 16 hours to train for 10 epochs again. Yeesh. At least my laptop is free for today anyway.

titu1994 closed this as completed Jan 4, 2018
tfriedel commented Jan 5, 2018

@titu1994 I noticed you are only training the top layer (whereas in the paper they also train the inner layers, with a 10x lower learning rate). I guess you are doing that for performance reasons. You know the trick where you build a new network consisting only of the fully connected layer + dropout + softmax, and feed it the features you got from the other layers as input? That's a LOT faster.
See an example here:
https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson3.ipynb
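
A minimal sketch of that idea, assuming a MobileNet base with global average pooling; `images` (preprocessed 224x224x3 arrays) and `y_scores` (normalized 10-bin score distributions) are placeholders for the actual data:

from keras.applications.mobilenet import MobileNet
from keras.layers import Dense, Dropout
from keras.models import Sequential

# 1) Run the frozen convolutional base once to cache the bottleneck features
base = MobileNet(input_shape=(224, 224, 3), include_top=False, pooling='avg', weights='imagenet')
features = base.predict(images, batch_size=200)  # shape: (num_images, 1024)

# 2) Train only the small head on the cached features
head = Sequential([
    Dropout(0.75, input_shape=(1024,)),
    Dense(10, activation='softmax'),
])
head.compile(optimizer='adam', loss=earth_mover_loss)
head.fit(features, y_scores, epochs=20, batch_size=200)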

titu1994 commented Jan 5, 2018

@tfriedel Yes, I am training only the final dense layer, since I don't have the memory to train the full MobileNet model at an image size of 224x224x3 with a batch size of 200 on a 4 GB laptop GPU.

I know about that "trick" you mentioned. Under ordinary circumstances, I would think about applying it. However, this is a dataset of 255,000 images, taking roughly 13 GB of disk space. On top of that, I am doing random horizontal flips on the train set, so make that 510,000 images. At 7 x 7 spatial size x 1024 filters x 4 bytes per image, that is 510,000 * 7 * 7 * 1024 * 4 bytes ≈ 102.4 GB.

Edit: If you take the output of the global average pooled features instead, you would need only about 2.1 GB of disk space (510,000 * 1024 * 4 bytes). Hmm, perhaps this can be done after all. However, I won't have time to improve this codebase after I finish finetuning the current model.

Computing a forward pass for that many images would take roughly 3.5 hours. Of course, after that, training the single fully connected layer would be blazingly fast, if I were able to load that large a numpy array into my 16 GB of RAM (which I can't). Now, if there were some way to chunk the numpy arrays into separate files and load them via the TF Dataset API, it would be more tractable.

Edit: I forgot to mention that this isn't an ordinary classification problem, where you can simply save the class number in a file, load it later, and one-hot encode it to get the final classification output. For each image, you need an array of size 10 (its normalized score distribution) to feed to the network in order to minimize the earth mover distance loss. Saving and loading such an aligned set of image features and score distributions would require even more space and make the data loading even more unwieldy.

Simply put, it would require significant engineering of the entire codebase to do it the "fast" way. The method you suggest works for toy datasets (for which you can save and load the feature arrays quickly), or for those who have dedicated supercomputers and enough time to engineer such a training framework.

Given the significant challenges, the only plus side I can see is that by doing something like this, I could possibly train larger NIMA classifiers (using a NASNet or an Inception-ResNet-v2 model as the base classifier).

tfriedel commented Jan 5, 2018

I think the 7 x 7 in your calculation is before average pooling, but you would take the values after it, so it really only takes 4 KB per image, or about 2 GB of RAM. So it would fit into RAM.
But yeah, it's a problem with the image augmentation if you are not only doing flipping but also cropping.
The chunking of the numpy arrays can be done with bcolz, like here for example:
https://github.com/fastai/courses/blob/master/deeplearning2/imagenet_process.ipynb
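
The save/load helpers in that style are roughly as follows (a sketch, assuming bcolz is installed; the fastai notebooks use the same pattern):

import bcolz

def save_array(fname, arr):
    # write the array to disk in compressed, chunked form
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

def load_array(fname):
    # read the chunked array back into memory
    return bcolz.open(fname)[:]

A BcolzArrayIterator can then stream the chunks during training instead of loading the whole array at once.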

I'm currently trying to finetune the whole network with code based on yours, but with random cropping and different learning rates for different layers. Will keep you updated!

titu1994 commented Jan 5, 2018

@tfriedel Make sure you are using the updated calculation of the loss that I posted a few hours back. The difference is slight, but by finetuning the whole network you may see more of a difference.

tfriedel commented Jan 5, 2018

Yeah, I've already incorporated the new loss, thanks!
I'm not using the TF Dataset API; I adapted code I once wrote for a Kaggle competition. It's based on ImageDataGenerator, which I modified to use a BcolzArrayIterator (so I don't have to keep these huge numpy arrays in RAM) and a preprocessing function that does random cropping/flipping using the torchvision transforms API.
That said, I looked into what TF has to offer in that regard, and there are functions like tf.random_crop, tf.image.crop_and_resize, and so on.
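
For example, a TF-side crop/flip step might look like this (a sketch, assuming images have already been resized to 256x256):

import tensorflow as tf

def augment(image):
    # image: a [256, 256, 3] float tensor; take a random 224x224 crop
    image = tf.random_crop(image, [224, 224, 3])
    # random horizontal flip
    image = tf.image.random_flip_left_right(image)
    return image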

titu1994 commented Jan 5, 2018

Ah, got it. Seems I was looking in the wrong namespace. tf.random_crop is what I needed, and I was searching for it under tf.image.* (semantic mistake, I guess?). Anyway, I am just about done finetuning 5 epochs on the new loss, and it seems somewhat promising.

I'm now gonna continue the next 15 epochs using random crops. Hopefully it yields even better results.
