EMD loss function may be wrong #2
Comments
And how would I compute the CDF inside the loss measure? It's a tensor, not a numpy array.
Turns out there exists `K.cumsum`, with which I can compute the CDF quite easily. Yeesh. This gives the correct answer for the loss. The following script:

```python
import numpy as np

y_true = np.array([[0, 0, 0, 0, 0, 0, 0, 0.9, 0.1, 0]])
y_pred1 = np.array([[0, 0, 0, 0, 0, 0, 0.9, 0, 0.1, 0]])
y_pred2 = np.array([[0.9, 0, 0, 0, 0, 0, 0, 0, 0.1, 0]])

def emd_1(y_true, y_pred):
    # EMD over the CDFs (r = 2), as described in the paper
    return np.sqrt(np.mean(np.square(np.abs(np.cumsum(y_true, axis=-1) - np.cumsum(y_pred, axis=-1)))))

def emd_2(y_true, y_pred):
    # Variant without the CDF; it cannot tell the two predictions apart
    return np.sqrt(np.mean(np.square(np.abs(y_true - y_pred))))

print("EMD 1")
print("Loss 1: ", emd_1(y_true, y_pred1))
print("Loss 2: ", emd_1(y_true, y_pred2))
print("EMD 2")
print("Loss 1: ", emd_2(y_true, y_pred1))
print("Loss 2: ", emd_2(y_true, y_pred2))
```

produces the output:

```
EMD 1
Loss 1:  0.284604989415
Loss 2:  0.752994023881
EMD 2
Loss 1:  0.40249223595
Loss 2:  0.40249223595
```

As expected, the CDF-based loss ranks `y_pred1` below `y_pred2`, while the version without the CDF cannot distinguish them.
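For use as an actual Keras loss, here is a minimal sketch built on `K.cumsum`. It mirrors the numpy `emd_1` above; the exact code in the repo may differ:

```python
from keras import backend as K

def earth_mover_loss(y_true, y_pred):
    # CDFs of the score distributions along the class axis
    cdf_true = K.cumsum(y_true, axis=-1)
    cdf_pred = K.cumsum(y_pred, axis=-1)
    # Per-sample r = 2 EMD, then averaged over the batch
    emd = K.sqrt(K.mean(K.square(cdf_true - cdf_pred), axis=-1))
    return K.mean(emd)
```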
There is also:

```python
import tensorflow as tf

def cumsum(tensor):
    # Running sum via tf.scan; accumulates along the leading axis
    return tf.scan(lambda a, b: tf.add(a, b), tensor)
```
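Note that `tf.scan` accumulates along the leading axis, so for a batch of row-vector score distributions the built-in op with an explicit axis is more direct. A small illustrative snippet:

```python
import tensorflow as tf

x = tf.constant([[0.1, 0.2, 0.7]])
cdf = tf.cumsum(x, axis=-1)  # -> [[0.1, 0.3, 1.0]]
```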
Well, since K.cumsum already calls tf.cumsum in the backend, it's good enough for the loss calculation.
It will take roughly 16 hours to train for 10 epochs again. Yeesh. At least my laptop is free for today anyway.
@titu1994 I noticed you are only training the top layer (whereas in the paper they train the inner layers with a 10x lower learning rate). I guess you are doing it for performance reasons. You know the trick where you make a new network consisting only of the fully connected layer + dropout + softmax, and feed it the outputs you got from the other layers as input? That's a LOT faster.
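A hedged sketch of that trick, assuming a MobileNet base with global average pooling, the paper's 0.75 dropout, and the `earth_mover_loss` sketched above. The file names and cached arrays are hypothetical stand-ins for however the data is actually stored:

```python
import numpy as np
from keras.applications.mobilenet import MobileNet
from keras.layers import Dense, Dropout
from keras.models import Sequential

# Hypothetical cached arrays: images (n, 224, 224, 3), scores (n, 10)
images = np.load('train_images.npy')
y_scores = np.load('train_scores.npy')

# Frozen base used once as a fixed feature extractor
base = MobileNet((224, 224, 3), alpha=1.0, include_top=False, pooling='avg')
features = base.predict(images, batch_size=32)  # shape: (n, 1024)

# Tiny head trained on the cached features: dropout + softmax over 10 bins
head = Sequential([
    Dropout(0.75, input_shape=(1024,)),
    Dense(10, activation='softmax'),
])
head.compile(optimizer='adam', loss=earth_mover_loss)
head.fit(features, y_scores, batch_size=200, epochs=20)
```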
@tfriedel Yes, I am training only the final dense layer, since I don't have the memory to train the full MobileNet model at an image size of 224x224x3 with a batch size of 200 on a 4 GB laptop GPU.

I know about the "trick" you mentioned. Under ordinary circumstances, I would think about applying it. However, this is a dataset of 255,000 images, taking roughly 13 GB of disk space, and on top of that I am doing random horizontal flips on the train set. So make that 510,000 images x 7 x 7 spatial size x 1024 filters x 4 bytes ≈ 102.36 GB of features.

Edit: If you take the output of the global average pooled features instead, you would require only about 2.1 GB of disk space. Hmm, perhaps this can be done after all. However, I won't have time to improve this codebase after I finish finetuning the current model. Computing a forward pass for that many images would take roughly 3.5 hours. Of course, after that, training the single fully connected layer would be blazingly fast, if I were able to load that large a numpy array into my 16 GB of RAM (which I can't). If there were some way to chunk the numpy arrays into separate files and load them via the TF dataset API, it would be more tractable.

Edit: I forgot to mention that this isn't an ordinary classification problem, where you can simply save the class number to a file, load it later, and one-hot encode it to get the final classification target. For each image, you need an array of size 10, normalized by its scores, that must be fed to the network in order to get the correct output score and minimize the Earth Mover's Distance loss. Saving and loading such an aligned set of image features and output scores would require even more space and make the data loading even more unwieldy.

Simply put, it would require significant engineering of the entire codebase to do it the "fast" way. The method you suggest works for toy datasets (where you can save and load feature arrays quickly), or for those who have dedicated supercomputers and enough time to engineer such a training framework. Given the significant challenges, the only "plus" side I can see is that something like this could let me train larger NIMA classifiers (using NASNet or Inception-ResNet-v2 as the base classifier).
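A hedged sketch of that chunked approach, assuming the pooled 1024-d features and the 10-bin score arrays were saved as paired `.npy` chunks. `NUM_CHUNKS` and the file names are hypothetical:

```python
import numpy as np
import tensorflow as tf

NUM_CHUNKS = 64  # hypothetical number of saved chunks

def chunk_generator():
    # Stream aligned (feature, score) pairs chunk by chunk,
    # so the full feature array never has to sit in RAM at once
    for i in range(NUM_CHUNKS):
        feats = np.load('features_%03d.npy' % i)   # shape: (n, 1024)
        scores = np.load('scores_%03d.npy' % i)    # shape: (n, 10)
        for f, s in zip(feats, scores):
            yield f, s

dataset = tf.data.Dataset.from_generator(
    chunk_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([1024]), tf.TensorShape([10])))
dataset = dataset.shuffle(10000).batch(200)
```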
I think the 7 x 7 in your calculation is before average pooling, but you would take the values after it, so the features would be 49x smaller. I'm currently trying to finetune the whole network with code that's based on yours, but with random cropping and different learning rates for the inner layers. Will keep you updated!
@tfriedel Make sure you are using the updated calculation of the loss measure that I posted a few hours back. The difference is slight, but maybe finetuning the whole network will bring out more of a difference.
Yeah, I've already incorporated the new loss, thanks!
Ah, got it. Seems I was looking in the wrong module: tf.random_crop is what I needed, and I was searching for it under tf.image.* (a semantic mistake, I guess?). Anyway, I am just about done finetuning 5 epochs on the new loss, and it seems somewhat promising. I'll now continue the next 15 epochs using random crops. Hopefully that yields even better results.
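For reference, a minimal sketch of that augmentation in TF 1.x. The 256x256 input size is an assumption; only the crop and flip ops themselves come from the discussion:

```python
import tensorflow as tf

image = tf.placeholder(tf.float32, shape=[256, 256, 3])  # assumed pre-resized input
crop = tf.random_crop(image, size=[224, 224, 3])         # random 224x224 crop
augmented = tf.image.random_flip_left_right(crop)        # matches the horizontal flips above
```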
I found the EMD definition here: https://github.com/titu1994/neural-image-assessment/blob/master/train_mobilenet.py#L49

You are missing the CDF function. According to the paper:

EMD(y_true, y_pred) = sqrt(mean(square(abs(CDF(y_true) - CDF(y_pred)))))

With the CDF, y_pred1's loss should be less than y_pred2's loss.
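In the notation of the paper (Talebi & Milanfar, "NIMA: Neural Image Assessment"), with N = 10 score bins:

```latex
\mathrm{EMD}(p, \hat{p}) =
  \left( \frac{1}{N} \sum_{k=1}^{N}
    \left| \mathrm{CDF}_{p}(k) - \mathrm{CDF}_{\hat{p}}(k) \right|^{r}
  \right)^{1/r},
\qquad r = 2 .
```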