
Best Model info #42

Closed
teresalisanti opened this issue Sep 5, 2022 · 14 comments

@teresalisanti

teresalisanti commented Sep 5, 2022

Hi Fuzail,
could you please share some information about the best model you got? I would like to know:

  • epoch_end
  • epoch in which you got the best model
  • metrics values on training and validation sets
  • hyperparameters values (learning rate, optimizer, regularizer (L2))
  • augmentation you performed (just horizontal flip?)
  • the normalization you applied on images and labels (is it "divide_by_255"?)

Thank you,

Teresa

@fuzailpalnak
Owner

fuzailpalnak commented Sep 5, 2022

Which of the two models are you interested in?

  1. RefineNet trained on INRIA
  2. DlinkNet trained on Massachusetts Buildings Dataset

@teresalisanti
Author

Both if you can; otherwise, RefineNet is enough.

@fuzailpalnak
Owner

RefineNet

  1. Training

    • Training was carried out on 384x384 images for around 120 to 130 epochs (I can't remember the exact number).

    • For augmentation, I used a combination of colour and geometric transforms. As far as I can recall, the colour augmentations were applied with very low probability, i.e. they were used fairly rarely in the data augmentation stage, whereas I used a lot of geometric augmentations [random rotate, vertical flip, horizontal flip, crop, resize]. One additional augmentation I used explicitly was cropping the input image to a size in [224, 256, 288] and then rescaling it back to 384x384 (see the pipeline sketch below).

    • I applied min-max normalization to the images, followed by the standard ImageNet normalization.

    • I used Adam as the optimizer with lr=1e-04 and kept the rest of the configuration at its defaults. I used an L2 regularizer to tackle overfitting, but relied on data augmentation most of the time to handle it. For the loss I used a combination of Jaccard and binary cross-entropy with alpha=0.3, where alpha is the weight on the Jaccard term (see the loss sketch below).

    • During training I used precision, recall, and Jaccard as metrics to monitor progress.

  2. Prediction

    • For prediction I aggregated the predictions from multiple geometric augmentations (test-time augmentation).
  3. Metric

DlinkNet

  • This is just a baseline model; I did not do much here, just trained it with the default configuration and early stopping.
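
For reference, here is a rough sketch of an augmentation and normalization pipeline matching the description above, written with the albumentations library. The original training script is no longer available (see the later comments), so the probabilities and the exact set of transforms are assumptions, not the configuration that was actually used.

```python
import albumentations as A

TARGET = 384  # training resolution mentioned above

train_transform = A.Compose(
    [
        # geometric augmentations, applied often
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomRotate90(p=0.5),
        # crop to roughly 224-288 pixels, then rescale back to 384x384
        A.RandomSizedCrop(min_max_height=(224, 288), height=TARGET, width=TARGET, p=0.5),
        # colour augmentation with low probability
        A.ColorJitter(p=0.1),
        # scale pixels to [0, 1] and apply the standard ImageNet statistics
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ]
)

# usage: out = train_transform(image=image, mask=mask)
```

And a minimal PyTorch sketch of the loss described above: a weighted combination of soft Jaccard and binary cross-entropy with alpha=0.3 on the Jaccard term. Weighting the BCE term by (1 - alpha) is an assumption; the comment only states the Jaccard weight.

```python
import torch
import torch.nn as nn


class JaccardBCELoss(nn.Module):
    """alpha * soft Jaccard + (1 - alpha) * BCE, with alpha = 0.3 as above."""

    def __init__(self, alpha: float = 0.3, eps: float = 1e-7):
        super().__init__()
        self.alpha = alpha
        self.eps = eps
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        bce = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        intersection = (probs * targets).sum()
        union = probs.sum() + targets.sum() - intersection
        jaccard = 1.0 - (intersection + self.eps) / (union + self.eps)
        return self.alpha * jaccard + (1.0 - self.alpha) * bce


# Adam with lr=1e-04; weight_decay provides the L2 regularization (value assumed):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
```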

@teresalisanti
Author

Hi,
is the model (RefineNet) you published on GitHub the best one you trained, i.e. the one with the hyperparameters above? I am a bit confused; it doesn't seem to perform that well on our aerial images.

@fuzailpalnak
Owner

Yes, it's the best model. The difference could be because I also used test-time augmentation during inference.

@fuzailpalnak
Owner

@teresalisanti are you running the refine-net model on custom aerial imagery data or on the INRIA data? And are the results from a fine-tuned refine-net model, or just from the model weights shared in the repo?

@teresalisanti
Author

I trained the RefineNet model from scratch on custom aerial imagery data plus the INRIA dataset, so I didn't use the model weights you shared in the repository. I tested my best model configuration on my own images without test-time augmentation. Why do you use test-time augmentation during inference? I don't see the benefit.
Could you share the code with the hyperparameters you listed above, so that we can train our model with that configuration?

@fuzailpalnak
Owner

fuzailpalnak commented Oct 24, 2022

Unfortunately, I don't have the script I used for training; it was on my old laptop, which I no longer have, and on top of that I did not commit the file to git :(.

TTA was helpful for IoU: in the case of buildings, larger buildings get cut across different tiles, and scaling during prediction helps with that problem, although the gain is not that significant. Rotation helps to increase the confidence of the prediction; it mainly suppresses pixels that don't have high confidence (false positives). There is also a boundary effect: buildings at the border of an image may get low confidence, so mirroring, cropping, and scaling are used to ensure that buildings at the boundaries are detected.
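
For illustration, a minimal sketch of this kind of geometric test-time augmentation (not the author's actual inference code): predictions from mirrored and rescaled copies of the input are mapped back to the original orientation and averaged. `model` is assumed to map an `(N, 3, H, W)` tensor to per-pixel logits of the same spatial size.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def predict_with_tta(model, image: torch.Tensor, scale: float = 1.25) -> torch.Tensor:
    h, w = image.shape[-2:]
    outputs = []

    # identity plus mirrored passes; undo each flip on the prediction
    for dims in (None, [-1], [-2], [-1, -2]):
        x = image if dims is None else torch.flip(image, dims)
        y = torch.sigmoid(model(x))
        outputs.append(y if dims is None else torch.flip(y, dims))

    # one rescaled pass, resized back to the original resolution
    x = F.interpolate(image, scale_factor=scale, mode="bilinear", align_corners=False)
    y = torch.sigmoid(model(x))
    outputs.append(F.interpolate(y, size=(h, w), mode="bilinear", align_corners=False))

    # average the per-pixel probabilities over all augmented passes
    return torch.stack(outputs).mean(dim=0)
```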

What I can suggest is to try fine-tuning the model; if the image quality differs a lot, then just use the weights for the earlier layers. If you plan to train the model from scratch, then use the ImageNet weights for ResNet.

@fuzailpalnak fuzailpalnak reopened this Oct 24, 2022
@teresalisanti
Author

teresalisanti commented Oct 25, 2022

Thank you!! Did you train your model from scratch, or just the top layers of the network by applying transfer learning with ImageNet weights? How can I modify the scripts (refinenet.py, segmentation.py) to freeze all but the top layers, which I would like to retrain using the weights you shared in this repo? Thank you :)

@fuzailpalnak
Owner

fuzailpalnak commented Oct 26, 2022

I used ImageNet weights to initialise the ResNet module and the default PyTorch initialisation for the rest; that is how I set up the training.

In the library the default is to use ImageNet weights for training, i.e. it does what I described above.

However, the library doesn't have functionality to split off pre-trained weights that are not present in torchvision and make them non-trainable. You will have to write a custom function that sets the chosen layers to non-trainable after you have loaded the model.

What I would rather suggest is to load the entire weights file (the one shared in the repo) and fine-tune it on your set of images. You can start with lr=1e-04, see how the training progresses, and make changes accordingly.

You can set this param to False; that way only the weights in the decoder part will be updated.
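
A rough sketch of such a custom freezing function. The parameter-name prefix `decoder`, the checkpoint path, and the plain state-dict layout are assumptions, not the library's actual API; inspect `model.named_parameters()` to find the real names in the RefineNet implementation you are using.

```python
import torch
from torch import nn


def load_and_freeze(model: nn.Module, weights_path: str, decoder_prefix: str = "decoder"):
    """Load a full checkpoint and make everything except the decoder non-trainable."""
    state = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state)

    # keep gradients only for parameters whose name looks like a decoder layer
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(decoder_prefix)

    # only the still-trainable parameters go to the optimizer, lr as suggested above
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    return model, optimizer
```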

@teresalisanti
Author

Thank you! I am training the model using your weights for ResNet. Why do I always get higher metrics on the validation set?

[screenshot: per-epoch training log in which the validation metrics are consistently higher than the training metrics]

This is really strange, and it happens at every epoch.

@fuzailpalnak
Owner

fuzailpalnak commented Nov 25, 2022

One common reason for this is that when augmentation is applied the model gets some hard examples to learn from, which causes the validation metric to be lower than the training metric; eventually the model should be able to adapt to those hard examples and produce a more typical metric.

If the issue still persists, then you should either make the model more complex or reduce the augmentation.

@teresalisanti
Author

If the metrics on the validation set were lower than those on the training set, then everything would be working fine; that should be the normal output of training, as you said above. In our case the metrics on the validation set are always higher than those on the training set.

@fuzailpalnak
Owner

fuzailpalnak commented Nov 25, 2022

If the validation metric is higher than the training metric for the initial epochs, that is not necessarily a problem; however, if the training metric stays below the validation metric throughout training, then it could be considered a problem, as this behaviour is not desired.

To avoid such behaviour, you could check the model's performance on the training data without augmentation to verify that the model is not under-fitting. If it is under-fitting, the model complexity could be increased so that the model is able to adapt to the varied examples in the training set. One other sanity check would be to train a model with higher complexity (perhaps ResNet-152 or larger as the backbone) and observe the behaviour.
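
A small sketch of that sanity check: compute the IoU on the training images without any augmentation and compare it with the validation IoU; if it is also low, the model is under-fitting. The dataloader is assumed to yield `(images, masks)` batches in the usual PyTorch layout, with the model returning per-pixel logits.

```python
import torch


@torch.no_grad()
def mean_iou(model, loader, threshold: float = 0.5, eps: float = 1e-7) -> float:
    """Average per-batch IoU of thresholded sigmoid predictions against binary masks."""
    model.eval()
    ious = []
    for images, masks in loader:
        preds = (torch.sigmoid(model(images)) > threshold).float()
        inter = (preds * masks).sum(dim=(1, 2, 3))
        union = preds.sum(dim=(1, 2, 3)) + masks.sum(dim=(1, 2, 3)) - inter
        ious.append(((inter + eps) / (union + eps)).mean())
    return torch.stack(ious).mean().item()


# if mean_iou(model, train_loader_without_augmentation) is also low, the model
# is under-fitting, and a larger backbone or less augmentation may help
```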
